Age | Commit message (Collapse) | Author | Files | Lines |
|
grab_mapping_entry() has a bug in handling of ENOMEM condition. Suppose
we have a PMD entry at index i which we are downgrading to a PTE entry.
grab_mapping_entry() will set pmd_downgrade to true, lock the entry, clear
the entry in xarray, and decrement mapping->nrpages. The it will call:
entry = dax_make_entry(pfn_to_pfn_t(0), flags);
dax_lock_entry(xas, entry);
which inserts new PTE entry into xarray. However this may fail allocating
the new node. We handle this by:
if (xas_nomem(xas, mapping_gfp_mask(mapping) & ~__GFP_HIGHMEM))
goto retry;
however pmd_downgrade stays set to true even though 'entry' returned from
get_unlocked_entry() will be NULL now. And we will go again through the
downgrade branch. This is mostly harmless except that mapping->nrpages is
decremented again and we temporarily have an invalid entry stored in
xarray. Fix the problem by setting pmd_downgrade to false each time we
lookup the entry we work with so that it matches the entry we found.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b15cd800682f ("dax: Convert page fault handlers to XArray")
Signed-off-by: Jan Kara <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: "Aneesh Kumar K.V" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The variable ret is being initialized with a value that is never read, the
assignment is redundant and can be removed.
Addresses-Coverity: ("Unused value")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
simple_strtoull() is deprecated in some situation since it does not check
for the range overflow, use kstrtoull() instead.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Chen Huang <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In commit 60f91826ca62 ("buffer: Avoid setting buffer bits that are
already set"), function set_buffer_##name was added a test_bit() to check
buffer, which is the same as function buffer_##name. The
!buffer_uptodate(bh) here is a repeated check. Remove it.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Wan Jiabing <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The pointer queue is being initialized with a value that is never read and
it is being updated later with a new value. The initialization is
redundant and can be removed.
Addresses-Coverity: ("Unused value")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Colin Ian King <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The snprintf() function returns the number of bytes which would have been
printed if the buffer was large enough. In other words it can return ">=
remain" but this code assumes it returns "== remain".
The run time impact of this bug is not very severe. The next iteration
through the loop would trigger a WARN() when we pass a negative limit to
snprintf(). We would then return success instead of -E2BIG.
The kernel implementation of snprintf() will never return negatives so
there is no need to check and I have deleted that dead code.
Link: https://lkml.kernel.org/r/20210511135350.GV1955@kadam
Fixes: a860f6eb4c6a ("ocfs2: sysfile interfaces for online file check")
Fixes: 74ae4e104dfc ("ocfs2: Create stack glue sysfs files.")
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The list_head o2hb_node_events is initialized statically. It is
unnecessary to initialize by INIT_LIST_HEAD().
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Yingliang <[email protected]>
Reported-by: Hulk Robot <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Joseph Qi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Add an errors=panic mount option to make squashfs trigger a panic when
errors are encountered, similar to several other filesystems. This allows
a kernel dump to be saved using which the corruption can be analysed and
debugged.
Inspired by a pre-fs_context patch by Anton Eliasson.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Vincent Whitchurch <[email protected]>
Signed-off-by: Phillip Lougher <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When checking the file name attribute, we want to ensure that it fits
within the bounds of ATTR_RECORD. To do this, we should check that (attr
record + file name offset + file name length) < (attr record + attr record
length).
However, the original check did not include the file name offset in the
calculation. This means that corrupted on-disk metadata might not caught
by the incorrect file name check, and lead to an invalid memory access.
An example can be seen in the crash report of a memory corruption error
found by Syzbot:
https://syzkaller.appspot.com/bug?id=a1a1e379b225812688566745c3e2f7242bffc246
Adding the file name offset to the validity check fixes this error and
passes the Syzbot reproducer test.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
Reported-by: [email protected]
Tested-by: [email protected]
Acked-by: Anton Altaparmakov <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Instead of returning ENOLCK when we can't hand out a lease, we should be
returning EAGAIN.
Signed-off-by: Trond Myklebust <[email protected]>
|
|
If a file has already been closed, then it should not be selected to
support further I/O.
Signed-off-by: Trond Myklebust <[email protected]>
[Trond: Fix an invalid pointer deref reported by Colin Ian King]
|
|
Split __gfs2_unstuff_inode off from gfs2_unstuff_dinode and clean up the
code a little. All remaining callers now pass NULL as the page argument
of gfs2_unstuff_dinode, so remove that argument.
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
In gfs2_page_mkwrite, unstuff inodes before locking the page. That
way, we won't have to pass in the locked page to gfs2_unstuff_inode,
and gfs2_unstuff_inode can look up and lock the page itself.
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
We're setting an error number so that block_page_mkwrite_return
translates it into the corresponding VM_FAULT_* code in several places,
but this is getting confusing, so set the VM_FAULT_* codes directly
instead. (No change in functionality.)
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace rlimit handling update from Eric Biederman:
"This is the work mainly by Alexey Gladkov to limit rlimits to the
rlimits of the user that created a user namespace, and to allow users
to have stricter limits on the resources created within a user
namespace."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
cred: add missing return error code when set_cred_ucounts() failed
ucounts: Silence warning in dec_rlimit_ucounts
ucounts: Set ucount_max to the largest positive value the type can hold
kselftests: Add test to check for rlimit changes in different user namespaces
Reimplement RLIMIT_MEMLOCK on top of ucounts
Reimplement RLIMIT_SIGPENDING on top of ucounts
Reimplement RLIMIT_MSGQUEUE on top of ucounts
Reimplement RLIMIT_NPROC on top of ucounts
Use atomic_t for ucounts reference counting
Add a reference to ucounts for each cred
Increase size of ucounts to atomic_long_t
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux
Pull fallthrough fixes from Gustavo Silva:
"Fix many fall-through warnings when building with Clang 12.0.0 and
'-Wimplicit-fallthrough' so that we at some point will be able to
enable that warning by default"
* tag 'fallthrough-fixes-clang-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: (26 commits)
rxrpc: Fix fall-through warnings for Clang
drm/nouveau/clk: Fix fall-through warnings for Clang
drm/nouveau/therm: Fix fall-through warnings for Clang
drm/nouveau: Fix fall-through warnings for Clang
xfs: Fix fall-through warnings for Clang
xfrm: Fix fall-through warnings for Clang
tipc: Fix fall-through warnings for Clang
sctp: Fix fall-through warnings for Clang
rds: Fix fall-through warnings for Clang
net/packet: Fix fall-through warnings for Clang
net: netrom: Fix fall-through warnings for Clang
ide: Fix fall-through warnings for Clang
hwmon: (max6621) Fix fall-through warnings for Clang
hwmon: (corsair-cpro) Fix fall-through warnings for Clang
firewire: core: Fix fall-through warnings for Clang
braille_console: Fix fall-through warnings for Clang
ipv4: Fix fall-through warnings for Clang
qlcnic: Fix fall-through warnings for Clang
bnxt_en: Fix fall-through warnings for Clang
netxen_nic: Fix fall-through warnings for Clang
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore updates from Kees Cook:
"Use normal block device I/O path for pstore/blk. (Christoph Hellwig,
Kees Cook, Pu Lehui)"
* tag 'pstore-v5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/blk: Include zone in pstore_device_info
pstore/blk: Fix kerndoc and redundancy on blkdev param
pstore/blk: Use the normal block device I/O path
pstore/blk: Move verify_size() macro out of function
pstore/blk: Improve failure reporting
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs updates from David Sterba:
"A normal mix of improvements, core changes and features that user have
been missing or complaining about.
User visible changes:
- new sysfs exports:
- add sysfs knob to limit scrub IO bandwidth per device
- device stats are also available in
/sys/fs/btrfs/FSID/devinfo/DEVID/error_stats
- support cancellable resize and device delete ioctls
- change how the empty value is interpreted when setting a property,
so far we have only 'btrfs.compression' and we need to distinguish
a reset to defaults and setting "do not compress", in general the
empty value will always mean 'reset to defaults' for any other
property, for compression it's either 'no' or 'none' to forbid
compression
Performance improvements:
- no need for full sync when truncation does not touch extents,
reported run time change is -12%
- avoid unnecessary logging of xattrs during fast fsyncs (+17%
throughput, -17% runtime on xattr stress workload)
Core:
- preemptive flushing improvements and fixes
- adjust clamping logic on multi-threaded workloads to avoid
flushing too soon
- take into account global block reserve, may help on almost full
filesystems
- continue flushing when there are enough pending delalloc and
ordered bytes
- simplify logic around conditional transaction commit, a workaround
used in the past for throttling that's been superseded by ticket
reservations that manage the throttling in a better way
- subpage blocksize preparation:
- submit read time repair only for each corrupted sector
- scrub repair now works with sectors and not pages
- free space cache (v1) works with sectors and not pages
- more fine grained bio tracking for extents
- subpage support in page callbacks, extent callbacks, end io
callbacks
- simplify transaction abort logic and always abort and don't check
various potentially unreliable stats tracked by the transaction
- exclusive operations can do more checks when started and allow eg.
cancellation of the same running operation
- ensure relocation never runs while we have send operations running,
e.g. when zoned background auto reclaim starts
Fixes:
- zoned: more sanity checks of write pointer
- improve error handling in delayed inodes
- send:
- fix invalid path for unlink operations after parent
orphanization
- fix crash when memory allocations trigger reclaim
- skip compression of we have only one page (can't make things
better)
- empty value of a property newly means reset to default
Other:
- lots of cleanups, comment updates, yearly typo fixing
- disable build on platforms having page size 256K"
* tag 'for-5.14-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (101 commits)
btrfs: remove unused btrfs_fs_info::total_pinned
btrfs: rip out btrfs_space_info::total_bytes_pinned
btrfs: rip the first_ticket_bytes logic from fail_all_tickets
btrfs: remove FLUSH_DELAYED_REFS from data ENOSPC flushing
btrfs: rip out may_commit_transaction
btrfs: send: fix crash when memory allocations trigger reclaim
btrfs: ensure relocation never runs while we have send operations running
btrfs: shorten integrity checker extent data mount option
btrfs: switch mount option bits to enums and use wider type
btrfs: props: change how empty value is interpreted
btrfs: compression: don't try to compress if we don't have enough pages
btrfs: fix unbalanced unlock in qgroup_account_snapshot()
btrfs: sysfs: export dev stats in devinfo directory
btrfs: fix typos in comments
btrfs: remove a stale comment for btrfs_decompress_bio()
btrfs: send: use list_move_tail instead of list_del/list_add_tail
btrfs: disable build on platforms having page size 256K
btrfs: send: fix invalid path for unlink operations after parent orphanization
btrfs: inline wait_current_trans_commit_start in its caller
btrfs: sink wait_for_unblock parameter to async commit
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs
Pull erofs updates from Gao Xiang:
"No noticable change available for this cycle. Just a bugfix related to
sb chksum feature, two minor cleanups and Chao's email address update:
- fix wrong error code overwritten due to sb checksum feature
- two minor cleanups
- update Chao's email address"
* tag 'erofs-for-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
MAINTAINERS: erofs: update my email address
erofs: clean up file headers & footers
erofs: remove the occupied parameter from z_erofs_pagevec_enqueue()
erofs: fix error return code in erofs_read_superblock()
|
|
Pull fscrypt updates from Eric Biggers:
"A couple bug fixes for fs/crypto/:
- Fix handling of major dirhash values that happen to be 0.
- Fix cases where keys were derived differently on big endian systems
than on little endian systems (affecting some newer features only)"
* tag 'fscrypt-for-linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: fix derivation of SipHash keys on big endian CPUs
fscrypt: don't ignore minor_hash when hash is 0
|
|
Certain uses of "do once" functionality reside outside of fast path,
and so do not require jump label patching via static keys, making
existing DO_ONCE undesirable in such cases.
Replace uses of __section(".data.once") with DO_ONCE_LITE(_IF)?
This patch changes the return values of xfs_printk_once, printk_once,
and printk_deferred_once. Before, they returned whether the print was
performed, but now, they always return true. This is okay because the
return values of the following macros are entirely ignored throughout
the kernel:
- xfs_printk_once
- xfs_warn_once
- xfs_notice_once
- xfs_info_once
- printk_once
- pr_emerg_once
- pr_alert_once
- pr_crit_once
- pr_err_once
- pr_warn_once
- pr_notice_once
- pr_info_once
- pr_devel_once
- pr_debug_once
- printk_deferred_once
- orc_warn
Changes
v3:
- Expand commit message to explain why changing return values of
xfs_printk_once, printk_once, printk_deferred_once is benign
v2:
- Fix i386 build warnings
Signed-off-by: Tanner Love <[email protected]>
Acked-by: Eric Dumazet <[email protected]>
Acked-by: Mahesh Bandewar <[email protected]>
Acked-by: Steven Rostedt (VMware) <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently, we set the r_parent pointer but then don't take a reference
to it until we submit the request. If we end up freeing the req before
that point, then we'll do a iput when we shouldn't.
Instead, take the inode reference in the callers, so that it's always
safe to call ceph_mdsc_put_request on the req, even before submission.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Luis Henriques <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Now that we don't need to hold session->s_mutex or the snap_rwsem when
calling ceph_check_caps, we can eliminate ceph_async_iput and just use
normal iput calls.
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Luis Henriques <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
The s_mutex doesn't protect anything in this codepath.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Luis Henriques <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
These locks appear to be completely unnecessary. Almost all of this
function is done under the inode->i_ceph_lock, aside from the actual
sending of the message. Don't take either lock in this function.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Luis Henriques <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Turn s_cap_gen field into an atomic_t, and just rely on the fact that we
hold the s_mutex when changing the s_cap_ttl field.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Luis Henriques <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
...to simplify some error paths.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Luis Henriques <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
__lookup_snap_realm
They both say that the snap_rwsem must be held for write, but I don't
see any real reason for it, and it's not currently always called that
way.
The lookup is just walking the rbtree, so holding it for read should be
fine there. The "get" is bumping the refcount and (possibly) removing
it from the empty list. I see no need to hold the snap_rwsem for write
for that.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Turn some comments into lockdep asserts.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Currently ceph_update_snap_realm returns -EINVAL when it hits a decoding
error, which is the wrong error code. -EINVAL implies that the user gave
us a bogus argument to a syscall or something similar. -EIO is more
descriptive when we hit a decoding error.
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
This will collect IO's total size and then calculate the average
size, and also will collect the min/max IO sizes.
The debugfs will show the size metrics in bytes and will let the
userspace applications to switch to what they need.
URL: https://tracker.ceph.com/issues/49913
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
The new __update_stdev() helper will only compute the standard
deviation.
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Signed-off-by: Xiubo Li <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
Signed-off-by: Jeff Layton <[email protected]>
Reviewed-by: Xiubo Li <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
The sparse tool complains as follows:
fs/ceph/addr.c:316:37: warning:
symbol 'ceph_netfs_read_ops' was not declared. Should it be static?
This symbol is not used outside of addr.c, so mark it static.
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wei Yongjun <[email protected]>
Reviewed-by: Ilya Dryomov <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
The checks for page->mapping are odd, as set_page_dirty is an
address_space operation, and I don't see where it would be called on a
non-pagecache page.
The warning about the page lock also seems bogus. The comment over
set_page_dirty() says that it can be called without the page lock in
some rare cases. I don't think we want to warn if that's the case.
Reported-by: Matthew Wilcox <[email protected]>
Signed-off-by: Jeff Layton <[email protected]>
Signed-off-by: Ilya Dryomov <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler udpates from Ingo Molnar:
- Changes to core scheduling facilities:
- Add "Core Scheduling" via CONFIG_SCHED_CORE=y, which enables
coordinated scheduling across SMT siblings. This is a much
requested feature for cloud computing platforms, to allow the
flexible utilization of SMT siblings, without exposing untrusted
domains to information leaks & side channels, plus to ensure more
deterministic computing performance on SMT systems used by
heterogenous workloads.
There are new prctls to set core scheduling groups, which allows
more flexible management of workloads that can share siblings.
- Fix task->state access anti-patterns that may result in missed
wakeups and rename it to ->__state in the process to catch new
abuses.
- Load-balancing changes:
- Tweak newidle_balance for fair-sched, to improve 'memcache'-like
workloads.
- "Age" (decay) average idle time, to better track & improve
workloads such as 'tbench'.
- Fix & improve energy-aware (EAS) balancing logic & metrics.
- Fix & improve the uclamp metrics.
- Fix task migration (taskset) corner case on !CONFIG_CPUSET.
- Fix RT and deadline utilization tracking across policy changes
- Introduce a "burstable" CFS controller via cgroups, which allows
bursty CPU-bound workloads to borrow a bit against their future
quota to improve overall latencies & batching. Can be tweaked via
/sys/fs/cgroup/cpu/<X>/cpu.cfs_burst_us.
- Rework assymetric topology/capacity detection & handling.
- Scheduler statistics & tooling:
- Disable delayacct by default, but add a sysctl to enable it at
runtime if tooling needs it. Use static keys and other
optimizations to make it more palatable.
- Use sched_clock() in delayacct, instead of ktime_get_ns().
- Misc cleanups and fixes.
* tag 'sched-core-2021-06-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
sched/doc: Update the CPU capacity asymmetry bits
sched/topology: Rework CPU capacity asymmetry detection
sched/core: Introduce SD_ASYM_CPUCAPACITY_FULL sched_domain flag
psi: Fix race between psi_trigger_create/destroy
sched/fair: Introduce the burstable CFS controller
sched/uclamp: Fix uclamp_tg_restrict()
sched/rt: Fix Deadline utilization tracking during policy change
sched/rt: Fix RT utilization tracking during policy change
sched: Change task_struct::state
sched,arch: Remove unused TASK_STATE offsets
sched,timer: Use __set_current_state()
sched: Add get_current_state()
sched,perf,kvm: Fix preemption condition
sched: Introduce task_is_running()
sched: Unbreak wakeups
sched/fair: Age the average idle time
sched/cpufreq: Consider reduced CPU capacity in energy calculation
sched/fair: Take thermal pressure into account while estimating energy
thermal/cpufreq_cooling: Update offline CPUs per-cpu thermal_pressure
sched/fair: Return early from update_tg_cfs_load() if delta == 0
...
|
|
The caller of wb_get_create() should pin the memcg, because
wb_get_create() relies on this guarantee. The rcu read lock
only can guarantee that the memcg css returned by css_from_id()
cannot be released, but the reference of the memcg can be zero.
rcu_read_lock()
memcg_css = css_from_id()
wb_get_create(memcg_css)
cgwb_create(memcg_css)
// css_get can change the ref counter from 0 back to 1
css_get(memcg_css)
rcu_read_unlock()
Fix it by holding a reference to the css before calling
wb_get_create(). This is not a problem I encountered in the
real world. Just the result of a code review.
Fixes: 682aa8e1a6a1 ("writeback: implement unlocked_inode_to_wb transaction and use it for stat updates")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Muchun Song <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Tejun Heo <[email protected]>
Signed-off-by: Jan Kara <[email protected]>
|
|
This patch removes setting SBI_NEED_FSCK when GC gets an error on f2fs_iget,
since f2fs_iget can give ENOMEM and others by race condition.
If we set this critical fsck flag, we'll get EIO during fsync via the below
code path.
In f2fs_inplace_write_data(),
if (is_sbi_flag_set(sbi, SBI_NEED_FSCK) || f2fs_cp_error(sbi)) {
err = -EIO;
goto drop_bio;
}
Fixes: 9557727876674 ("f2fs: drop inplace IO if fs status is abnormal")
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
|
|
Let's allow extent cache for RO partition.
Signed-off-by: Daeho Jeong <[email protected]>
Reviewed-by: Chao Yu <[email protected]>
Signed-off-by: Jaegeuk Kim <[email protected]>
|
|
Simplify nfs_pageio_complete_read() by using the inode pointer saved
inside nfs_pageio_descriptor.
Signed-off-by: Dave Wysochanski <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
After calling security_sb_clone_mnt_opts() in nfs_get_root(), it's
necessary to copy the value of has_sec_mnt_opts from the cloned
super_block's nfs_server. Otherwise, calls to nfs_compare_super()
using this super_block may not return the correct result, leading to
mount failures.
For example, mounting an nfs server with the following in /etc/exports:
/export *(rw,insecure,crossmnt,no_root_squash,security_label)
and having /export/scratch on a separate block device.
mount -o v4.2,context=system_u:object_r:root_t:s0 server:/export/test /mnt/test
mount -o v4.2,context=system_u:object_r:swapfile_t:s0 server:/export/scratch /mnt/scratch
The second mount would fail with "mount.nfs: /mnt/scratch is busy or
already mounted or sharecache fail" and "SELinux: mount invalid. Same
superblock, different security settings for..." would appear in the
syslog.
Also while we're in there, replace several instances of "NFS_SB(s)"
with "server", which was already declared at the top of the
nfs_get_root().
Fixes: ec1ade6a0448 ("nfs: account for selinux security context when deciding to share superblock")
Signed-off-by: Scott Mayhew <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
In d_make_root, when we fail to allocate dentry for root inode,
we will iput root inode and returned value is NULL in this function.
So we do not need to release this inode again at d_make_root's caller.
Signed-off-by: Chen Li <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
|
|
Orangefs df output is whacky. Walt Ligon suggested this might fix it.
It seems way more in line with reality now...
Signed-off-by: Mike Marshall <[email protected]>
|
|
Matthew Wilcox suggested that perhaps it could be "possible for
rac->file to be NULL if the caller doesn't have a struct file"...
Signed-off-by: Mike Marshall <[email protected]>
|
|
On an error path, init_statfs calls iput(pn) after pn has already been put.
Fix that by setting pn to NULL after the initial iput.
Fixes: 97fd734ba17e ("gfs2: lookup local statfs inodes prior to journal recovery")
Cc: [email protected] # v5.10+
Reported-by: Jing Xiangfeng <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
On filesystems with a block size smaller than PAGE_SIZE and non-empty
files smaller then PAGE_SIZE, gfs2_page_mkwrite could end up allocating
excess blocks beyond the end of the file, similar to fallocate. This
doesn't make sense; fix it.
Reported-by: Bob Peterson <[email protected]>
Fixes: 184b4e60853d ("gfs2: Fix end-of-file handling in gfs2_page_mkwrite")
Cc: [email protected] # v5.5+
Signed-off-by: Andreas Gruenbacher <[email protected]>
|
|
Using list_move_tail() instead of list_del() + list_add_tail().
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Baokun Li <[email protected]>
Signed-off-by: Andreas Gruenbacher <[email protected]>
|