Age | Commit message (Collapse) | Author | Files | Lines |
|
There's a loop in afs_extend_writeback() that adds extra pages to a write
we want to make to improve the efficiency of the writeback by making it
larger. This loop stops, however, if we hit a page we can't write back
from immediately, but it doesn't get rid of the page ref we speculatively
acquired.
This was caused by the removal of the cleanup loop when the code switched
from using find_get_pages_contig() to xarray scanning as the latter only
gets a single page at a time, not a batch.
Fix this by putting the page on a ref on an early break from the loop.
Unfortunately, we can't just add that page to the pagevec we're employing
as we'll go through that and add those pages to the RPC call.
This was found by the generic/074 test. It leaks ~4GiB of RAM each time it
is run - which can be observed with "top".
Fixes: e87b03f5830e ("afs: Prepare for use of THPs")
Reported-by: Marc Dionne <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-and-tested-by: Marc Dionne <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163111666635.283156.177701903478910460.stgit@warthog.procyon.org.uk/
|
|
The afs_read objects created by afs_req_issue_op() get leaked because
afs_alloc_read() returns a ref and then afs_fetch_data() gets its own ref
which is released when the operation completes, but the initial ref is
never released.
Fix this by discarding the initial ref at the end of afs_req_issue_op().
This leak also covered another bug whereby a ref isn't got on the key
attached to the read record by afs_req_issue_op(). This isn't a problem as
long as the afs_read req never goes away...
Fix this by calling key_get() in afs_req_issue_op().
This was found by the generic/074 test. It leaks a bunch of kmalloc-192
objects each time it is run, which can be observed by watching
/proc/slabinfo.
Fixes: f7605fa869cf ("afs: Fix leak of afs_read objects")
Reported-by: Marc Dionne <[email protected]>
Signed-off-by: David Howells <[email protected]>
Reviewed-and-tested-by: Marc Dionne <[email protected]>
cc: [email protected]
Link: https://lore.kernel.org/r/163010394740.3035676.8516846193899793357.stgit@warthog.procyon.org.uk/
Link: https://lore.kernel.org/r/163111665914.283156.3038561975681836591.stgit@warthog.procyon.org.uk/
|
|
Fix a leak in s_fsnotify_connectors counter in case of a race between
concurrent add of new fsnotify mark to an object.
The task that lost the race fails to drop the counter before freeing
the unused connector.
Following umount() hangs in fsnotify_sb_delete()/wait_var_event(),
because s_fsnotify_connectors never drops to zero.
Fixes: ec44610fe2b8 ("fsnotify: count all objects with attached connectors")
Reported-by: Murphy Zhou <[email protected]>
Link: https://lore.kernel.org/linux-fsdevel/[email protected]/
Signed-off-by: Amir Goldstein <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Build check of __REQ_F_LAST_BIT should be larger than, not equal or larger
than. It's perfectly valid to have __REQ_F_LAST_BIT be 32, as that means
that the last valid bit is 31 which does fit in the type.
Signed-off-by: Hao Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Pull ksmbd fixes from Steve French:
- various fixes pointed out by coverity, and a minor cleanup patch
- id mapping and ownership fixes
- an smbdirect fix
* tag '5.15-rc-ksmbd-part2' of git://git.samba.org/ksmbd:
ksmbd: fix control flow issues in sid_to_id()
ksmbd: fix read of uninitialized variable ret in set_file_basic_info
ksmbd: add missing assignments to ret on ndr_read_int64 read calls
ksmbd: add validation for ndr read/write functions
ksmbd: remove unused ksmbd_file_table_flush function
ksmbd: smbd: fix dma mapping error in smb_direct_post_send_data
ksmbd: Reduce error log 'speed is unknown' to debug
ksmbd: defer notify_change() call
ksmbd: remove setattr preparations in set_file_basic_info()
ksmbd: ensure error is surfaced in set_file_basic_info()
ndr: fix translation in ndr_encode_posix_acl()
ksmbd: fix translation in sid_to_id()
ksmbd: fix subauth 0 handling in sid_to_id()
ksmbd: fix translation in acl entries
ksmbd: fix translation in ksmbd_acls_fattr()
ksmbd: fix translation in create_posix_rsp_buf()
ksmbd: fix translation in smb2_populate_readdir_entry()
ksmbd: fix lookup on idmapped mounts
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix max_inline mount option limit on 64k page system
- lockdep fixes:
- update bdev time in a safer way
- move bdev put outside of sb write section when removing device
- fix possible deadlock when mounting seed/sprout filesystem
- zoned mode: fix split extent accounting
- minor include fixup
* tag 'for-5.15-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: fix double counting of split ordered extent
btrfs: fix lockdep warning while mounting sprout fs
btrfs: delay blkdev_put until after the device remove
btrfs: update the bdev time directly when closing
btrfs: use correct header for div_u64 in misc.h
btrfs: fix upper limit for max_inline for page size 64K
|
|
Cached root file was not being completely invalidated sometimes.
Reproducing:
- With a DFS share with 2 targets, one disabled and one enabled
- start some I/O on the mount
# while true; do ls /mnt/dfs; done
- at the same time, disable the enabled target and enable the disabled
one
- wait for DFS cache to expire
- on reconnect, the previous cached root handle should be invalid, but
open_cached_dir_by_dentry() will still try to use it, but throws a
use-after-free warning (kref_get())
Make smb2_close_cached_fid() invalidate all fields every time, but only
send an SMB2_close() when the entry is still valid.
Signed-off-by: Enzo Matsumiya <[email protected]>
Reviewed-by: Paulo Alcantara (SUSE) <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML updates from Richard Weinberger:
- Support for VMAP_STACK
- Support for splice_write in hostfs
- Fixes for virt-pci
- Fixes for virtio_uml
- Various fixes
* tag 'for-linus-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
um: fix stub location calculation
um: virt-pci: fix uapi documentation
um: enable VMAP_STACK
um: virt-pci: don't do DMA from stack
hostfs: support splice_write
um: virtio_uml: fix memory leak on init failures
um: virtio_uml: include linux/virtio-uml.h
lib/logic_iomem: fix sparse warnings
um: make PCI emulation driver init/exit static
|
|
Pull ARM development updates from Russell King:
- Rename "mod_init" and "mod_exit" so that initcall debug output is
actually useful (Randy Dunlap)
- Update maintainers entries for linux-arm-kernel to indicate it is
moderated for non-subscribers (Randy Dunlap)
- Move install rules to arch/arm/Makefile (Masahiro Yamada)
- Drop unnecessary ARCH_NR_GPIOS definition (Linus Walleij)
- Don't warn about atags_to_fdt() stack size (David Heidelberg)
- Speed up unaligned copy_{from,to}_kernel_nofault (Arnd Bergmann)
- Get rid of set_fs() usage (Arnd Bergmann)
- Remove checks for GCC prior to v4.6 (Geert Uytterhoeven)
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9118/1: div64: Remove always-true __div64_const32_is_OK() duplicate
ARM: 9117/1: asm-generic: div64: Remove always-true __div64_const32_is_OK()
ARM: 9116/1: unified: Remove check for gcc < 4
ARM: 9110/1: oabi-compat: fix oabi epoll sparse warning
ARM: 9113/1: uaccess: remove set_fs() implementation
ARM: 9112/1: uaccess: add __{get,put}_kernel_nofault
ARM: 9111/1: oabi-compat: rework fcntl64() emulation
ARM: 9114/1: oabi-compat: rework sys_semtimedop emulation
ARM: 9108/1: oabi-compat: rework epoll_wait/epoll_pwait emulation
ARM: 9107/1: syscall: always store thread_info->abi_syscall
ARM: 9109/1: oabi-compat: add epoll_pwait handler
ARM: 9106/1: traps: use get_kernel_nofault instead of set_fs()
ARM: 9115/1: mm/maccess: fix unaligned copy_{from,to}_kernel_nofault
ARM: 9105/1: atags_to_fdt: don't warn about stack size
ARM: 9103/1: Drop ARCH_NR_GPIOS definition
ARM: 9102/1: move theinstall rules to arch/arm/Makefile
ARM: 9100/1: MAINTAINERS: mark all linux-arm-kernel@infradead list as moderated
ARM: 9099/1: crypto: rename 'mod_init' & 'mod_exit' functions to be module-specific
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull more s390 updates from Heiko Carstens:
"Except for the xpram device driver removal it is all about fixes and
cleanups.
- Fix topology update on cpu hotplug, so notifiers see expected
masks. This bug was uncovered with SCHED_CORE support.
- Fix stack unwinding so that the correct number of entries are
omitted like expected by common code. This fixes KCSAN selftests.
- Add kmemleak annotation to stack_alloc to avoid false positive
kmemleak warnings.
- Avoid layering violation in common I/O code and don't unregister
subchannel from child-drivers.
- Remove xpram device driver for which no real use case exists since
the kernel is 64 bit only. Also all hypervisors got required
support removed in the meantime, which means the xpram device
driver is dead code.
- Fix -ENODEV handling of clp_get_state in our PCI code.
- Enable KFENCE in debug defconfig.
- Cleanup hugetlbfs s390 specific Kconfig dependency.
- Quite a lot of trivial fixes to get rid of "W=1" warnings, and and
other simple cleanups"
* tag 's390-5.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
hugetlbfs: s390 is always 64bit
s390/ftrace: remove incorrect __va usage
s390/zcrypt: remove incorrect kernel doc indicators
scsi: zfcp: fix kernel doc comments
s390/sclp: add __nonstring annotation
s390/hmcdrv_ftp: fix kernel doc comment
s390: remove xpram device driver
s390/pci: read clp_list_pci_req only once
s390/pci: fix clp_get_state() handling of -ENODEV
s390/cio: fix kernel doc comment
s390/ctrlchar: fix kernel doc comment
s390/con3270: use proper type for tasklet function
s390/cpum_cf: move array from header to C file
s390/mm: fix kernel doc comments
s390/topology: fix topology information when calling cpu hotplug notifiers
s390/unwind: use current_frame_address() to unwind current task
s390/configs: enable CONFIG_KFENCE in debug_defconfig
s390/entry: make oklabel within CHKSTG macro local
s390: add kmemleak annotation in stack_alloc()
s390/cio: dont unregister subchannel from child-drivers
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull gfs2 setattr updates from Al Viro:
"Make it possible for filesystems to use a generic 'may_setattr()' and
switch gfs2 to using it"
* 'work.gfs2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
gfs2: Switch to may_setattr in gfs2_setattr
fs: Move notify_change permission checks into may_setattr
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull root filesystem type handling updates from Al Viro:
"Teach init/do_mounts.c to handle non-block filesystems, hopefully
preventing even more special-cased kludges (such as root=/dev/nfs,
etc)"
* 'work.init' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: simplify get_filesystem_list / get_all_fs_names
init: allow mounting arbitrary non-blockdevice filesystems as root
init: split get_fs_names
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter fixes from Al Viro:
"Fixes for io-uring handling of iov_iter reexpands"
* 'work.iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
io_uring: reexpand under-reexpanded iters
iov_iter: track truncated size
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm updates from Dan Williams:
- Fix a race condition in the teardown path of raw mode pmem
namespaces.
- Cleanup the code that filesystems use to detect filesystem-dax
capabilities of their underlying block device.
* tag 'libnvdimm-for-5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: remove bdev_dax_supported
xfs: factor out a xfs_buftarg_is_dax helper
dax: stub out dax_supported for !CONFIG_FS_DAX
dax: remove __generic_fsdax_supported
dax: move the dax_read_lock() locking into dax_supported
dax: mark dax_get_by_host static
dm: use fs_dax_get_by_bdev instead of dax_get_by_host
dax: stop using bdevname
fsdax: improve the FS_DAX Kconfig description and help text
libnvdimm/pmem: Fix crash triggered when I/O in-flight during unbind
|
|
Show options should show option according documentation when some value
is not default or when ever coder wants. Uid/gid are problematic because
it is hard to know which are defaults. In file system there is many
different implementation for this problem.
Some file systems show uid/gid when they are different than root, some
when user has set them and some show them always. There is also problem
that what if root uid/gid change. This code just choose to show them
always. This way we do not need to think this any more.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
Rename mount option no_acs_rules to (no)acsrules. This allow us to use
possibility to mount with options noaclrules or aclrules.
Acked-by: Christian Brauner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
Other fs drivers are using iocharset= mount option for specifying charset.
So add it also for ntfs3 and mark old nls= mount option as deprecated.
Reviewed-by: Pali Rohár <[email protected]>
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
If we call Opt_nohidden with just keyword hidden, then we can use
hidden/nohidden when mounting. We already use this method for almoust
all other parameters so it is just logical that this will use same
method.
Acked-by: Christian Brauner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Pali Rohár <[email protected]>
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
init_fs_context() is meant to initialize s_fs_info (spi). Move spi
initializing code there which we can initialize before fill_super().
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
We have now new mount api as described in Documentation/filesystems. We
should use it as it gives us some benefits which are desribed here
lore.kernel.org/linux-fsdevel/159646178122.1784947.11705396571718464082.stgit@warthog.procyon.org.uk/
Nls loading is changed a to load with string. This did make code also
little cleaner.
Also try to use fsparam_flag_no as much as possible. This is just nice
little touch and is not mandatory but it should not make any harm. It
is just convenient that we can use example acl/noacl mount options.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
Use pointer to mount options. We want to do this because we will use new
mount api which will benefit that we have spi and mount options in
different allocations. When we remount we do not have to make whole new
spi it is enough that we will allocate just mount options.
Please note that we can do example remount lot cleaner but things will
change in next patch so this should be just functional.
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
Remove unnecesarry remount flag handling. This does not do anything for
this driver. We have already set SB_NODIRATIME when we fill super. Also
noatime should be set from mount option. Now for some reson we try to
set it when remounting.
Lazytime part looks like it is copied from f2fs and there is own mount
parameter for it. That is why they use it. We do not set lazytime
anywhere in our code. So basically this just blocks lazytime when
remounting.
Acked-by: Christian Brauner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
Remove unnecesarry mount option noatime because this will be handled
by VFS. Our option parser will never get opt like this.
Acked-by: Christian Brauner <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Pali Rohár <[email protected]>
Signed-off-by: Kari Argillander <[email protected]>
Signed-off-by: Konstantin Komarov <[email protected]>
|
|
When we cancel a timeout we should mark it with REQ_F_FAIL, so
linked requests are cancelled as well, but not queued for further
execution.
Cc: [email protected]
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/fff625b44eeced3a5cae79f60e6acf3fbdf8f990.1631192135.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
Remove the code that re-initializes a buffer head with an invalid block
number and BH_New and BH_Delay bits when a matching delayed and
unwritten block has been found in the extent status cache. Replace it
with assertions that verify the buffer head already has this state
correctly set. The current code masked an inline data truncation bug
that left stale entries in the extent status cache. With this change,
generic/130 can be used to reproduce and detect that bug.
Signed-off-by: Eric Whitney <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Conditionally remove all cached extents belonging to an inode
when truncating its inline data. It's only necessary to attempt to
remove cached extents when a conversion from inline to extent storage
has been initiated (!EXT4_STATE_MAY_INLINE_DATA). This avoids
unnecessary es lock overhead in the more common inline case.
Signed-off-by: Eric Whitney <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
Fix a bug in how we update i_disksize, and the error path in
inline_data_end. Finally, drop an unnecessary creation of a journal
handle which was only needed for inline data, which can give us a
large performance gain in delayed allocation writes.
Signed-off-by: Theodore Ts'o <[email protected]>
|
|
BUG: memory leak
unreferenced object 0xffff888126fcd6c0 (size 192):
comm "syz-executor.1", pid 11934, jiffies 4294983026 (age 15.690s)
backtrace:
[<ffffffff81632c91>] kmalloc_node include/linux/slab.h:609 [inline]
[<ffffffff81632c91>] kzalloc_node include/linux/slab.h:732 [inline]
[<ffffffff81632c91>] create_io_worker+0x41/0x1e0 fs/io-wq.c:739
[<ffffffff8163311e>] io_wqe_create_worker fs/io-wq.c:267 [inline]
[<ffffffff8163311e>] io_wqe_enqueue+0x1fe/0x330 fs/io-wq.c:866
[<ffffffff81620b64>] io_queue_async_work+0xc4/0x200 fs/io_uring.c:1473
[<ffffffff8162c59c>] __io_queue_sqe+0x34c/0x510 fs/io_uring.c:6933
[<ffffffff8162c7ab>] io_req_task_submit+0x4b/0xa0 fs/io_uring.c:2233
[<ffffffff8162cb48>] io_async_task_func+0x108/0x1c0 fs/io_uring.c:5462
[<ffffffff816259e3>] tctx_task_work+0x1b3/0x3a0 fs/io_uring.c:2158
[<ffffffff81269b43>] task_work_run+0x73/0xb0 kernel/task_work.c:164
[<ffffffff812dcdd1>] tracehook_notify_signal include/linux/tracehook.h:212 [inline]
[<ffffffff812dcdd1>] handle_signal_work kernel/entry/common.c:146 [inline]
[<ffffffff812dcdd1>] exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
[<ffffffff812dcdd1>] exit_to_user_mode_prepare+0x151/0x180 kernel/entry/common.c:209
[<ffffffff843ff25d>] __syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
[<ffffffff843ff25d>] syscall_exit_to_user_mode+0x1d/0x40 kernel/entry/common.c:302
[<ffffffff843fa4a2>] do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
[<ffffffff84600068>] entry_SYSCALL_64_after_hwframe+0x44/0xae
when create_io_thread() return error, and not retry, the worker object
need to be freed.
Reported-by: [email protected]
Signed-off-by: Qiang.zhang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The FSCTL definitions are in smbfsctl.h which should be
shared by client and server. Move the updated version of
smbfsctl.h into smbfs_common and have the client code use
it (subsequent patch will change the server to use this
common version of the header).
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
As we move to common code between client and server, we have
been asked to make the names less confusing, and refer less
to "cifs" and more to words which include "smb" instead to
e.g. "smbfs" for the client (we already have "ksmbd" for the
kernel server, and "smbd" for the user space Samba daemon).
So to be more consistent in the naming of common code between
client and server and reduce the risk of merge conflicts as
more common code is added - rename "cifs_common" to
"smbfs_common" (in future releases we also will rename
the fs/cifs directory to fs/smbfs)
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Add some missing defines used by ksmbd to the client
version of smbfsctl.h, and add a missing newer define
mentioned in the protocol definitions (MS-FSCC).
This will also make it easier to move to common code.
Reviewed-by: Ronnie Sahlberg <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
We check for the func with an OR condition, which means it always ends
up being false and we never match the task_work we want to cancel. In
the unexpected case that we do exit with that pending, we can trigger
a hang waiting for a worker to exit, but it was never created. syzbot
reports that as such:
INFO: task syz-executor687:8514 blocked for more than 143 seconds.
Not tainted 5.14.0-syzkaller #0
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:syz-executor687 state:D stack:27296 pid: 8514 ppid: 8479 flags:0x00024004
Call Trace:
context_switch kernel/sched/core.c:4940 [inline]
__schedule+0x940/0x26f0 kernel/sched/core.c:6287
schedule+0xd3/0x270 kernel/sched/core.c:6366
schedule_timeout+0x1db/0x2a0 kernel/time/timer.c:1857
do_wait_for_common kernel/sched/completion.c:85 [inline]
__wait_for_common kernel/sched/completion.c:106 [inline]
wait_for_common kernel/sched/completion.c:117 [inline]
wait_for_completion+0x176/0x280 kernel/sched/completion.c:138
io_wq_exit_workers fs/io-wq.c:1162 [inline]
io_wq_put_and_exit+0x40c/0xc70 fs/io-wq.c:1197
io_uring_clean_tctx fs/io_uring.c:9607 [inline]
io_uring_cancel_generic+0x5fe/0x740 fs/io_uring.c:9687
io_uring_files_cancel include/linux/io_uring.h:16 [inline]
do_exit+0x265/0x2a30 kernel/exit.c:780
do_group_exit+0x125/0x310 kernel/exit.c:922
get_signal+0x47f/0x2160 kernel/signal.c:2868
arch_do_signal_or_restart+0x2a9/0x1c40 arch/x86/kernel/signal.c:865
handle_signal_work kernel/entry/common.c:148 [inline]
exit_to_user_mode_loop kernel/entry/common.c:172 [inline]
exit_to_user_mode_prepare+0x17d/0x290 kernel/entry/common.c:209
__syscall_exit_to_user_mode_work kernel/entry/common.c:291 [inline]
syscall_exit_to_user_mode+0x19/0x60 kernel/entry/common.c:302
do_syscall_64+0x42/0xb0 arch/x86/entry/common.c:86
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x445cd9
RSP: 002b:00007fc657f4b308 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: 0000000000000001 RBX: 00000000004cb448 RCX: 0000000000445cd9
RDX: 00000000000f4240 RSI: 0000000000000081 RDI: 00000000004cb44c
RBP: 00000000004cb440 R08: 000000000000000e R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000049b154
R13: 0000000000000003 R14: 00007fc657f4b400 R15: 0000000000022000
While in there, also decrement accr->nr_workers. This isn't strictly
needed as we're exiting, but let's make sure the accounting matches up.
Fixes: 3146cba99aa2 ("io-wq: make worker creation resilient against signals")
Reported-by: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
The SQPOLL thread dictates the lock order, and we hold the ctx->uring_lock
for all the registration opcodes. We also hold a ref to the ctx, and we
do drop the lock for other reasons to quiesce, so it's fine to drop the
ctx lock temporarily to grab the sqd->lock. This fixes the following
lockdep splat:
======================================================
WARNING: possible circular locking dependency detected
5.14.0-syzkaller #0 Not tainted
------------------------------------------------------
syz-executor.5/25433 is trying to acquire lock:
ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: io_register_iowq_max_workers fs/io_uring.c:10551 [inline]
ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: __io_uring_register fs/io_uring.c:10757 [inline]
ffff888023426870 (&sqd->lock){+.+.}-{3:3}, at: __do_sys_io_uring_register+0x10aa/0x2e70 fs/io_uring.c:10792
but task is already holding lock:
ffff8880885b40a8 (&ctx->uring_lock){+.+.}-{3:3}, at: __do_sys_io_uring_register+0x2e1/0x2e70 fs/io_uring.c:10791
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (&ctx->uring_lock){+.+.}-{3:3}:
__mutex_lock_common kernel/locking/mutex.c:596 [inline]
__mutex_lock+0x131/0x12f0 kernel/locking/mutex.c:729
__io_sq_thread fs/io_uring.c:7291 [inline]
io_sq_thread+0x65a/0x1370 fs/io_uring.c:7368
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
-> #0 (&sqd->lock){+.+.}-{3:3}:
check_prev_add kernel/locking/lockdep.c:3051 [inline]
check_prevs_add kernel/locking/lockdep.c:3174 [inline]
validate_chain kernel/locking/lockdep.c:3789 [inline]
__lock_acquire+0x2a07/0x54a0 kernel/locking/lockdep.c:5015
lock_acquire kernel/locking/lockdep.c:5625 [inline]
lock_acquire+0x1ab/0x510 kernel/locking/lockdep.c:5590
__mutex_lock_common kernel/locking/mutex.c:596 [inline]
__mutex_lock+0x131/0x12f0 kernel/locking/mutex.c:729
io_register_iowq_max_workers fs/io_uring.c:10551 [inline]
__io_uring_register fs/io_uring.c:10757 [inline]
__do_sys_io_uring_register+0x10aa/0x2e70 fs/io_uring.c:10792
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&ctx->uring_lock);
lock(&sqd->lock);
lock(&ctx->uring_lock);
lock(&sqd->lock);
*** DEADLOCK ***
Fixes: 2e480058ddc2 ("io-wq: provide a way to limit max number of workers")
Reported-by: [email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Include Christoph's rework of the dax_supported() helpers in the v5.15
libnvdimm update. This supports the ongoing dax-reflink enabling effort.
|
|
Pull ceph updates from Ilya Dryomov:
- a set of patches to address fsync stalls caused by depending on
periodic rather than triggered MDS journal flushes in some cases
(Xiubo Li)
- a fix for mtime effectively not getting updated in case of competing
writers (Jeff Layton)
- a couple of fixes for inode reference leaks and various WARNs after
"umount -f" (Xiubo Li)
- a new ceph.auth_mds extended attribute (Jeff Layton)
- a smattering of fixups and cleanups from Jeff, Xiubo and Colin.
* tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-client:
ceph: fix dereference of null pointer cf
ceph: drop the mdsc_get_session/put_session dout messages
ceph: lockdep annotations for try_nonblocking_invalidate
ceph: don't WARN if we're forcibly removing the session caps
ceph: don't WARN if we're force umounting
ceph: remove the capsnaps when removing caps
ceph: request Fw caps before updating the mtime in ceph_write_iter
ceph: reconnect to the export targets on new mdsmaps
ceph: print more information when we can't find snaprealm
ceph: add ceph_change_snap_realm() helper
ceph: remove redundant initializations from mdsc and session
ceph: cancel delayed work instead of flushing on mdsc teardown
ceph: add a new vxattr to return auth mds for an inode
ceph: remove some defunct forward declarations
ceph: flush the mdlog before waiting on unsafe reqs
ceph: flush mdlog before umounting
ceph: make iterate_sessions a global symbol
ceph: make ceph_create_session_msg a global symbol
ceph: fix comment about short copies in ceph_write_end
ceph: fix memory leak on decode error in ceph_handle_caps
|
|
Addresses-Coverity reported Control flow issues in sid_to_id()
/fs/ksmbd/smbacl.c: 277 in sid_to_id()
271
272 if (sidtype == SIDOWNER) {
273 kuid_t uid;
274 uid_t id;
275
276 id = le32_to_cpu(psid->sub_auth[psid->num_subauth - 1]);
>>> CID 1506810: Control flow issues (NO_EFFECT)
>>> This greater-than-or-equal-to-zero comparison of an unsigned value
>>> is always true. "id >= 0U".
277 if (id >= 0) {
278 /*
279 * Translate raw sid into kuid in the server's user
280 * namespace.
281 */
282 uid = make_kuid(&init_user_ns, id);
Addresses-Coverity: ("Control flow issues")
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Addresses-Coverity reported Uninitialized variables warninig :
/fs/ksmbd/smb2pdu.c: 5525 in set_file_basic_info()
5519 if (!rc) {
5520 inode->i_ctime = ctime;
5521 mark_inode_dirty(inode);
5522 }
5523 inode_unlock(inode);
5524 }
>>> CID 1506805: Uninitialized variables (UNINIT)
>>> Using uninitialized value "rc".
5525 return rc;
5526 }
5527
5528 static int set_file_allocation_info(struct ksmbd_work *work,
5529 struct ksmbd_file *fp, char *buf)
5530 {
Addresses-Coverity: ("Uninitialized variable")
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Currently there are two ndr_read_int64 calls where ret is being checked
for failure but ret is not being assigned a return value from the call.
Static analyis is reporting the checks on ret as dead code. Fix this.
Addresses-Coverity: ("Logical dead code")
Reviewed-by: Sergey Senozhatsky <[email protected]>
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Namjae Jeon <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
In case of !SQPOLL, io_cqring_ev_posted_iopoll() doesn't provide a
memory barrier required by waitqueue_active(&ctx->poll_wait). There is
a wq_has_sleeper(), which does smb_mb() inside, but it's called only for
SQPOLL.
Fixes: 5fd4617840596 ("io_uring: be smarter about waking multiple CQ ring waiters")
Signed-off-by: Pavel Begunkov <[email protected]>
Link: https://lore.kernel.org/r/2982e53bcea2274006ed435ee2a77197107d8a29.1631130542.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <[email protected]>
|
|
Merge more updates from Andrew Morton:
"147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f.
Subsystems affected by this patch series: mm (memory-hotplug, rmap,
ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan),
alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib,
checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig,
selftests, ipc, and scripts"
* emailed patches from Andrew Morton <[email protected]>: (94 commits)
scripts: check_extable: fix typo in user error message
mm/workingset: correct kernel-doc notations
ipc: replace costly bailout check in sysvipc_find_ipc()
selftests/memfd: remove unused variable
Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH
configs: remove the obsolete CONFIG_INPUT_POLLDEV
prctl: allow to setup brk for et_dyn executables
pid: cleanup the stale comment mentioning pidmap_init().
kernel/fork.c: unexport get_{mm,task}_exe_file
coredump: fix memleak in dump_vma_snapshot()
fs/coredump.c: log if a core dump is aborted due to changed file permissions
nilfs2: use refcount_dec_and_lock() to fix potential UAF
nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group
nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group
nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group
nilfs2: fix NULL pointer in nilfs_##name##_attr_release
nilfs2: fix memory leak in nilfs_sysfs_create_device_group
trap: cleanup trap_init()
init: move usermodehelper_enable() to populate_rootfs()
...
|
|
dump_vma_snapshot() allocs memory for *vma_meta, when dump_vma_snapshot()
returns -EFAULT, the memory will be leaked, so we free it correctly.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: a07279c9a8cd7 ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot")
Signed-off-by: QiuXi <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
For obvious security reasons, a core dump is aborted if the filesystem
cannot preserve ownership or permissions of the dump file.
This affects filesystems like e.g. vfat, but also something like a 9pfs
share in a Qemu test setup, running as a regular user, depending on the
security model used. In those cases, the result is an empty core file and
a confused user.
To hopefully save other people a lot of time figuring out the cause, this
patch adds a simple log message for those specific cases.
[[email protected]: s/|%s/%s/ in printk text]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: David Oberhollenzer <[email protected]>
Cc: Al Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When the refcount is decreased to 0, the resource reclamation branch is
entered. Before CPU0 reaches the race point (1), CPU1 may obtain the
spinlock and traverse the rbtree to find 'root', see
nilfs_lookup_root().
Although CPU1 will call refcount_inc() to increase the refcount, it is
obviously too late. CPU0 will release 'root' directly, CPU1 then
accesses 'root' and triggers UAF.
Use refcount_dec_and_lock() to ensure that both the operations of
decrease refcount to 0 and link deletion are lock protected eliminates
this risk.
CPU0 CPU1
nilfs_put_root():
<-------- (1)
spin_lock(&nilfs->ns_cptree_lock);
rb_erase(&root->rb_node, &nilfs->ns_cptree);
spin_unlock(&nilfs->ns_cptree_lock);
kfree(root);
<-------- use-after-free
refcount_t: underflow; use-after-free.
WARNING: CPU: 2 PID: 9476 at lib/refcount.c:28 \
refcount_warn_saturate+0x1cf/0x210 lib/refcount.c:28
Modules linked in:
CPU: 2 PID: 9476 Comm: syz-executor.0 Not tainted 5.10.45-rc1+ #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ...
RIP: 0010:refcount_warn_saturate+0x1cf/0x210 lib/refcount.c:28
... ...
Call Trace:
__refcount_sub_and_test include/linux/refcount.h:283 [inline]
__refcount_dec_and_test include/linux/refcount.h:315 [inline]
refcount_dec_and_test include/linux/refcount.h:333 [inline]
nilfs_put_root+0xc1/0xd0 fs/nilfs2/the_nilfs.c:795
nilfs_segctor_destroy fs/nilfs2/segment.c:2749 [inline]
nilfs_detach_log_writer+0x3fa/0x570 fs/nilfs2/segment.c:2812
nilfs_put_super+0x2f/0xf0 fs/nilfs2/super.c:467
generic_shutdown_super+0xcd/0x1f0 fs/super.c:464
kill_block_super+0x4a/0x90 fs/super.c:1446
deactivate_locked_super+0x6a/0xb0 fs/super.c:335
deactivate_super+0x85/0x90 fs/super.c:366
cleanup_mnt+0x277/0x2e0 fs/namespace.c:1118
__cleanup_mnt+0x15/0x20 fs/namespace.c:1125
task_work_run+0x8e/0x110 kernel/task_work.c:151
tracehook_notify_resume include/linux/tracehook.h:188 [inline]
exit_to_user_mode_loop kernel/entry/common.c:164 [inline]
exit_to_user_mode_prepare+0x13c/0x170 kernel/entry/common.c:191
syscall_exit_to_user_mode+0x16/0x30 kernel/entry/common.c:266
do_syscall_64+0x45/0x80 arch/x86/entry/common.c:56
entry_SYSCALL_64_after_hwframe+0x44/0xa9
There is no reproduction program, and the above is only theoretical
analysis.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: ba65ae4729bf ("nilfs2: add checkpoint tree to nilfs object")
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Zhen Lei <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
kobject_put() should be used to cleanup the memory associated with the
kobject instead of kobject_del(). See the section "Kobject removal" of
"Documentation/core-api/kobject.rst".
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nanyong Sun <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If kobject_init_and_add returns with error, kobject_put() is needed here
to avoid memory leak, because kobject_init_and_add may return error
without freeing the memory associated with the kobject it allocated.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nanyong Sun <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The kobject_put() should be used to cleanup the memory associated with the
kobject instead of kobject_del. See the section "Kobject removal" of
"Documentation/core-api/kobject.rst".
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nanyong Sun <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If kobject_init_and_add return with error, kobject_put() is needed here to
avoid memory leak, because kobject_init_and_add may return error without
freeing the memory associated with the kobject it allocated.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nanyong Sun <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In nilfs_##name##_attr_release, kobj->parent should not be referenced
because it is a NULL pointer. The release() method of kobject is always
called in kobject_put(kobj), in the implementation of kobject_put(), the
kobj->parent will be assigned as NULL before call the release() method.
So just use kobj to get the subgroups, which is more efficient and can fix
a NULL pointer reference problem.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nanyong Sun <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "nilfs2: fix incorrect usage of kobject".
This patchset from Nanyong Sun fixes memory leak issues and a NULL
pointer dereference issue caused by incorrect usage of kboject in nilfs2
sysfs implementation.
This patch (of 6):
Reported by syzkaller:
BUG: memory leak
unreferenced object 0xffff888100ca8988 (size 8):
comm "syz-executor.1", pid 1930, jiffies 4294745569 (age 18.052s)
hex dump (first 8 bytes):
6c 6f 6f 70 31 00 ff ff loop1...
backtrace:
kstrdup+0x36/0x70 mm/util.c:60
kstrdup_const+0x35/0x60 mm/util.c:83
kvasprintf_const+0xf1/0x180 lib/kasprintf.c:48
kobject_set_name_vargs+0x56/0x150 lib/kobject.c:289
kobject_add_varg lib/kobject.c:384 [inline]
kobject_init_and_add+0xc9/0x150 lib/kobject.c:473
nilfs_sysfs_create_device_group+0x150/0x7d0 fs/nilfs2/sysfs.c:986
init_nilfs+0xa21/0xea0 fs/nilfs2/the_nilfs.c:637
nilfs_fill_super fs/nilfs2/super.c:1046 [inline]
nilfs_mount+0x7b4/0xe80 fs/nilfs2/super.c:1316
legacy_get_tree+0x105/0x210 fs/fs_context.c:592
vfs_get_tree+0x8e/0x2d0 fs/super.c:1498
do_new_mount fs/namespace.c:2905 [inline]
path_mount+0xf9b/0x1990 fs/namespace.c:3235
do_mount+0xea/0x100 fs/namespace.c:3248
__do_sys_mount fs/namespace.c:3456 [inline]
__se_sys_mount fs/namespace.c:3433 [inline]
__x64_sys_mount+0x14b/0x1f0 fs/namespace.c:3433
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
If kobject_init_and_add return with error, then the cleanup of kobject
is needed because memory may be allocated in kobject_init_and_add
without freeing.
And the place of cleanup_dev_kobject should use kobject_put to free the
memory associated with the kobject. As the section "Kobject removal" of
"Documentation/core-api/kobject.rst" says, kobject_del() just makes the
kobject "invisible", but it is not cleaned up. And no more cleanup will
do after cleanup_dev_kobject, so kobject_put is needed here.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Reported-by: Hulk Robot <[email protected]>
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nanyong Sun <[email protected]>
Signed-off-by: Ryusuke Konishi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This counter tracks the number of watches a user has, to compare against
the 'max_user_watches' limit. This causes a scalability bottleneck on
SPECjbb2015 on large systems as there is only one user. Changing to a
per-cpu counter increases throughput of the benchmark by about 30% on a
16-socket, > 1000 thread system.
[[email protected]: fix build errors in kernel/user.c when CONFIG_EPOLL=n]
[[email protected]: move ifdefs into wrapper functions, slightly improve panic message]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: tweak user_epoll_alloc(), per Guenter]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nicholas Piggin <[email protected]>
Reported-by: Anton Blanchard <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|