aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-09-08Merge tag 'docs-5.15-2' of git://git.lwn.net/linuxLinus Torvalds73-129/+2686
Pull more documentation updates from Jonathan Corbet: "Another collection of documentation patches, mostly fixes but also includes another set of traditional Chinese translations" * tag 'docs-5.15-2' of git://git.lwn.net/linux: docs: pdfdocs: Fix typo in CJK-language specific font settings docs: kernel-hacking: Remove inappropriate text docs/zh_TW: add translations for zh_TW/filesystems docs/zh_TW: add translations for zh_TW/cpu-freq docs/zh_TW: add translations for zh_TW/arm64 docs/zh_CN: Modify the translator tag and fix the wrong word Documentation/features/vm: correct huge-vmap APIs Documentation: block: blk-mq: Fix small typo in multi-queue docs Documentation: in_irq() cleanup Documentation: arm: marvell: Add 88F6825 model into list Documentation/process/maintainer-pgp-guide: Replace broken link to PGP path finder Documentation: locking: fix references Documentation: Update details of The Linux Kernel Module Programming Guide docs: x86: Remove obsolete information about x86_64 vmalloc() faulting Documentation/process/applying-patches: Activate linux-next man hyperlink
2021-09-09compiler_attributes.h: move __compiletime_{error|warning}Nick Desaulniers3-9/+24
Clang 14 will add support for __attribute__((__error__(""))) and __attribute__((__warning__(""))). To make use of these in __compiletime_error and __compiletime_warning (as used by BUILD_BUG and friends) for newer clang and detect/fallback for older versions of clang, move these to compiler_attributes.h and guard them with __has_attribute preprocessor guards. Link: https://reviews.llvm.org/D106030 Link: https://bugs.llvm.org/show_bug.cgi?id=16428 Link: https://github.com/ClangBuiltLinux/linux/issues/1173 Signed-off-by: Nick Desaulniers <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Reviewed-by: Kees Cook <[email protected]> [Reworded, landed in Clang 14] Signed-off-by: Miguel Ojeda <[email protected]>
2021-09-08Merge tag 'modules-for-v5.15' of ↵Linus Torvalds2-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull module updates from Jessica Yu: "The only main change I have for this round of updates is the modules MAINTAINERS update. As I find myself with less time to devote to upstream these days, Luis has kindly agreed to help maintain the module loader, to eventually transition to being the primary maintainer. Since Luis is already very involved upstream with experience maintaining various areas of the kernel including the kmod usermode helper, I think he is a great fit for this area of the kernel. Summary: - Add Luis Chamberlain as modules maintainer - Fix for .ctors sections in module linker script" * tag 'modules-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: MAINTAINERS: Add Luis Chamberlain as modules maintainer module: combine constructors in module linker script
2021-09-08Merge tag 'microblaze-v5.15' of git://git.monstr.eu/linux-2.6-microblazeLinus Torvalds2-5/+4
Pull microblaze update from Michal Simek: - Kbuild clean up * tag 'microblaze-v5.15' of git://git.monstr.eu/linux-2.6-microblaze: microblaze: move core-y in arch/microblaze/Makefile to arch/microblaze/Kbuild
2021-09-08Merge branch 'for-5.15/fsdax-cleanups' into for-5.15/libnvdimmDan Williams8-169/+117
Include Christoph's rework of the dax_supported() helpers in the v5.15 libnvdimm update. This supports the ongoing dax-reflink enabling effort.
2021-09-08Merge tag 'nfsd-5.15-1' of ↵Linus Torvalds3-7/+10
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Restore performance on memory-starved servers * tag 'nfsd-5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: improve error response to over-size gss credential SUNRPC: don't pause on incomplete allocation
2021-09-08Merge tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-clientLinus Torvalds14-207/+438
Pull ceph updates from Ilya Dryomov: - a set of patches to address fsync stalls caused by depending on periodic rather than triggered MDS journal flushes in some cases (Xiubo Li) - a fix for mtime effectively not getting updated in case of competing writers (Jeff Layton) - a couple of fixes for inode reference leaks and various WARNs after "umount -f" (Xiubo Li) - a new ceph.auth_mds extended attribute (Jeff Layton) - a smattering of fixups and cleanups from Jeff, Xiubo and Colin. * tag 'ceph-for-5.15-rc1' of git://github.com/ceph/ceph-client: ceph: fix dereference of null pointer cf ceph: drop the mdsc_get_session/put_session dout messages ceph: lockdep annotations for try_nonblocking_invalidate ceph: don't WARN if we're forcibly removing the session caps ceph: don't WARN if we're force umounting ceph: remove the capsnaps when removing caps ceph: request Fw caps before updating the mtime in ceph_write_iter ceph: reconnect to the export targets on new mdsmaps ceph: print more information when we can't find snaprealm ceph: add ceph_change_snap_realm() helper ceph: remove redundant initializations from mdsc and session ceph: cancel delayed work instead of flushing on mdsc teardown ceph: add a new vxattr to return auth mds for an inode ceph: remove some defunct forward declarations ceph: flush the mdlog before waiting on unsafe reqs ceph: flush mdlog before umounting ceph: make iterate_sessions a global symbol ceph: make ceph_create_session_msg a global symbol ceph: fix comment about short copies in ceph_write_end ceph: fix memory leak on decode error in ceph_handle_caps
2021-09-08Merge tag '9p-for-5.15-rc1' of git://github.com/martinetd/linuxLinus Torvalds4-6/+10
Pull 9p updates from Dominique Martinet: "A couple of harmless fixes, increase max tcp msize (64KB -> 1MB), and increase default msize (8KB -> 128KB) The default increase has been discussed with Christian for the qemu side of things but makes sense for all supported transports" * tag '9p-for-5.15-rc1' of git://github.com/martinetd/linux: net/9p: increase default msize to 128k net/9p: use macro to define default msize net/9p: increase tcp max msize to 1MB 9p/xen: Fix end of loop tests for list_for_each_entry 9p/trans_virtio: Remove sysfs file on probe failure
2021-09-08arch: remove compat_alloc_user_spaceArnd Bergmann24-333/+12
All users of compat_alloc_user_space() and copy_in_user() have been removed from the kernel, only a few functions in sparc remain that can be changed to calling arch_copy_in_user() instead. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Feng Tang <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08compat: remove some compat entry pointsArnd Bergmann14-117/+42
These are all handled correctly when calling the native system call entry point, so remove the special cases. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Feng Tang <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm: simplify compat numa syscallsArnd Bergmann2-129/+64
The compat implementations for mbind, get_mempolicy, set_mempolicy and migrate_pages are just there to handle the subtly different layout of bitmaps on 32-bit hosts. The compat implementation however lacks some of the checks that are present in the native one, in particular for checking that the extra bits are all zero when user space has a larger mask size than the kernel. Worse, those extra bits do not get cleared when copying in or out of the kernel, which can lead to incorrect data as well. Unify the implementation to handle the compat bitmap layout directly in the get_nodes() and copy_nodes_to_user() helpers. Splitting out the get_bitmap() helper from get_nodes() also helps readability of the native case. On x86, two additional problems are addressed by this: compat tasks can pass a bitmap at the end of a mapping, causing a fault when reading across the page boundary for a 64-bit word. x32 tasks might also run into problems with get_mempolicy corrupting data when an odd number of 32-bit words gets passed. On parisc the migrate_pages() system call apparently had the wrong calling convention, as big-endian architectures expect the words inside of a bitmap to be swapped. This is not a problem though since parisc has no NUMA support. [[email protected]: fix mempolicy crash] Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/lkml/YQPLG20V3dmOfq3a@osiris/ Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Feng Tang <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm: simplify compat_sys_move_pagesArnd Bergmann1-15/+30
The compat move_pages() implementation uses compat_alloc_user_space() for converting the pointer array. Moving the compat handling into the function itself is a bit simpler and lets us avoid the compat_alloc_user_space() call. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Feng Tang <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08kexec: avoid compat_alloc_user_spaceArnd Bergmann1-36/+25
kimage_alloc_init() expects a __user pointer, so compat_sys_kexec_load() uses compat_alloc_user_space() to convert the layout and put it back onto the user space caller stack. Moving the user space access into the syscall handler directly actually makes the code simpler, as the conversion for compat mode can now be done on kernel memory. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/lkml/YPbtsU4GX6PL7%[email protected]/ Link: https://lore.kernel.org/lkml/[email protected]/ Signed-off-by: Arnd Bergmann <[email protected]> Co-developed-by: Eric Biederman <[email protected]> Co-developed-by: Christoph Hellwig <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Cc: Al Viro <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Feng Tang <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08kexec: move locking into do_kexec_loadArnd Bergmann1-28/+16
Patch series "compat: remove compat_alloc_user_space", v5. Going through compat_alloc_user_space() to convert indirect system call arguments tends to add complexity compared to handling the native and compat logic in the same code. This patch (of 6): The locking is the same between the native and compat version of sys_kexec_load(), so it can be done in the common implementation to reduce duplication. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Co-developed-by: Eric Biederman <[email protected]> Co-developed-by: Christoph Hellwig <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Vasily Gorbik <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Al Viro <[email protected]> Cc: Feng Tang <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm: migrate: change to use bool type for 'page_was_mapped'Baolin Wang1-2/+2
Change to use bool type for 'page_was_mapped' variable making it more readable. Link: https://lkml.kernel.org/r/ce1279df18d2c163998c403e0b5ec6d3f6f90f7a.1629447552.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm: migrate: fix the incorrect function name in commentsBaolin Wang1-1/+1
since commit a98a2f0c8ce1 ("mm/rmap: split migration into its own function"), the migration ptes establishment has been split into a separate try_to_migrate() function, thus update the related comments. Link: https://lkml.kernel.org/r/5b824bad6183259c916ae6cf42f81d14c6118b06.1629447552.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <[email protected]> Reviewed-by: Yang Shi <[email protected]> Reviewed-by: Alistair Popple <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm: migrate: introduce a local variable to get the number of pagesBaolin Wang1-2/+3
Use thp_nr_pages() instead of compound_nr() to get the number of pages for THP page, meanwhile introducing a local variable 'nr_pages' to avoid getting the number of pages repeatedly. Link: https://lkml.kernel.org/r/a8e331ac04392ee230c79186330fb05e86a2aa77.1629447552.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Alistair Popple <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm/vmstat: protect per cpu variables with preempt disable on RTIngo Molnar1-0/+48
Disable preemption on -RT for the vmstat code. On vanila the code runs in IRQ-off regions while on -RT it may not when stats are updated under a local_lock. "preempt_disable" ensures that the same resources is not updated in parallel due to preemption. This patch differs from the preempt-rt version where __count_vm_event and __count_vm_events are also protected. The counters are explicitly "allowed to be to be racy" so there is no need to protect them from preemption. Only the accurate page stats that are updated by a read-modify-write need protection. This patch also differs in that a preempt_[en|dis]able_rt helper is not used. As vmstat is the only user of the helper, it was suggested that it be open-coded in vmstat.c instead of risking the helper being used in unnecessary contexts. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08ksmbd: fix control flow issues in sid_to_id()Namjae Jeon1-26/+22
Addresses-Coverity reported Control flow issues in sid_to_id() /fs/ksmbd/smbacl.c: 277 in sid_to_id() 271 272 if (sidtype == SIDOWNER) { 273 kuid_t uid; 274 uid_t id; 275 276 id = le32_to_cpu(psid->sub_auth[psid->num_subauth - 1]); >>> CID 1506810: Control flow issues (NO_EFFECT) >>> This greater-than-or-equal-to-zero comparison of an unsigned value >>> is always true. "id >= 0U". 277 if (id >= 0) { 278 /* 279 * Translate raw sid into kuid in the server's user 280 * namespace. 281 */ 282 uid = make_kuid(&init_user_ns, id); Addresses-Coverity: ("Control flow issues") Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-08ksmbd: fix read of uninitialized variable ret in set_file_basic_infoNamjae Jeon1-1/+1
Addresses-Coverity reported Uninitialized variables warninig : /fs/ksmbd/smb2pdu.c: 5525 in set_file_basic_info() 5519 if (!rc) { 5520 inode->i_ctime = ctime; 5521 mark_inode_dirty(inode); 5522 } 5523 inode_unlock(inode); 5524 } >>> CID 1506805: Uninitialized variables (UNINIT) >>> Using uninitialized value "rc". 5525 return rc; 5526 } 5527 5528 static int set_file_allocation_info(struct ksmbd_work *work, 5529 struct ksmbd_file *fp, char *buf) 5530 { Addresses-Coverity: ("Uninitialized variable") Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-08ksmbd: add missing assignments to ret on ndr_read_int64 read callsColin Ian King1-2/+2
Currently there are two ndr_read_int64 calls where ret is being checked for failure but ret is not being assigned a return value from the call. Static analyis is reporting the checks on ret as dead code. Fix this. Addresses-Coverity: ("Logical dead code") Reviewed-by: Sergey Senozhatsky <[email protected]> Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-09-08io_uring: fix missing mb() before waitqueue_activePavel Begunkov1-1/+4
In case of !SQPOLL, io_cqring_ev_posted_iopoll() doesn't provide a memory barrier required by waitqueue_active(&ctx->poll_wait). There is a wq_has_sleeper(), which does smb_mb() inside, but it's called only for SQPOLL. Fixes: 5fd4617840596 ("io_uring: be smarter about waking multiple CQ ring waiters") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/2982e53bcea2274006ed435ee2a77197107d8a29.1631130542.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2021-09-08Merge branch 'akpm' (patches from Andrew)Linus Torvalds149-930/+5341
Merge more updates from Andrew Morton: "147 patches, based on 7d2a07b769330c34b4deabeed939325c77a7ec2f. Subsystems affected by this patch series: mm (memory-hotplug, rmap, ioremap, highmem, cleanups, secretmem, kfence, damon, and vmscan), alpha, percpu, procfs, misc, core-kernel, MAINTAINERS, lib, checkpatch, epoll, init, nilfs2, coredump, fork, pids, criu, kconfig, selftests, ipc, and scripts" * emailed patches from Andrew Morton <[email protected]>: (94 commits) scripts: check_extable: fix typo in user error message mm/workingset: correct kernel-doc notations ipc: replace costly bailout check in sysvipc_find_ipc() selftests/memfd: remove unused variable Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCH configs: remove the obsolete CONFIG_INPUT_POLLDEV prctl: allow to setup brk for et_dyn executables pid: cleanup the stale comment mentioning pidmap_init(). kernel/fork.c: unexport get_{mm,task}_exe_file coredump: fix memleak in dump_vma_snapshot() fs/coredump.c: log if a core dump is aborted due to changed file permissions nilfs2: use refcount_dec_and_lock() to fix potential UAF nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_group nilfs2: fix memory leak in nilfs_sysfs_delete_##name##_group nilfs2: fix memory leak in nilfs_sysfs_create_##name##_group nilfs2: fix NULL pointer in nilfs_##name##_attr_release nilfs2: fix memory leak in nilfs_sysfs_create_device_group trap: cleanup trap_init() init: move usermodehelper_enable() to populate_rootfs() ...
2021-09-08tracing/boot: Fix to loop on only subkeysMasami Hiramatsu1-3/+3
Since the commit e5efaeb8a8f5 ("bootconfig: Support mixing a value and subkeys under a key") allows to co-exist a value node and key nodes under a node, xbc_node_for_each_child() is not only returning key node but also a value node. In the boot-time tracing using xbc_node_for_each_child() to iterate the events, groups and instances, but those must be key nodes. Thus it must use xbc_node_for_each_subkey(). Link: https://lkml.kernel.org/r/163112988361.74896.2267026262061819145.stgit@devnote2 Fixes: e5efaeb8a8f5 ("bootconfig: Support mixing a value and subkeys under a key") Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08Merge tag 'mm-slub-5.15-rc1' of ↵Linus Torvalds4-251/+563
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux Pull SLUB updates from Vlastimil Babka: "SLUB: reduce irq disabled scope and make it RT compatible This series was initially inspired by Mel's pcplist local_lock rewrite, and also interest to better understand SLUB's locking and the new primitives and RT variants and implications. It makes SLUB compatible with PREEMPT_RT and generally more preemption-friendly, apparently without significant regressions, as the fast paths are not affected. The main changes to SLUB by this series: - irq disabling is now only done for minimum amount of time needed to protect the strict kmem_cache_cpu fields, and as part of spin lock, local lock and bit lock operations to make them irq-safe - SLUB is fully PREEMPT_RT compatible The series should now be sufficiently tested in both RT and !RT configs, mainly thanks to Mike. The RFC/v1 version also got basic performance screening by Mel that didn't show major regressions. Mike's testing with hackbench of v2 on !RT reported negligible differences [6]: virgin(ish) tip 5.13.0.g60ab3ed-tip 7,320.67 msec task-clock # 7.792 CPUs utilized ( +- 0.31% ) 221,215 context-switches # 0.030 M/sec ( +- 3.97% ) 16,234 cpu-migrations # 0.002 M/sec ( +- 4.07% ) 13,233 page-faults # 0.002 M/sec ( +- 0.91% ) 27,592,205,252 cycles # 3.769 GHz ( +- 0.32% ) 8,309,495,040 instructions # 0.30 insn per cycle ( +- 0.37% ) 1,555,210,607 branches # 212.441 M/sec ( +- 0.42% ) 5,484,209 branch-misses # 0.35% of all branches ( +- 2.13% ) 0.93949 +- 0.00423 seconds time elapsed ( +- 0.45% ) 0.94608 +- 0.00384 seconds time elapsed ( +- 0.41% ) (repeat) 0.94422 +- 0.00410 seconds time elapsed ( +- 0.43% ) 5.13.0.g60ab3ed-tip +slub-local-lock-v2r3 7,343.57 msec task-clock # 7.776 CPUs utilized ( +- 0.44% ) 223,044 context-switches # 0.030 M/sec ( +- 3.02% ) 16,057 cpu-migrations # 0.002 M/sec ( +- 4.03% ) 13,164 page-faults # 0.002 M/sec ( +- 0.97% ) 27,684,906,017 cycles # 3.770 GHz ( +- 0.45% ) 8,323,273,871 instructions # 0.30 insn per cycle ( +- 0.28% ) 1,556,106,680 branches # 211.901 M/sec ( +- 0.31% ) 5,463,468 branch-misses # 0.35% of all branches ( +- 1.33% ) 0.94440 +- 0.00352 seconds time elapsed ( +- 0.37% ) 0.94830 +- 0.00228 seconds time elapsed ( +- 0.24% ) (repeat) 0.93813 +- 0.00440 seconds time elapsed ( +- 0.47% ) (repeat) RT configs showed some throughput regressions, but that's expected tradeoff for the preemption improvements through the RT mutex. It didn't prevent the v2 to be incorporated to the 5.13 RT tree [7], leading to testing exposure and bugfixes. Before the series, SLUB is lockless in both allocation and free fast paths, but elsewhere, it's disabling irqs for considerable periods of time - especially in allocation slowpath and the bulk allocation, where IRQs are re-enabled only when a new page from the page allocator is needed, and the context allows blocking. The irq disabled sections can then include deactivate_slab() which walks a full freelist and frees the slab back to page allocator or unfreeze_partials() going through a list of percpu partial slabs. The RT tree currently has some patches mitigating these, but we can do much better in mainline too. Patches 1-6 are straightforward improvements or cleanups that could exist outside of this series too, but are prerequsities. Patches 7-9 are also preparatory code changes without functional changes, but not so useful without the rest of the series. Patch 10 simplifies the fast paths on systems with preemption, based on (hopefully correct) observation that the current loops to verify tid are unnecessary. Patches 11-20 focus on reducing irq disabled scope in the allocation slowpath: - patch 11 moves disabling of irqs into ___slab_alloc() from its callers, which are the allocation slowpath, and bulk allocation. Instead these callers only disable preemption to stabilize the cpu. - The following patches then gradually reduce the scope of disabled irqs in ___slab_alloc() and the functions called from there. As of patch 14, the re-enabling of irqs based on gfp flags before calling the page allocator is removed from allocate_slab(). As of patch 17, it's possible to reach the page allocator (in case of existing slabs depleted) without disabling and re-enabling irqs a single time. Pathces 21-26 reduce the scope of disabled irqs in functions related to unfreezing percpu partial slab. Patch 27 is preparatory. Patch 28 is adopted from the RT tree and converts the flushing of percpu slabs on all cpus from using IPI to workqueue, so that the processing isn't happening with irqs disabled in the IPI handler. The flushing is not performance critical so it should be acceptable. Patch 29 also comes from RT tree and makes object_map_lock RT compatible. Patch 30 make slab_lock irq-safe on RT where we cannot rely on having irq disabled from the list_lock spin lock usage. Patch 31 changes kmem_cache_cpu->partial handling in put_cpu_partial() from cmpxchg loop to a short irq disabled section, which is used by all other code modifying the field. This addresses a theoretical race scenario pointed out by Jann, and makes the critical section safe wrt with RT local_lock semantics after the conversion in patch 35. Patch 32 changes preempt disable to migrate disable, so that the nested list_lock spinlock is safe to take on RT. Because migrate_disable() is a function call even on !RT, a small set of private wrappers is introduced to keep using the cheaper preempt_disable() on !PREEMPT_RT configurations. As of this patch, SLUB should be already compatible with RT's lock semantics. Finally, patch 33 changes irq disabled sections that protect kmem_cache_cpu fields in the slow paths, with a local lock. However on PREEMPT_RT it means the lockless fast paths can now preempt slow paths which don't expect that, so the local lock has to be taken also in the fast paths and they are no longer lockless. RT folks seem to not mind this tradeoff. The patch also updates the locking documentation in the file's comment" Mike Galbraith and Mel Gorman verified that their earlier testing observations still hold for the final series: Link: https://lore.kernel.org/lkml/[email protected]/ Link: https://lore.kernel.org/lkml/[email protected]/ * tag 'mm-slub-5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/linux: (33 commits) mm, slub: convert kmem_cpu_slab protection to local_lock mm, slub: use migrate_disable() on PREEMPT_RT mm, slub: protect put_cpu_partial() with disabled irqs instead of cmpxchg mm, slub: make slab_lock() disable irqs with PREEMPT_RT mm: slub: make object_map_lock a raw_spinlock_t mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context mm, slab: split out the cpu offline variant of flush_slab() mm, slub: don't disable irqs in slub_cpu_dead() mm, slub: only disable irq with spin_lock in __unfreeze_partials() mm, slub: separate detaching of partial list in unfreeze_partials() from unfreezing mm, slub: detach whole partial list at once in unfreeze_partials() mm, slub: discard slabs in unfreeze_partials() without irqs disabled mm, slub: move irq control into unfreeze_partials() mm, slub: call deactivate_slab() without disabling irqs mm, slub: make locking in deactivate_slab() irq-safe mm, slub: move reset of c->page and freelist out of deactivate_slab() mm, slub: stop disabling irqs around get_partial() mm, slub: check new pages with restored irqs mm, slub: validate slab from partial list or page allocator before making it cpu slab mm, slub: restore irqs around calling new_slab() ...
2021-09-08selftests/ftrace: Exclude "(fault)" in testing add/remove eprobe eventsSteven Rostedt (VMware)1-1/+1
The original test for adding and removing eprobes used synthetic events and retrieved the filename from the open system call at the end of the system call. This would allow it to always be loaded into the page tables when accessed. Masami suggested that the test was too complex for just testing add and remove, so it was changed to test just adding and removing an event probe on top of the start of the open system call event. Now it is possible that the filename will not be loaded into memory at the time the eprobe is triggered, and will result in "(fault)" being displayed in the event. This causes the test to fail. Account for "(fault)" also being one of the values of the filename field of the event probe. Link: https://lkml.kernel.org/r/[email protected] Fixes: 079db70794ec5 ("selftests/ftrace: Add test case to test adding and removing of event probe") Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08tracing: Dynamically allocate the per-elt hist_elt_data arrayTom Zanussi1-2/+12
Setting the hist_elt_data.field_var_str[] array unconditionally to a size of SYNTH_FIELD_MAX elements wastes space unnecessarily. The actual number of elements needed can be calculated at run-time instead. In most cases, this will save a lot of space since it's a per-elt array which isn't normally close to being full. It also allows us to increase SYNTH_FIELD_MAX without worrying about even more wastage when we do that. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Tom Zanussi <[email protected]> Tested-by: Artem Bityutskiy <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08tracing: synth events: increase max fields countArtem Bityutskiy1-1/+1
Sometimes it is useful to construct larger synthetic trace events. Increase 'SYNTH_FIELDS_MAX' (maximum number of fields in a synthetic event) from 32 to 64. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Artem Bityutskiy <[email protected]> Acked-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08tools/bootconfig: Show whole test command for each test caseMasami Hiramatsu1-2/+2
Show whole test command instead of only the 3rd argument. This will clear to show what will be actually tested by each test case. Link: https://lkml.kernel.org/r/163077088607.222577.14786016266462495017.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08bootconfig: Fix missing return check of xbc_node_compose_key functionJulio Faracco1-1/+3
The function `xbc_show_list should` handle the keys during the composition. Even the errors returned by the compose function. Instead of removing the `ret` variable, it should save the value and show the exact error. This missing variable is causing a compilation issue also. Link: https://lkml.kernel.org/r/163077087861.222577.12884543474750968146.stgit@devnote2 Fixes: e5efaeb8a8f5 ("bootconfig: Support mixing a value and subkeys under a key") Signed-off-by: Julio Faracco <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: [email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08tools/bootconfig: Fix tracing_on option checking in ftrace2bconf.shMasami Hiramatsu1-2/+2
Since tracing_on indicates only "1" (default) or "0", ftrace2bconf.sh only need to check the value is "0". Link: https://lkml.kernel.org/r/163077087144.222577.6888011847727968737.stgit@devnote2 Fixes: 55ed4560774d ("tools/bootconfig: Add tracing_on support to helper scripts") Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08docs: bootconfig: Add how to use bootconfig for kernel parametersMasami Hiramatsu1-1/+38
Add a section to describe how to use the bootconfig for specifying kernel and init parameters. This is an important section because it is the reason why this document is under the admin-guide. Link: https://lkml.kernel.org/r/163077086399.222577.5881779375643977991.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: [email protected] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08init/bootconfig: Reorder init parameter from bootconfig and cmdlineMasami Hiramatsu1-9/+14
Reorder the init parameters from bootconfig and kernel cmdline so that the kernel cmdline always be the last part of the parameters as below. " -- "[bootconfig init params][cmdline init params] This change will help us to prevent that bootconfig init params overwrite the init params which user gives in the command line. Link: https://lkml.kernel.org/r/163077085675.222577.5665176468023636160.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08init: bootconfig: Remove all bootconfig data when the init memory is removedMasami Hiramatsu1-2/+12
Since the bootconfig is used only in the init functions, it doesn't need to keep the data after boot. Free it when the init memory is removed. Link: https://lkml.kernel.org/r/163077084958.222577.5924961258513004428.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08tracing/osnoise: Fix missed cpus_read_unlock() in start_per_cpu_kthreads()Qiang.Zhang1-3/+3
When start_kthread() return error, the cpus_read_unlock() need to be called. Link: https://lkml.kernel.org/r/[email protected] Cc: <[email protected]> Fixes: c8895e271f79 ("trace/osnoise: Support hotplug operations") Acked-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Qiang.Zhang <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2021-09-08ACPI: PRM: Find PRMT table before parsing itAubrey Li1-1/+9
Find and verify PRMT before parsing it, which eliminates a warning on machines without PRMT: [ 7.197173] ACPI: PRMT not present Fixes: cefc7ca46235 ("ACPI: PRM: implement OperationRegion handler for the PlatformRtMechanism subtype") Signed-off-by: Aubrey Li <[email protected]> Tested-by: Paul Menzel <[email protected]> Cc: 5.14+ <[email protected]> # 5.14+ [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2021-09-08scripts: check_extable: fix typo in user error messageRandy Dunlap1-1/+1
Fix typo ("and" should be "an") in an error message. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Cc: Quentin Casasnovas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08mm/workingset: correct kernel-doc notationsRandy Dunlap1-1/+1
Use the documented kernel-doc format to prevent kernel-doc warnings. mm/workingset.c:256: warning: No description found for return value of 'workingset_eviction' mm/workingset.c:285: warning: Function parameter or member 'folio' not described in 'workingset_refault' mm/workingset.c:285: warning: Excess function parameter 'page' description in 'workingset_refault' Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08ipc: replace costly bailout check in sysvipc_find_ipc()Rafael Aquini1-12/+4
sysvipc_find_ipc() was left with a costly way to check if the offset position fed to it is bigger than the total number of IPC IDs in use. So much so that the time it takes to iterate over /proc/sysvipc/* files grows exponentially for a custom benchmark that creates "N" SYSV shm segments and then times the read of /proc/sysvipc/shm (milliseconds): 12 msecs to read 1024 segs from /proc/sysvipc/shm 18 msecs to read 2048 segs from /proc/sysvipc/shm 65 msecs to read 4096 segs from /proc/sysvipc/shm 325 msecs to read 8192 segs from /proc/sysvipc/shm 1303 msecs to read 16384 segs from /proc/sysvipc/shm 5182 msecs to read 32768 segs from /proc/sysvipc/shm The root problem lies with the loop that computes the total amount of ids in use to check if the "pos" feeded to sysvipc_find_ipc() grew bigger than "ids->in_use". That is a quite inneficient way to get to the maximum index in the id lookup table, specially when that value is already provided by struct ipc_ids.max_idx. This patch follows up on the optimization introduced via commit 15df03c879836 ("sysvipc: make get_maxid O(1) again") and gets rid of the aforementioned costly loop replacing it by a simpler checkpoint based on ipc_get_maxidx() returned value, which allows for a smooth linear increase in time complexity for the same custom benchmark: 2 msecs to read 1024 segs from /proc/sysvipc/shm 2 msecs to read 2048 segs from /proc/sysvipc/shm 4 msecs to read 4096 segs from /proc/sysvipc/shm 9 msecs to read 8192 segs from /proc/sysvipc/shm 19 msecs to read 16384 segs from /proc/sysvipc/shm 39 msecs to read 32768 segs from /proc/sysvipc/shm Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Rafael Aquini <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Acked-by: Manfred Spraul <[email protected]> Cc: Waiman Long <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08selftests/memfd: remove unused variableGreg Thelen1-1/+1
Commit 544029862cbb ("selftests/memfd: add tests for F_SEAL_FUTURE_WRITE seal") added an unused variable to mfd_assert_reopen_fd(). Delete the unused variable. Link: https://lkml.kernel.org/r/[email protected] Fixes: 544029862cbb ("selftests/memfd: add tests for F_SEAL_FUTURE_WRITE seal") Signed-off-by: Greg Thelen <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: "Joel Fernandes (Google)" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08Kconfig.debug: drop selecting non-existing HARDLOCKUP_DETECTOR_ARCHLukas Bulwahn1-1/+0
Commit 05a4a9527931 ("kernel/watchdog: split up config options") adds a new config HARDLOCKUP_DETECTOR, which selects the non-existing config HARDLOCKUP_DETECTOR_ARCH. Hence, ./scripts/checkkconfigsymbols.py warns: HARDLOCKUP_DETECTOR_ARCH Referencing files: lib/Kconfig.debug Simply drop selecting the non-existing HARDLOCKUP_DETECTOR_ARCH. Link: https://lkml.kernel.org/r/[email protected] Fixes: 05a4a9527931 ("kernel/watchdog: split up config options") Signed-off-by: Lukas Bulwahn <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Babu Moger <[email protected]> Cc: Don Zickus <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08configs: remove the obsolete CONFIG_INPUT_POLLDEVZenghui Yu9-9/+0
This CONFIG option was removed in commit 278b13ce3a89 ("Input: remove input_polled_dev implementation") so there's no point to keep it in defconfigs any longer. Get rid of the leftover for all arches. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zenghui Yu <[email protected]> Cc: Dmitry Torokhov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08prctl: allow to setup brk for et_dyn executablesCyrill Gorcunov1-7/+0
Keno Fischer reported that when a binray loaded via ld-linux-x the prctl(PR_SET_MM_MAP) doesn't allow to setup brk value because it lays before mm:end_data. For example a test program shows | # ~/t | | start_code 401000 | end_code 401a15 | start_stack 7ffce4577dd0 | start_data 403e10 | end_data 40408c | start_brk b5b000 | sbrk(0) b5b000 and when executed via ld-linux | # /lib64/ld-linux-x86-64.so.2 ~/t | | start_code 7fc25b0a4000 | end_code 7fc25b0c4524 | start_stack 7fffcc6b2400 | start_data 7fc25b0ce4c0 | end_data 7fc25b0cff98 | start_brk 55555710c000 | sbrk(0) 55555710c000 This of course prevent criu from restoring such programs. Looking into how kernel operates with brk/start_brk inside brk() syscall I don't see any problem if we allow to setup brk/start_brk without checking for end_data. Even if someone pass some weird address here on a purpose then the worst possible result will be an unexpected unmapping of existing vma (own vma, since prctl works with the callers memory) but test for RLIMIT_DATA is still valid and a user won't be able to gain more memory in case of expanding VMAs via new values shipped with prctl call. Link: https://lkml.kernel.org/r/20210121221207.GB2174@grain Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec") Signed-off-by: Cyrill Gorcunov <[email protected]> Reported-by: Keno Fischer <[email protected]> Acked-by: Andrey Vagin <[email protected]> Tested-by: Andrey Vagin <[email protected]> Cc: Dmitry Safonov <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Pavel Tikhomirov <[email protected]> Cc: Alexander Mikhalitsyn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08pid: cleanup the stale comment mentioning pidmap_init().Takahiro Itazuri1-1/+1
pidmap_init() has already been replaced with pid_idr_init() in the commit 95846ecf9dac ("pid: replace pid bitmap implementation with IDR API"). Cleanup the stale comment which still mentions it. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Takahiro Itazuri <[email protected]> Cc: Kuniyuki Iwashima <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08kernel/fork.c: unexport get_{mm,task}_exe_fileChristoph Hellwig1-2/+0
Only used by core code and the tomoyo which can't be a module either. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08coredump: fix memleak in dump_vma_snapshot()QiuXi1-1/+3
dump_vma_snapshot() allocs memory for *vma_meta, when dump_vma_snapshot() returns -EFAULT, the memory will be leaked, so we free it correctly. Link: https://lkml.kernel.org/r/[email protected] Fixes: a07279c9a8cd7 ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Signed-off-by: QiuXi <[email protected]> Cc: Al Viro <[email protected]> Cc: Jann Horn <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08fs/coredump.c: log if a core dump is aborted due to changed file permissionsDavid Oberhollenzer1-2/+9
For obvious security reasons, a core dump is aborted if the filesystem cannot preserve ownership or permissions of the dump file. This affects filesystems like e.g. vfat, but also something like a 9pfs share in a Qemu test setup, running as a regular user, depending on the security model used. In those cases, the result is an empty core file and a confused user. To hopefully save other people a lot of time figuring out the cause, this patch adds a simple log message for those specific cases. [[email protected]: s/|%s/%s/ in printk text] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: David Oberhollenzer <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08nilfs2: use refcount_dec_and_lock() to fix potential UAFZhen Lei1-5/+4
When the refcount is decreased to 0, the resource reclamation branch is entered. Before CPU0 reaches the race point (1), CPU1 may obtain the spinlock and traverse the rbtree to find 'root', see nilfs_lookup_root(). Although CPU1 will call refcount_inc() to increase the refcount, it is obviously too late. CPU0 will release 'root' directly, CPU1 then accesses 'root' and triggers UAF. Use refcount_dec_and_lock() to ensure that both the operations of decrease refcount to 0 and link deletion are lock protected eliminates this risk. CPU0 CPU1 nilfs_put_root(): <-------- (1) spin_lock(&nilfs->ns_cptree_lock); rb_erase(&root->rb_node, &nilfs->ns_cptree); spin_unlock(&nilfs->ns_cptree_lock); kfree(root); <-------- use-after-free refcount_t: underflow; use-after-free. WARNING: CPU: 2 PID: 9476 at lib/refcount.c:28 \ refcount_warn_saturate+0x1cf/0x210 lib/refcount.c:28 Modules linked in: CPU: 2 PID: 9476 Comm: syz-executor.0 Not tainted 5.10.45-rc1+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), ... RIP: 0010:refcount_warn_saturate+0x1cf/0x210 lib/refcount.c:28 ... ... Call Trace: __refcount_sub_and_test include/linux/refcount.h:283 [inline] __refcount_dec_and_test include/linux/refcount.h:315 [inline] refcount_dec_and_test include/linux/refcount.h:333 [inline] nilfs_put_root+0xc1/0xd0 fs/nilfs2/the_nilfs.c:795 nilfs_segctor_destroy fs/nilfs2/segment.c:2749 [inline] nilfs_detach_log_writer+0x3fa/0x570 fs/nilfs2/segment.c:2812 nilfs_put_super+0x2f/0xf0 fs/nilfs2/super.c:467 generic_shutdown_super+0xcd/0x1f0 fs/super.c:464 kill_block_super+0x4a/0x90 fs/super.c:1446 deactivate_locked_super+0x6a/0xb0 fs/super.c:335 deactivate_super+0x85/0x90 fs/super.c:366 cleanup_mnt+0x277/0x2e0 fs/namespace.c:1118 __cleanup_mnt+0x15/0x20 fs/namespace.c:1125 task_work_run+0x8e/0x110 kernel/task_work.c:151 tracehook_notify_resume include/linux/tracehook.h:188 [inline] exit_to_user_mode_loop kernel/entry/common.c:164 [inline] exit_to_user_mode_prepare+0x13c/0x170 kernel/entry/common.c:191 syscall_exit_to_user_mode+0x16/0x30 kernel/entry/common.c:266 do_syscall_64+0x45/0x80 arch/x86/entry/common.c:56 entry_SYSCALL_64_after_hwframe+0x44/0xa9 There is no reproduction program, and the above is only theoretical analysis. Link: https://lkml.kernel.org/r/[email protected] Fixes: ba65ae4729bf ("nilfs2: add checkpoint tree to nilfs object") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zhen Lei <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08nilfs2: fix memory leak in nilfs_sysfs_delete_snapshot_groupNanyong Sun1-1/+1
kobject_put() should be used to cleanup the memory associated with the kobject instead of kobject_del(). See the section "Kobject removal" of "Documentation/core-api/kobject.rst". Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nanyong Sun <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-09-08nilfs2: fix memory leak in nilfs_sysfs_create_snapshot_groupNanyong Sun1-2/+2
If kobject_init_and_add returns with error, kobject_put() is needed here to avoid memory leak, because kobject_init_and_add may return error without freeing the memory associated with the kobject it allocated. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Nanyong Sun <[email protected]> Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>