aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-12-11Merge tag 'iommu-fix-v6.1-rc8' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fix from Joerg Roedel: - Fix device mask to catch all affected devices in the recently added quirk for QAT devices in the Intel VT-d driver. * tag 'iommu-fix-v6.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/vt-d: Fix buggy QAT device mask
2022-12-10Merge tag 'mm-hotfixes-stable-2022-12-10-1' of ↵Linus Torvalds7-19/+29
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Nine hotfixes. Six for MM, three for other areas. Four of these patches address post-6.0 issues" * tag 'mm-hotfixes-stable-2022-12-10-1' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: memcg: fix possible use-after-free in memcg_write_event_control() MAINTAINERS: update Muchun Song's email mm/gup: fix gup_pud_range() for dax mmap: fix do_brk_flags() modifying obviously incorrect VMAs mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit tmpfs: fix data loss from failed fallocate kselftests: cgroup: update kmem test precision tolerance mm: do not BUG_ON missing brk mapping, because userspace can unmap it mailmap: update Matti Vaittinen's email address
2022-12-10fs: sysv: Fix sysv_nblocks() returns wrong valueChen Zhongjin1-1/+1
sysv_nblocks() returns 'blocks' rather than 'res', which only counting the number of triple-indirect blocks and causing sysv_getattr() gets a wrong result. [AV: this is actually a sysv counterpart of minixfs fix - 0fcd426de9d0 "[PATCH] minix block usage counting fix" in historical tree; mea culpa, should've thought to check fs/sysv back then...] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Chen Zhongjin <[email protected]> Signed-off-by: Al Viro <[email protected]>
2022-12-10NFSv4.2: Change the default KConfig value for READ_PLUSAnna Schumaker1-4/+4
Now that we've worked out performance issues and have a server patch addressing the failed xfstests, we can safely enable this feature by default. Signed-off-by: Anna Schumaker <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2022-12-10Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds2-5/+22
Pull ARM fix from Russell King: "One further ARM fix for 6.1 from Wang Kefeng, fixing up the handling for kfence faults" * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: 9278/1: kfence: only handle translation faults
2022-12-10NFSD: Avoid clashing function prototypesKees Cook1-255/+377
When built with Control Flow Integrity, function prototypes between caller and function declaration must match. These mismatches are visible at compile time with the new -Wcast-function-type-strict in Clang[1]. There were 97 warnings produced by NFS. For example: fs/nfsd/nfs4xdr.c:2228:17: warning: cast from '__be32 (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, struct nfsd4_access *)') to 'nfsd4_dec' (aka 'unsigned int (*)(struct nfsd4_compoundargs *, void *)') converts to incompatible function type [-Wcast-function-type-strict] [OP_ACCESS] = (nfsd4_dec)nfsd4_decode_access, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The enc/dec callbacks were defined as passing "void *" as the second argument, but were being implicitly cast to a new type. Replace the argument with union nfsd4_op_u, and perform explicit member selection in the function body. There are no resulting binary differences. Changes were made mechanically using the following Coccinelle script, with minor by-hand fixes for members that didn't already match their existing argument name: @find@ identifier func; type T, opsT; identifier ops, N; @@ opsT ops[] = { [N] = (T) func, }; @already_void@ identifier find.func; identifier name; @@ func(..., -void +union nfsd4_op_u *name) { ... } @proto depends on !already_void@ identifier find.func; type T; identifier name; position p; @@ func@p(..., T name ) { ... } @script:python get_member@ type_name << proto.T; member; @@ coccinelle.member = cocci.make_ident(type_name.split("_", 1)[1].split(' ',1)[0]) @convert@ identifier find.func; type proto.T; identifier proto.name; position proto.p; identifier get_member.member; @@ func@p(..., - T name + union nfsd4_op_u *u ) { + T name = &u->member; ... } @cast@ identifier find.func; type T, opsT; identifier ops, N; @@ opsT ops[] = { [N] = - (T) func, }; Cc: Chuck Lever <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Gustavo A. R. Silva <[email protected]> Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-10SUNRPC: Fix crasher in unwrap_integ_data()Chuck Lever1-23/+32
If a zero length is passed to kmalloc() it returns 0x10, which is not a valid address. gss_verify_mic() subsequently crashes when it attempts to dereference that pointer. Instead of allocating this memory on every call based on an untrusted size value, use a piece of dynamically-allocated scratch memory that is always available. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2022-12-10SUNRPC: Make the svc_authenticate tracepoint conditionalChuck Lever2-3/+4
Clean up: Simplify the tracepoint's only call site. Also, I noticed that when svc_authenticate() returns SVC_COMPLETE, it leaves rq_auth_stat set to an error value. That doesn't need to be recorded in the trace log. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2022-12-10NFSD: Use only RQ_DROPME to signal the need to drop a replyChuck Lever2-3/+3
Clean up: NFSv2 has the only two usages of rpc_drop_reply in the NFSD code base. Since NFSv2 is going away at some point, replace these in order to simplify the "drop this reply?" check in nfsd_dispatch(). Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2022-12-10SUNRPC: Clean up xdr_write_pages()Chuck Lever1-9/+13
Make it more evident how xdr_write_pages() updates the tail buffer by using the convention of naming the iov pointer variable "tail". I spent more than a couple of hours chasing through code to understand this, so someone is likely to find this useful later. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2022-12-10SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() failsChuck Lever1-2/+7
Fixes: 030d794bf498 ("SUNRPC: Use gssproxy upcall for server RPCGSS authentication.") Signed-off-by: Chuck Lever <[email protected]> Cc: <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2022-12-10NFSD: add CB_RECALL_ANY tracepointsDai Ngo3-0/+64
Add tracepoints to trace start and end of CB_RECALL_ANY operation. Signed-off-by: Dai Ngo <[email protected]> [ cel: added show_rca_mask() macro ] Signed-off-by: Chuck Lever <[email protected]>
2022-12-10NFSD: add delegation reaper to react to low memory conditionDai Ngo3-4/+102
The delegation reaper is called by nfsd memory shrinker's on the 'count' callback. It scans the client list and sends the courtesy CB_RECALL_ANY to the clients that hold delegations. To avoid flooding the clients with CB_RECALL_ANY requests, the delegation reaper sends only one CB_RECALL_ANY request to each client per 5 seconds. Signed-off-by: Dai Ngo <[email protected]> [ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ] Signed-off-by: Chuck Lever <[email protected]>
2022-12-10NFSD: add support for sending CB_RECALL_ANYDai Ngo4-0/+84
Add XDR encode and decode function for CB_RECALL_ANY. Signed-off-by: Dai Ngo <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-10NFSD: refactoring courtesy_client_reaper to a generic low memory shrinkerDai Ngo1-9/+16
Refactoring courtesy_client_reaper to generic low memory shrinker so it can be used for other purposes. Signed-off-by: Dai Ngo <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-10trace: Relocate event helper filesChuck Lever12-12/+19
Steven Rostedt says: > The include/trace/events/ directory should only hold files that > are to create events, not headers that hold helper functions. > > Can you please move them out of include/trace/events/ as that > directory is "special" in the creation of events. Signed-off-by: Chuck Lever <[email protected]> Acked-by: Leon Romanovsky <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]> Acked-by: Anna Schumaker <[email protected]>
2022-12-10NFSD: pass range end to vfs_fsync_range() instead of countBrian Foster1-2/+3
_nfsd_copy_file_range() calls vfs_fsync_range() with an offset and count (bytes written), but the former wants the start and end bytes of the range to sync. Fix it up. Fixes: eac0b17a77fb ("NFSD add vfs_fsync after async copy is done") Signed-off-by: Brian Foster <[email protected]> Tested-by: Dai Ngo <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-10lockd: fix file selection in nlmsvc_cancel_blockedJeff Layton1-3/+4
We currently do a lock_to_openmode call based on the arguments from the NLM_UNLOCK call, but that will always set the fl_type of the lock to F_UNLCK, and the O_RDONLY descriptor is always chosen. Fix it to use the file_lock from the block instead. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-10lockd: ensure we use the correct file descriptor when unlockingJeff Layton1-4/+6
Shared locks are set on O_RDONLY descriptors and exclusive locks are set on O_WRONLY ones. nlmsvc_unlock however calls vfs_lock_file twice, once for each descriptor, but it doesn't reset fl_file. Ensure that it does. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-10lockd: set missing fl_flags field when retrieving argsJeff Layton2-0/+2
Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-10NFSD: Use struct_size() helper in alloc_session()Xiu Jianfeng1-5/+4
Use struct_size() helper to simplify the code, no functional changes. Signed-off-by: Xiu Jianfeng <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-10nfsd: return error if nfs4_setacl failsJeff Layton1-0/+2
With the addition of POSIX ACLs to struct nfsd_attrs, we no longer return an error if setting the ACL fails. Ensure we return the na_aclerr error on SETATTR if there is one. Fixes: c0cbe70742f4 ("NFSD: add posix ACLs to struct nfsd_attrs") Cc: Neil Brown <[email protected]> Reported-by: Yongcheng Yang <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-10lockd: set other missing fields when unlocking filesTrond Myklebust1-7/+10
vfs_lock_file() expects the struct file_lock to be fully initialised by the caller. Re-exported NFSv3 has been seen to Oops if the fl_file field is NULL. Fixes: aec158242b87 ("lockd: set fl_owner when unlocking files") Signed-off-by: Trond Myklebust <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Link: https://bugzilla.kernel.org/show_bug.cgi?id=216582 Signed-off-by: Chuck Lever <[email protected]>
2022-12-10NFSD: Add an nfsd_file_fsync tracepointChuck Lever2-1/+35
Add a tracepoint to capture the number of filecache-triggered fsync calls and which files needed it. Also, record when an fsync triggers a write verifier reset. Examples: <...>-97 [007] 262.505611: nfsd_file_free: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 <...>-97 [007] 262.505612: nfsd_file_fsync: inode=0xffff888171e08140 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d2400 ret=0 <...>-97 [007] 262.505623: nfsd_file_free: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 <...>-97 [007] 262.505624: nfsd_file_fsync: inode=0xffff888171e08dc0 ref=0 flags=GC may=WRITE nf_file=0xffff8881373d1e00 ret=0 Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Jeff Layton <[email protected]>
2022-12-10sunrpc: svc: Remove an unused static function svc_ungetu32()Li zeming1-7/+0
The svc_ungetu32 function is not used, you could remove it. Signed-off-by: Li zeming <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-10nfsd: fix up the filecache laundrette schedulingJeff Layton1-7/+5
We don't really care whether there are hashed entries when it comes to scheduling the laundrette. They might all be non-gc entries, after all. We only want to schedule it if there are entries on the LRU. Switch to using list_lru_count, and move the check into nfsd_file_gc_worker. The other callsite in nfsd_file_put doesn't need to count entries, since it only schedules the laundrette after adding an entry to the LRU. Signed-off-by: Jeff Layton <[email protected]> Signed-off-by: Chuck Lever <[email protected]>
2022-12-09memcg: fix possible use-after-free in memcg_write_event_control()Tejun Heo3-3/+14
memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's. Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type. Link: https://lkml.kernel.org/r/Y5FRm/[email protected] Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft") Signed-off-by: Tejun Heo <[email protected]> Reported-by: Jann Horn <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: <[email protected]> [3.14+] Signed-off-by: Andrew Morton <[email protected]>
2022-12-09MAINTAINERS: update Muchun Song's emailMuchun Song2-2/+4
I'm moving to the @linux.dev account. Map my old addresses and update it to my new address. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Muchun Song <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-12-09mm/gup: fix gup_pud_range() for daxJohn Starks1-1/+1
For dax pud, pud_huge() returns true on x86. So the function works as long as hugetlb is configured. However, dax doesn't depend on hugetlb. Commit 414fd080d125 ("mm/gup: fix gup_pmd_range() for dax") fixed devmap-backed huge PMDs, but missed devmap-backed huge PUDs. Fix this as well. This fixes the below kernel panic: general protection fault, probably for non-canonical address 0x69e7c000cc478: 0000 [#1] SMP < snip > Call Trace: <TASK> get_user_pages_fast+0x1f/0x40 iov_iter_get_pages+0xc6/0x3b0 ? mempool_alloc+0x5d/0x170 bio_iov_iter_get_pages+0x82/0x4e0 ? bvec_alloc+0x91/0xc0 ? bio_alloc_bioset+0x19a/0x2a0 blkdev_direct_IO+0x282/0x480 ? __io_complete_rw_common+0xc0/0xc0 ? filemap_range_has_page+0x82/0xc0 generic_file_direct_write+0x9d/0x1a0 ? inode_update_time+0x24/0x30 __generic_file_write_iter+0xbd/0x1e0 blkdev_write_iter+0xb4/0x150 ? io_import_iovec+0x8d/0x340 io_write+0xf9/0x300 io_issue_sqe+0x3c3/0x1d30 ? sysvec_reschedule_ipi+0x6c/0x80 __io_queue_sqe+0x33/0x240 ? fget+0x76/0xa0 io_submit_sqes+0xe6a/0x18d0 ? __fget_light+0xd1/0x100 __x64_sys_io_uring_enter+0x199/0x880 ? __context_tracking_enter+0x1f/0x70 ? irqentry_exit_to_user_mode+0x24/0x30 ? irqentry_exit+0x1d/0x30 ? __context_tracking_exit+0xe/0x70 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x61/0xcb RIP: 0033:0x7fc97c11a7be < snip > </TASK> ---[ end trace 48b2e0e67debcaeb ]--- RIP: 0010:internal_get_user_pages_fast+0x340/0x990 < snip > Kernel panic - not syncing: Fatal exception Kernel Offset: disabled Link: https://lkml.kernel.org/r/[email protected] Fixes: 414fd080d125 ("mm/gup: fix gup_pmd_range() for dax") Signed-off-by: John Starks <[email protected]> Signed-off-by: Saurabh Sengar <[email protected]> Cc: Jan Kara <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: John Hubbard <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dan Williams <[email protected]> Cc: Alistair Popple <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-12-09mmap: fix do_brk_flags() modifying obviously incorrect VMAsLiam Howlett1-8/+3
Add more sanity checks to the VMA that do_brk_flags() will expand. Ensure the VMA matches basic merge requirements within the function before calling can_vma_merge_after(). Drop the duplicate checks from vm_brk_flags() since they will be enforced later. The old code would expand file VMAs on brk(), which is functionally wrong and also dangerous in terms of locking because the brk() path isn't designed for file VMAs and therefore doesn't lock the file mapping. Checking can_vma_merge_after() ensures that new anonymous VMAs can't be merged into file VMAs. See https://lore.kernel.org/linux-mm/CAG48ez1tJZTOjS_FjRZhvtDA-STFmdw8PEizPDwMGFd_ui0Nrw@mail.gmail.com/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 2e7ce7d354f2 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()") Signed-off-by: Liam R. Howlett <[email protected]> Suggested-by: Jann Horn <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: SeongJae Park <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-12-09mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bitDavid Hildenbrand1-3/+5
We use "unsigned long" to store a PFN in the kernel and phys_addr_t to store a physical address. On a 64bit system, both are 64bit wide. However, on a 32bit system, the latter might be 64bit wide. This is, for example, the case on x86 with PAE: phys_addr_t and PTEs are 64bit wide, while "unsigned long" only spans 32bit. The current definition of SWP_PFN_BITS without MAX_PHYSMEM_BITS misses that case, and assumes that the maximum PFN is limited by an 32bit phys_addr_t. This implies, that SWP_PFN_BITS will currently only be able to cover 4 GiB - 1 on any 32bit system with 4k page size, which is wrong. Let's rely on the number of bits in phys_addr_t instead, but make sure to not exceed the maximum swap offset, to not make the BUILD_BUG_ON() in is_pfn_swap_entry() unhappy. Note that swp_entry_t is effectively an unsigned long and the maximum swap offset shares that value with the swap type. For example, on an 8 GiB x86 PAE system with a kernel config based on Debian 11.5 (-> CONFIG_FLATMEM=y, CONFIG_X86_PAE=y), we will currently fail removing migration entries (remove_migration_ptes()), because mm/page_vma_mapped.c:check_pte() will fail to identify a PFN match as swp_offset_pfn() wrongly masks off PFN bits. For example, split_huge_page_to_list()->...->remap_page() will leave migration entries in place and continue to unlock the page. Later, when we stumble over these migration entries (e.g., via /proc/self/pagemap), pfn_swap_entry_to_page() will BUG_ON() because these migration entries shouldn't exist anymore and the page was unlocked. [ 33.067591] kernel BUG at include/linux/swapops.h:497! [ 33.067597] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 33.067602] CPU: 3 PID: 742 Comm: cow Tainted: G E 6.1.0-rc8+ #16 [ 33.067605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 [ 33.067606] EIP: pagemap_pmd_range+0x644/0x650 [ 33.067612] Code: 00 00 00 00 66 90 89 ce b9 00 f0 ff ff e9 ff fb ff ff 89 d8 31 db e8 48 c6 52 00 e9 23 fb ff ff e8 61 83 56 00 e9 b6 fe ff ff <0f> 0b bf 00 f0 ff ff e9 38 fa ff ff 3e 8d 74 26 00 55 89 e5 57 31 [ 33.067615] EAX: ee394000 EBX: 00000002 ECX: ee394000 EDX: 00000000 [ 33.067617] ESI: c1b0ded4 EDI: 00024a00 EBP: c1b0ddb4 ESP: c1b0dd68 [ 33.067619] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246 [ 33.067624] CR0: 80050033 CR2: b7a00000 CR3: 01bbbd20 CR4: 00350ef0 [ 33.067625] Call Trace: [ 33.067628] ? madvise_free_pte_range+0x720/0x720 [ 33.067632] ? smaps_pte_range+0x4b0/0x4b0 [ 33.067634] walk_pgd_range+0x325/0x720 [ 33.067637] ? mt_find+0x1d6/0x3a0 [ 33.067641] ? mt_find+0x1d6/0x3a0 [ 33.067643] __walk_page_range+0x164/0x170 [ 33.067646] walk_page_range+0xf9/0x170 [ 33.067648] ? __kmem_cache_alloc_node+0x2a8/0x340 [ 33.067653] pagemap_read+0x124/0x280 [ 33.067658] ? default_llseek+0x101/0x160 [ 33.067662] ? smaps_account+0x1d0/0x1d0 [ 33.067664] vfs_read+0x90/0x290 [ 33.067667] ? do_madvise.part.0+0x24b/0x390 [ 33.067669] ? debug_smp_processor_id+0x12/0x20 [ 33.067673] ksys_pread64+0x58/0x90 [ 33.067675] __ia32_sys_ia32_pread64+0x1b/0x20 [ 33.067680] __do_fast_syscall_32+0x4c/0xc0 [ 33.067683] do_fast_syscall_32+0x29/0x60 [ 33.067686] do_SYSENTER_32+0x15/0x20 [ 33.067689] entry_SYSENTER_32+0x98/0xf1 Decrease the indentation level of SWP_PFN_BITS and SWP_PFN_MASK to keep it readable and consistent. [[email protected]: rely on sizeof(phys_addr_t) and min_t() instead] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: use "int" for comparison, as we're only comparing numbers < 64] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry") Signed-off-by: David Hildenbrand <[email protected]> Acked-by: Peter Xu <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-12-09tmpfs: fix data loss from failed fallocateHugh Dickins1-0/+11
Fix tmpfs data loss when the fallocate system call is interrupted by a signal, or fails for some other reason. The partial folio handling in shmem_undo_range() forgot to consider this unfalloc case, and was liable to erase or truncate out data which had already been committed earlier. It turns out that none of the partial folio handling there is appropriate for the unfalloc case, which just wants to proceed to removal of whole folios: which find_get_entries() provides, even when partially covered. Original patch by Rui Wang. Link: https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: b9a8a4195c7d ("truncate,shmem: Handle truncates that split large folios") Signed-off-by: Hugh Dickins <[email protected]> Reported-by: Guoqi Chen <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Cc: Rui Wang <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Vishal Moola (Oracle) <[email protected]> Cc: <[email protected]> [5.17+] Signed-off-by: Andrew Morton <[email protected]>
2022-12-09kselftests: cgroup: update kmem test precision toleranceMichal Hocko1-3/+3
1813e51eece0 ("memcg: increase MEMCG_CHARGE_BATCH to 64") has changed the batch size while this test case has been left behind. This has led to a test failure reported by test bot: not ok 2 selftests: cgroup: test_kmem # exit=1 Update the tolerance for the pcp charges to reflect the MEMCG_CHARGE_BATCH change to fix this. [[email protected]: update comments, per Roman] Link: https://lkml.kernel.org/r/[email protected] Fixes: 1813e51eece0a ("memcg: increase MEMCG_CHARGE_BATCH to 64") Signed-off-by: Michal Hocko <[email protected]> Reported-by: kernel test robot <[email protected]> Link: https://lore.kernel.org/oe-lkp/[email protected] Acked-by: Shakeel Butt <[email protected]> Acked-by: Roman Gushchin <[email protected]> Tested-by: Yujie Liu <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Feng Tang <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: "Michal Koutný" <[email protected]> Cc: Muchun Song <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-12-09mm: do not BUG_ON missing brk mapping, because userspace can unmap itJason A. Donenfeld1-2/+1
The following program will trigger the BUG_ON that this patch removes, because the user can munmap() mm->brk: #include <sys/syscall.h> #include <sys/mman.h> #include <assert.h> #include <unistd.h> static void *brk_now(void) { return (void *)syscall(SYS_brk, 0); } static void brk_set(void *b) { assert(syscall(SYS_brk, b) != -1); } int main(int argc, char *argv[]) { void *b = brk_now(); brk_set(b + 4096); assert(munmap(b - 4096, 4096 * 2) == 0); brk_set(b); return 0; } Compile that with musl, since glibc actually uses brk(), and then execute it, and it'll hit this splat: kernel BUG at mm/mmap.c:229! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 12 PID: 1379 Comm: a.out Tainted: G S U 6.1.0-rc7+ #419 RIP: 0010:__do_sys_brk+0x2fc/0x340 Code: 00 00 4c 89 ef e8 04 d3 fe ff eb 9a be 01 00 00 00 4c 89 ff e8 35 e0 fe ff e9 6e ff ff ff 4d 89 a7 20> RSP: 0018:ffff888140bc7eb0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000007e7000 RCX: ffff8881020fe000 RDX: ffff8881020fe001 RSI: ffff8881955c9b00 RDI: ffff8881955c9b08 RBP: 0000000000000000 R08: ffff8881955c9b00 R09: 00007ffc77844000 R10: 0000000000000000 R11: 0000000000000001 R12: 00000000007e8000 R13: 00000000007e8000 R14: 00000000007e7000 R15: ffff8881020fe000 FS: 0000000000604298(0000) GS:ffff88901f700000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000603fe0 CR3: 000000015ba9a005 CR4: 0000000000770ee0 PKRU: 55555554 Call Trace: <TASK> do_syscall_64+0x2b/0x50 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x400678 Code: 10 4c 8d 41 08 4c 89 44 24 10 4c 8b 01 8b 4c 24 08 83 f9 2f 77 0a 4c 8d 4c 24 20 4c 01 c9 eb 05 48 8b> RSP: 002b:00007ffc77863890 EFLAGS: 00000212 ORIG_RAX: 000000000000000c RAX: ffffffffffffffda RBX: 000000000040031b RCX: 0000000000400678 RDX: 00000000004006a1 RSI: 00000000007e6000 RDI: 00000000007e7000 RBP: 00007ffc77863900 R08: 0000000000000000 R09: 00000000007e6000 R10: 00007ffc77863930 R11: 0000000000000212 R12: 00007ffc77863978 R13: 00007ffc77863988 R14: 0000000000000000 R15: 0000000000000000 </TASK> Instead, just return the old brk value if the original mapping has been removed. [[email protected]: fix changelog, per Liam] Link: https://lkml.kernel.org/r/[email protected] Fixes: 2e7ce7d354f2 ("mm/mmap: change do_brk_flags() to expand existing VMA and add do_brk_munmap()") Signed-off-by: Jason A. Donenfeld <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Liam R. Howlett <[email protected]> Reviewed-by: SeongJae Park <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: David Howells <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Sven Schnelle <[email protected]> Cc: Will Deacon <[email protected]> Cc: Jann Horn <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-12-09mailmap: update Matti Vaittinen's email addressMatti Vaittinen1-0/+1
The email backend used by ROHM keeps labeling patches as spam. This can result in missing the patches. Switch my mail address from a company mail to a personal one. Link: https://lkml.kernel.org/r/8f4498b66fedcbded37b3b87e0c516e659f8f583.1669912977.git.mazziesaccount@gmail.com Signed-off-by: Matti Vaittinen <[email protected]> Suggested-by: Krzysztof Kozlowski <[email protected]> Cc: Anup Patel <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Atish Patra <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Ben Widawsky <[email protected]> Cc: Bjorn Andersson <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Qais Yousef <[email protected]> Cc: Vasily Averin <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-12-09Documentation/rv: Add verification/rv man pagesDaniel Bristot de Oliveira11-2/+386
Add man pages for the rv command line, using the same scheme we used in rtla. Link: https://lkml.kernel.org/r/e841d7cfbdfc3ebdaf7cbd40278571940145d829.1668180100.git.bristot@kernel.org Cc: Jonathan Corbet <[email protected]> Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-12-09tools/rv: Add in-kernel monitor interfaceDaniel Bristot de Oliveira3-0/+704
Add the ability to control and trace in-kernel monitors. This is a generic interface, it will check for existing monitors and enable standard setup, like enabling reactors. For example: # rv list wip wakeup in preemptive per-cpu testing monitor. [OFF] wwnr wakeup while not running per-task testing model. [OFF] # rv mon wwnr --help rv version 6.1.0-rc4: help usage: rv mon wwnr [-h] [-q] [-r reactor] [-s] [-v] -h/--help: print this menu and the reactor list -r/--reactor 'reactor': enables the 'reactor' -s/--self: when tracing (-t), also trace rv command -t/--trace: trace monitor's event -v/--verbose: print debug messages available reactors: nop printk panic # rv mon wwnr --trace <TASK>-PID [CPU] TYPE ID STATE x EVENT -> NEXT_STATE FINAL | | | | | | | | | rv-3613 [001] event 3613 running x switch_out -> not_running Y sshd-1248 [005] event 1248 running x switch_out -> not_running Y <idle>-0 [005] event 71 not_running x wakeup -> not_running Y <idle>-0 [005] event 71 not_running x switch_in -> running N kcompactd0-71 [005] event 71 running x switch_out -> not_running Y <idle>-0 [000] event 860 not_running x wakeup -> not_running Y <idle>-0 [000] event 860 not_running x switch_in -> running N systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y <idle>-0 [000] event 860 not_running x wakeup -> not_running Y <idle>-0 [000] event 860 not_running x switch_in -> running N systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y <idle>-0 [005] event 71 not_running x wakeup -> not_running Y <idle>-0 [005] event 71 not_running x switch_in -> running N kcompactd0-71 [005] event 71 running x switch_out -> not_running Y <idle>-0 [000] event 860 not_running x wakeup -> not_running Y <idle>-0 [000] event 860 not_running x switch_in -> running N systemd-oomd-860 [000] event 860 running x switch_out -> not_running Y <idle>-0 [001] event 3613 not_running x wakeup -> not_running Y Link: https://lkml.kernel.org/r/1e57547e3acadda6e23949b2672c89e76ec2ec42.1668180100.git.bristot@kernel.org Cc: Jonathan Corbet <[email protected]> Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-12-09rv: Add rv toolDaniel Bristot de Oliveira8-0/+558
This is the (user-space) runtime verification tool, named rv. This tool aims to be the interface for in-kernel rv monitors, as well as the home for monitors in user-space (online asynchronous), and in *eBPF. The tool receives a command as the first argument, the current commands are: list - list all available monitors mon - run a given monitor Each monitor is an independent piece of software inside the tool and can have their own arguments. There is no monitor implemented in this patch, it only adds the basic structure of the tool, based on rtla. # rv --help rv version 6.1.0-rc4: help usage: rv command [-h] [command_options] -h/--help: print this menu command: run one of the following command: list: list all available monitors mon: run a monitor [command options]: each command has its own set of options run rv command -h for further information *dot2bpf is the next patch set, depends on this, doing cleanups. Link: https://lkml.kernel.org/r/fb51184f3b95aea0d7bfdc33ec09f4153aee84fa.1668180100.git.bristot@kernel.org Cc: Jonathan Corbet <[email protected]> Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-12-09rtla: Fix exit status when returning from calls to usage()John Kacur3-17/+13
rtla_usage(), osnoise_usage() and timerlat_usage() all exit with an error status. However when these are called from help, they should exit with a non-error status. Fix this by passing the exit status to the functions. Note, although we remove the subsequent call to exit after calling usage, we leave it in at the end of a function to suppress the compiler warning "control reaches end of a non-void function". Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Kacur <[email protected]> Acked-by: Daniel Bristot de Oliveira <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-12-09MIPS: OCTEON: warn only once if deprecated link status is being usedLadislav Michl2-2/+2
Avoid flooding kernel log with warnings. Fixes: 2c0756d306c2 ("MIPS: OCTEON: warn if deprecated link status is being used") Signed-off-by: Ladislav Michl <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2022-12-09MIPS: BCM63xx: Add check for NULL for clk in clk_enableAnastasia Belova1-0/+2
Check clk for NULL before calling clk_enable_unlocked where clk is dereferenced. There is such check in other implementations of clk_enable. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: e7300d04bd08 ("MIPS: BCM63xx: Add support for the Broadcom BCM63xx family of SOCs.") Signed-off-by: Anastasia Belova <[email protected]> Reviewed-by: Philippe Mathieu-Daudé <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2022-12-09Merge tag 'media/v6.1-4' of ↵Linus Torvalds1-6/+14
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fix from Mauro Carvalho Chehab: "A v4l-core fix related to validating DV timings related to video blanking values" * tag 'media/v6.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: v4l2-dv-timings.c: fix too strict blanking sanity checks
2022-12-09Merge tag 'soc-fixes-6.1-6' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fix from Arnd Bergmann: "One more last minute revert for a boot regression that was found on the popular colibri-imx7" * tag 'soc-fixes-6.1-6' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: Revert "ARM: dts: imx7: Fix NAND controller size-cells"
2022-12-09regmap-irq: Add handle_mask_sync() callbackWilliam Breathitt Gray2-13/+36
Provide a public callback handle_mask_sync() that drivers can use when they have more complex IRQ masking logic. The default implementation is regmap_irq_handle_mask_sync(), used if the chip doesn't provide its own callback. Cc: Mark Brown <[email protected]> Signed-off-by: William Breathitt Gray <[email protected]> Link: https://lore.kernel.org/r/e083474b3d467a86e6cb53da8072de4515bd6276.1669100542.git.william.gray@linaro.org Signed-off-by: Mark Brown <[email protected]>
2022-12-09lsm: Fix description of fs_context_parse_paramRoberto Sassu1-3/+0
The fs_context_parse_param hook already has a description, which seems the right one according to the code. Fixes: 8eb687bc8069 ("lsm: Add/fix return values in lsm_hooks.h and fix formatting") Signed-off-by: Roberto Sassu <[email protected]> Signed-off-by: Paul Moore <[email protected]>
2022-12-09Merge tag 'timers-v6.2-rc1' of ↵Thomas Gleixner9-9/+50
https://git.linaro.org/people/daniel.lezcano/linux into timers/core Pull clockevent/source driver updates from Daniel Lezcano: - Add DT bindings for the Rockchip rk3128 timer (Johan Jonker) - Change the DT bindings for the npcm7xx timer in order to specify multiple clocks and enable the clock for the timer1 on WPCM450 (Jonathan Neuschäfer) - Fix the timer duration being too long the ARM architected timer in order to prevent an integer overflow leading to a negative value and an immediate interruption (Joe Korty) - Fix an unused pointer warning reported by lkp and some cleanups in the timer TI dm (Tony Lindgren) - Fix a missing call to clk_disable_unprepare() in the error path at init time on the timer TI dm (Yang Yingliang) - Use kstrtobool() instead of strtobool() in the ARM architected timer (Christophe JAILLET) - Add DT bindings for r8a779g0 on Renesas platform (Wolfram Sang) Link: https://lore.kernel.org/all/[email protected]
2022-12-09x86/vdso: Conditionally export __vdso_sgx_enter_enclave()Nathan Chancellor1-0/+2
Recently, ld.lld moved from '--undefined-version' to '--no-undefined-version' as the default, which breaks building the vDSO when CONFIG_X86_SGX is not set: ld.lld: error: version script assignment of 'LINUX_2.6' to symbol '__vdso_sgx_enter_enclave' failed: symbol not defined __vdso_sgx_enter_enclave is only included in the vDSO when CONFIG_X86_SGX is set. Only export it if it will be present in the final object, which clears up the error. Fixes: 8466436952017 ("x86/vdso: Implement a vDSO for Intel SGX enclave call") Signed-off-by: Nathan Chancellor <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Link: https://github.com/ClangBuiltLinux/linux/issues/1756 Link: https://lore.kernel.org/r/[email protected]
2022-12-09udf: Fix extending file within last blockJan Kara1-15/+17
When extending file within last block it can happen that the extent is already rounded to the blocksize and thus contains the offset we want to grow up to. In such case we would mistakenly expand the last extent and make it one block longer than it should be, exposing unallocated block in a file and causing data corruption. Fix the problem by properly detecting this case and bailing out. CC: [email protected] Signed-off-by: Jan Kara <[email protected]>
2022-12-09udf: Discard preallocation before extending file with a holeJan Kara1-28/+18
When extending file with a hole, we tried to preserve existing preallocation for the file. However that is not very useful and complicates code because the previous extent may need to be rounded to block boundary as well (which we forgot to do thus causing data corruption for sequence like: xfs_io -f -c "pwrite 0x75e63 11008" -c "truncate 0x7b24b" \ -c "truncate 0xabaa3" -c "pwrite 0xac70b 22954" \ -c "pwrite 0x93a43 11358" -c "pwrite 0xb8e65 52211" file with 512-byte block size. Just discard preallocation before extending file to simplify things and also fix this data corruption. CC: [email protected] Signed-off-by: Jan Kara <[email protected]>
2022-12-09udf: Do not bother looking for prealloc extents if i_lenExtents matches i_sizeJan Kara1-1/+2
If rounded block-rounded i_lenExtents matches block rounded i_size, there are no preallocation extents. Do not bother walking extent linked list. CC: [email protected] Signed-off-by: Jan Kara <[email protected]>