blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2021-10-20	powerpc/idle: Don't corrupt back chain when going idle	Michael Ellerman	1	-4/+6
	In isa206_idle_insn_mayloss() we store various registers into the stack red zone, which is allowed. However inside the IDLE_STATE_ENTER_SEQ_NORET macro we save r2 again, to 0(r1), which corrupts the stack back chain. We used to do the same in isa206_idle_insn_mayloss() itself, but we fixed that in 73287caa9210 ("powerpc64/idle: Fix SP offsets when saving GPRs"), however we missed that the macro also corrupts the back chain. Corrupting the back chain is bad for debuggability but doesn't necessarily cause a bug. However we recently changed the stack handling in some KVM code, and it now relies on the stack back chain being valid when it returns. The corruption causes that code to return with r1 pointing somewhere in kernel data, at some point LR is restored from the stack and we branch to NULL or somewhere else invalid. Only affects Power8 hosts running KVM guests, with dynamic_mt_modes enabled (which it is by default). The fixes tag below points to the commit that changed the KVM stack handling, exposing this bug. The actual corruption of the back chain has always existed since 948cf67c4726 ("powerpc: Add NAP mode support on Power7 in HV mode"). Fixes: 9b4416c5095c ("KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()") Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-20	vrf: Revert "Reset skb conntrack connection..."	Eugene Crosser	1	-4/+0
	This reverts commit 09e856d54bda5f288ef8437a90ab2b9b3eab83d1. When an interface is enslaved in a VRF, prerouting conntrack hook is called twice: once in the context of the original input interface, and once in the context of the VRF interface. If no special precausions are taken, this leads to creation of two conntrack entries instead of one, and breaks SNAT. Commit above was intended to avoid creation of extra conntrack entries when input interface is enslaved in a VRF. It did so by resetting conntrack related data associated with the skb when it enters VRF context. However it breaks netfilter operation. Imagine a use case when conntrack zone must be assigned based on the original input interface, rather than VRF interface (that would make original interfaces indistinguishable). One could create netfilter rules similar to these: chain rawprerouting { type filter hook prerouting priority raw; iif realiface1 ct zone set 1 return iif realiface2 ct zone set 2 return } This works before the mentioned commit, but not after: zone assignment is "forgotten", and any subsequent NAT or filtering that is dependent on the conntrack zone does not work. Here is a reproducer script that demonstrates the difference in behaviour. ========== #!/bin/sh # This script demonstrates unexpected change of nftables behaviour # caused by commit 09e856d54bda5f28 ""vrf: Reset skb conntrack # connection on VRF rcv" # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=09e856d54bda5f288ef8437a90ab2b9b3eab83d1 # # Before the commit, it was possible to assign conntrack zone to a # packet (or mark it for `notracking`) in the prerouting chanin, raw # priority, based on the `iif` (interface from which the packet # arrived). # After the change, # if the interface is enslaved in a VRF, such # assignment is lost. Instead, assignment based on the `iif` matching # the VRF master interface is honored. Thus it is impossible to # distinguish packets based on the original interface. # # This script demonstrates this change of behaviour: conntrack zone 1 # or 2 is assigned depending on the match with the original interface # or the vrf master interface. It can be observed that conntrack entry # appears in different zone in the kernel versions before and after # the commit. IPIN=172.30.30.1 IPOUT=172.30.30.2 PFXL=30 ip li sh vein >/dev/null 2>&1 && ip li del vein ip li sh tvrf >/dev/null 2>&1 && ip li del tvrf nft list table testct >/dev/null 2>&1 && nft delete table testct ip li add vein type veth peer veout ip li add tvrf type vrf table 9876 ip li set veout master tvrf ip li set vein up ip li set veout up ip li set tvrf up /sbin/sysctl -w net.ipv4.conf.veout.accept_local=1 /sbin/sysctl -w net.ipv4.conf.veout.rp_filter=0 ip addr add $IPIN/$PFXL dev vein ip addr add $IPOUT/$PFXL dev veout nft -f - <<__END__ table testct { chain rawpre { type filter hook prerouting priority raw; iif { veout, tvrf } meta nftrace set 1 iif veout ct zone set 1 return iif tvrf ct zone set 2 return notrack } chain rawout { type filter hook output priority raw; notrack } } __END__ uname -rv conntrack -F ping -W 1 -c 1 -I vein $IPOUT conntrack -L Signed-off-by: Eugene Crosser <[email protected]> Acked-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-10-20	ksmbd: add buffer validation in session setup	Marios Makassikis	2	-27/+40
	Make sure the security buffer's length/offset are valid with regards to the packet length. Acked-by: Namjae Jeon <[email protected]> Signed-off-by: Marios Makassikis <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-10-20	ksmbd: throttle session setup failures to avoid dictionary attacks	Namjae Jeon	6	-6/+31
	To avoid dictionary attacks (repeated session setups rapidly sent) to connect to server, ksmbd make a delay of a 5 seconds on session setup failure to make it harder to send enough random connection requests to break into a server if a user insert the wrong password 10 times in a row. Signed-off-by: Namjae Jeon <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-10-20	ksmbd: validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests	Hyunchul Lee	1	-16/+52
	Validate OutputBufferLength of QUERY_DIR, QUERY_INFO, IOCTL requests and check the free size of response buffer for these requests. Acked-by: Namjae Jeon <[email protected]> Signed-off-by: Hyunchul Lee <[email protected]> Signed-off-by: Steve French <[email protected]>
2021-10-19	io-wq: max_worker fixes	Pavel Begunkov	1	-2/+5
	First, fix nr_workers checks against max_workers, with max_worker registration, it may pretty easily happen that nr_workers > max_workers. Also, synchronise writing to acct->max_worker with wqe->lock. It's not an actual problem, but as we don't care about io_wqe_create_worker(), it's better than WRITE_ONCE()/READ_ONCE(). Fixes: 2e480058ddc2 ("io-wq: provide a way to limit max number of workers") Reported-by: Beld Zhang <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/11f90e6b49410b7d1a88f5d04fb8d95bb86b8cf3.1634671835.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2021-10-19	net: dsa: Fix an error handling path in 'dsa_switch_parse_ports_of()'	Christophe JAILLET	1	-2/+7
	If we return before the end of the 'for_each_child_of_node()' iterator, the reference taken on 'port' must be released. Add the missing 'of_node_put()' calls. Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation") Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/15d5310d1d55ad51c1af80775865306d92432e03.1634587046.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <[email protected]>
2021-10-19	ACPI: PM: Do not turn off power resources in unknown state	Rafael J. Wysocki	1	-6/+1
	Commit 6381195ad7d0 ("ACPI: power: Rework turning off unused power resources") caused power resources in unknown state with reference counters equal to zero to be turned off too, but that caused issues to appear in the field, so modify the code to only turn off power resources that are known to be "on". Link: https://lore.kernel.org/linux-acpi/[email protected]/ Fixes: 6381195ad7d0 ("ACPI: power: Rework turning off unused power resources") Reported-by: Andreas K. Huettel <[email protected]> Tested-by: Andreas K. Huettel <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]> Cc: 5.14+ <[email protected]> # 5.14+
2021-10-19	ucounts: Proper error handling in set_cred_ucounts	Eric W. Biederman	1	-2/+3
	Instead of leaking the ucounts in new if alloc_ucounts fails, store the result of alloc_ucounts into a temporary variable, which is later assigned to new->ucounts. Cc: [email protected] Fixes: 905ae01c4ae2 ("Add a reference to ucounts for each cred") Link: https://lkml.kernel.org/r/87pms2s0v8.fsf_-_@disp2133 Tested-by: Yu Zhao <[email protected]> Reviewed-by: Alexey Gladkov <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2021-10-19	ucounts: Pair inc_rlimit_ucounts with dec_rlimit_ucoutns in commit_creds	Eric W. Biederman	1	-1/+1
	The purpose of inc_rlimit_ucounts and dec_rlimit_ucounts in commit_creds is to change which rlimit counter is used to track a process when the credentials changes. Use the same test for both to guarantee the tracking is correct. Cc: [email protected] Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") Link: https://lkml.kernel.org/r/87v91us0w4.fsf_-_@disp2133 Tested-by: Yu Zhao <[email protected]> Reviewed-by: Alexey Gladkov <[email protected]> Signed-off-by: "Eric W. Biederman" <[email protected]>
2021-10-19	sched/scs: Reset the shadow stack when idle_task_exit	Woody Lin	1	-0/+1
	Commit f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled") removed the init_idle() call from idle_thread_get(). This was the sole call-path on hotplug that resets the Shadow Call Stack (scs) Stack Pointer (sp). Not resetting the scs-sp leads to scs overflow after enough hotplug cycles. Therefore add an explicit scs_task_reset() to the hotplug code to make sure the scs-sp does get reset on hotplug. Fixes: f1a0a376ca0c ("sched/core: Initialize the idle task with preemption disabled") Signed-off-by: Woody Lin <[email protected]> [peterz: Changelog] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2021-10-19	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	17	-99/+138
	Merge misc fixes from Andrew Morton: "19 patches. Subsystems affected by this patch series: mm (userfaultfd, migration, memblock, mempolicy, slub, secretmem, and thp), ocfs2, binfmt, vfs, and misc" * emailed patches from Andrew Morton <[email protected]>: mailmap: add Andrej Shadura mm/thp: decrease nr_thps in file's mapping on THP split mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem() vfs: check fd has read access in kernel_read_file_from_fd() elfcore: correct reference to CONFIG_UML mm, slub: fix incorrect memcg slab count for bulk free mm, slub: fix potential use-after-free in slab_debugfs_fops mm, slub: fix potential memoryleak in kmem_cache_open() mm, slub: fix mismatch between reconstructed freelist depth and cnt mm, slub: fix two bugs in slab_debug_trace_open() mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING \| MPOL_LOCAL in mbind() memblock: check memory total_size ocfs2: mount fails with buffer overflow in strlen ocfs2: fix data corruption after conversion from inline format mm/migrate: fix CPUHP state to update node demotion order mm/migrate: add CPU hotplug to demotion #ifdef mm/migrate: optimize hotplug-time demotion order updates userfaultfd: fix a race between writeprotect and exit_mmap() mm/userfaultfd: selftests: fix memory corruption with thp enabled
2021-10-19	Merge tag 'linux-can-fixes-for-5.15-20211019' of ↵	David S. Miller	1	-0/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can Marc Kleine-Budde says: ==================== pull-request: can 2021-10-19 this is a pull request of a single patch for net/master. The patch is by me and fixes the error handling in case of a FC timeout in the TX path of the ISOTOP CAN protocol. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-10-19	cavium: Fix return values of the probe function	Zheyu Ma	1	-2/+2
	During the process of driver probing, the probe function should return < 0 for failure, otherwise, the kernel will treat value > 0 as success. Signed-off-by: Zheyu Ma <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-10-19	mISDN: Fix return values of the probe function	Zheyu Ma	1	-4/+4
	During the process of driver probing, the probe function should return < 0 for failure, otherwise, the kernel will treat value > 0 as success. Signed-off-by: Zheyu Ma <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-10-19	ARM: 9141/1: only warn about XIP address when not compile testing	Arnd Bergmann	1	-1/+1
	In randconfig builds, we sometimes come across this warning: arm-linux-gnueabi-ld: XIP start address may cause MPU programming issues While this is helpful for actual systems to figure out why it fails, the warning does not provide any benefit for build testing, so guard it in a check for CONFIG_COMPILE_TEST, which is usually set on randconfig builds. Fixes: 216218308cfb ("ARM: 8713/1: NOMMU: Support MPU in XIP configuration") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2021-10-19	ARM: 9139/1: kprobes: fix arch_init_kprobes() prototype	Arnd Bergmann	1	-1/+1
	With extra warnings enabled, gcc complains about this function definition: arch/arm/probes/kprobes/core.c: In function 'arch_init_kprobes': arch/arm/probes/kprobes/core.c:465:12: warning: old-style function definition [-Wold-style-definition] 465 \| int __init arch_init_kprobes() Link: https://lore.kernel.org/all/[email protected]/ Fixes: 24ba613c9d6c ("ARM kprobes: core code") Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2021-10-19	ARM: 9138/1: fix link warning with XIP + frame-pointer	Arnd Bergmann	1	-0/+4
	When frame pointers are used instead of the ARM unwinder, and the kernel is built using clang with an external assembler and CONFIG_XIP_KERNEL, every file produces two warnings like: arm-linux-gnueabi-ld: warning: orphan section `.ARM.extab' from `net/mac802154/util.o' being placed in section `.ARM.extab' arm-linux-gnueabi-ld: warning: orphan section `.ARM.exidx' from `net/mac802154/util.o' being placed in section `.ARM.exidx' The same fix was already merged for the normal (non-XIP) linker script, with a longer description. Fixes: c39866f268f8 ("arm/build: Always handle .ARM.exidx and .ARM.extab sections") Reviewed-by: Kees Cook <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2021-10-19	ARM: 9134/1: remove duplicate memcpy() definition	Arnd Bergmann	1	-0/+3
	Both the decompressor code and the kasan logic try to override the memcpy() and memmove() definitions, which leading to a clash in a KASAN-enabled kernel with XZ decompression: arch/arm/boot/compressed/decompress.c:50:9: error: 'memmove' macro redefined [-Werror,-Wmacro-redefined] #define memmove memmove ^ arch/arm/include/asm/string.h:59:9: note: previous definition is here #define memmove(dst, src, len) __memmove(dst, src, len) ^ arch/arm/boot/compressed/decompress.c:51:9: error: 'memcpy' macro redefined [-Werror,-Wmacro-redefined] #define memcpy memcpy ^ arch/arm/include/asm/string.h:58:9: note: previous definition is here #define memcpy(dst, src, len) __memcpy(dst, src, len) ^ Here we want the set of functions from the decompressor, so undefine the other macros before the override. Link: https://lore.kernel.org/linux-arm-kernel/CACRpkdZYJogU_SN3H9oeVq=zJkRgRT1gDz3xp59gdqWXxw-B=w@mail.gmail.com/ Link: https://lore.kernel.org/lkml/[email protected]/ Fixes: d6d51a96c7d6 ("ARM: 9014/2: Replace string mem* functions for KASan") Fixes: a7f464f3db93 ("ARM: 7001/2: Wire up support for the XZ decompressor") Reported-by: kernel test robot <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2021-10-19	ARM: 9133/1: mm: proc-macros: ensure *_tlb_fns are 4B aligned	Nick Desaulniers	1	-0/+1
	A kernel built with CONFIG_THUMB2_KERNEL=y and using clang as the assembler could generate non-naturally-aligned v7wbi_tlb_fns which results in a boot failure. The original commit adding the macro missed the .align directive on this data. Link: https://github.com/ClangBuiltLinux/linux/issues/1447 Link: https://lore.kernel.org/all/[email protected]/ Debugged-by: Ard Biesheuvel <[email protected]> Debugged-by: Nathan Chancellor <[email protected]> Debugged-by: Richard Henderson <[email protected]> Fixes: 66a625a88174 ("ARM: mm: proc-macros: Add generic proc/cache/tlb struct definition macros") Suggested-by: Ard Biesheuvel <[email protected]> Acked-by: Ard Biesheuvel <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2021-10-19	ARM: 9132/1: Fix __get_user_check failure with ARM KASAN images	Lexi Shao	1	-1/+3
	ARM: kasan: Fix __get_user_check failure with kasan In macro __get_user_check defined in arch/arm/include/asm/uaccess.h, error code is store in register int __e(r0). When kasan is enabled, assigning value to kernel address might trigger kasan check, which unexpectedly overwrites r0 and causes undefined behavior on arm kasan images. One example is failure in do_futex and results in process soft lockup. Log: watchdog: BUG: soft lockup - CPU#0 stuck for 62946ms! [rs:main Q:Reg:1151] ... (__asan_store4) from (futex_wait_setup+0xf8/0x2b4) (futex_wait_setup) from (futex_wait+0x138/0x394) (futex_wait) from (do_futex+0x164/0xe40) (do_futex) from (sys_futex_time32+0x178/0x230) (sys_futex_time32) from (ret_fast_syscall+0x0/0x50) The soft lockup happens in function futex_wait_setup. The reason is function get_futex_value_locked always return EINVAL, thus pc jump back to retry label and causes looping. This line in function get_futex_value_locked ret = __get_user(dest, from); is expanded to dest = (typeof((p))) __r2; , in macro __get_user_check. Writing to pointer dest triggers kasan check and overwrites the return value of __get_user_x function. The assembly code of get_futex_value_locked in kernel/futex.c: ... c01f6dc8: eb0b020e bl c04b7608 <__get_user_4> // "x = (typeof((p))) __r2;" triggers kasan check and r0 is overwritten c01f6dCc: e1a00007 mov r0, r7 c01f6dd0: e1a05002 mov r5, r2 c01f6dd4: eb04f1e6 bl c0333574 <__asan_store4> c01f6dd8: e5875000 str r5, [r7] // save ret value of __get_user(*dest, from), which is dest address now c01f6ddc: e1a05000 mov r5, r0 ... // checking return value of __get_user failed c01f6e00: e3550000 cmp r5, #0 ... c01f6e0c: 01a00005 moveq r0, r5 // assign return value to EINVAL c01f6e10: 13e0000d mvnne r0, #13 Return value is the destination address of get_user thus certainly non-zero, so get_futex_value_locked always return EINVAL. Fix it by using a tmp vairable to store the error code before the assignment. This fix has no effects to non-kasan images thanks to compiler optimization. It only affects cases that overwrite r0 due to kasan check. This should fix bug discussed in Link: [1] https://lore.kernel.org/linux-arm-kernel/[email protected]/ Fixes: 421015713b30 ("ARM: 9017/2: Enable KASan for ARM") Signed-off-by: Lexi Shao <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2021-10-19	ARM: 9125/1: fix incorrect use of get_kernel_nofault()	Ard Biesheuvel	1	-1/+1
	Commit 344179fc7ef4 ("ARM: 9106/1: traps: use get_kernel_nofault instead of set_fs()") replaced an occurrence of __get_user() with get_kernel_nofault(), but inverted the sense of the conditional in the process, resulting in no values to be printed at all. I.e., every exception stack now looks like this: Exception stack(0xc18d1fb0 to 0xc18d1ff8) 1fa0: ???????? ???????? ???????? ???????? 1fc0: ???????? ???????? ???????? ???????? ???????? ???????? ???????? ???????? 1fe0: ???????? ???????? ???????? ???????? ???????? ???????? which is rather unhelpful. Fixes: 344179fc7ef4 ("ARM: 9106/1: traps: use get_kernel_nofault instead of set_fs()") Signed-off-by: Ard Biesheuvel <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2021-10-19	ARM: 9122/1: select HAVE_FUTEX_CMPXCHG	Nick Desaulniers	1	-0/+1
	tglx notes: This function [futex_detect_cmpxchg] is only needed when an architecture has to runtime discover whether the CPU supports it or not. ARM has unconditional support for this, so the obvious thing to do is the below. Fixes linkage failure from Clang randconfigs: kernel/futex.o:(.text.fixup+0x5c): relocation truncated to fit: R_ARM_JUMP24 against `.init.text' and boot failures for CONFIG_THUMB2_KERNEL. Link: https://github.com/ClangBuiltLinux/linux/issues/325 Comments from Nick Desaulniers: See-also: 03b8c7b623c8 ("futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test") Reported-by: Arnd Bergmann <[email protected]> Reported-by: Nathan Chancellor <[email protected]> Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Nick Desaulniers <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Tested-by: Nathan Chancellor <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Cc: [email protected] # v3.14+ Reviewed-by: Arnd Bergmann <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]>
2021-10-19	ceph: fix handling of "meta" errors	Jeff Layton	5	-27/+8
	Currently, we check the wb_err too early for directories, before all of the unsafe child requests have been waited on. In order to fix that we need to check the mapping->wb_err later nearer to the end of ceph_fsync. We also have an overly-complex method for tracking errors after blocklisting. The errors recorded in cleanup_session_requests go to a completely separate field in the inode, but we end up reporting them the same way we would for any other error (in fsync). There's no real benefit to tracking these errors in two different places, since the only reporting mechanism for them is in fsync, and we'd need to advance them both every time. Given that, we can just remove i_meta_err, and convert the places that used it to instead just use mapping->wb_err instead. That also fixes the original problem by ensuring that we do a check_and_advance of the wb_err at the end of the fsync op. Cc: [email protected] URL: https://tracker.ceph.com/issues/52864 Reported-by: Patrick Donnelly <[email protected]> Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2021-10-19	ceph: skip existing superblocks that are blocklisted or shut down when mounting	Jeff Layton	1	-3/+14
	Currently when mounting, we may end up finding an existing superblock that corresponds to a blocklisted MDS client. This means that the new mount ends up being unusable. If we've found an existing superblock with a client that is already blocklisted, and the client is not configured to recover on its own, fail the match. Ditto if the superblock has been forcibly unmounted. While we're in here, also rename "other" to the more conventional "fsc". Cc: [email protected] URL: https://bugzilla.redhat.com/show_bug.cgi?id=1901499 Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Xiubo Li <[email protected]> Reviewed-by: Ilya Dryomov <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
2021-10-19	can: isotp: isotp_sendmsg(): fix return error on FC timeout on TX path	Marc Kleine-Budde	1	-0/+3
	When the a large chunk of data send and the receiver does not send a Flow Control frame back in time, the sendmsg() does not return a error code, but the number of bytes sent corresponding to the size of the packet. If a timeout occurs the isotp_tx_timer_handler() is fired, sets sk->sk_err and calls the sk->sk_error_report() function. It was wrongly expected that the error would be propagated to user space in every case. For isotp_sendmsg() blocking on wait_event_interruptible() this is not the case. This patch fixes the problem by checking if sk->sk_err is set and returning the error to user space. Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://github.com/hartkopp/can-isotp/issues/42 Link: https://github.com/hartkopp/can-isotp/pull/43 Link: https://lore.kernel.org/all/[email protected] Cc: [email protected] Reported-by: Sottas Guillaume (LMB) <[email protected]> Tested-by: Oliver Hartkopp <[email protected]> Signed-off-by: Marc Kleine-Budde <[email protected]>
2021-10-18	mailmap: add Andrej Shadura	Andrej Shadura	1	-0/+2
	Add a mapping for my old work email for BelDisplayTech to the personal email, and make sure the Collabora email has the correct spelling of the first name. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Andrej Shadura <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm/thp: decrease nr_thps in file's mapping on THP split	Marek Szyprowski	1	-2/+4
	Decrease nr_thps counter in file's mapping to ensure that the page cache won't be dropped excessively on file write access if page has been already split. I've tried a test scenario running a big binary, kernel remaps it with THPs, then force a THP split with /sys/kernel/debug/split_huge_pages. During any further open of that binary with O_RDWR or O_WRITEONLY kernel drops page cache for it, because of non-zero thps counter. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Marek Szyprowski <[email protected]> Fixes: 09d91cda0e82 ("mm,thp: avoid writes to file with THP in pagecache") Fixes: 06d3eff62d9d ("mm/thp: fix node page state in split_huge_page_to_list()") Acked-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Yang Shi <[email protected]> Cc: <[email protected]> Cc: Song Liu <[email protected]> Cc: Rik van Riel <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: William Kucharski <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm/secretmem: fix NULL page->mapping dereference in page_is_secretmem()	Sean Christopherson	1	-1/+1
	Check for a NULL page->mapping before dereferencing the mapping in page_is_secretmem(), as the page's mapping can be nullified while gup() is running, e.g. by reclaim or truncation. BUG: kernel NULL pointer dereference, address: 0000000000000068 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 6 PID: 4173897 Comm: CPU 3/KVM Tainted: G W RIP: 0010:internal_get_user_pages_fast+0x621/0x9d0 Code: <48> 81 7a 68 80 08 04 bc 0f 85 21 ff ff 8 89 c7 be RSP: 0018:ffffaa90087679b0 EFLAGS: 00010046 RAX: ffffe3f37905b900 RBX: 00007f2dd561e000 RCX: ffffe3f37905b934 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffe3f37905b900 ... CR2: 0000000000000068 CR3: 00000004c5898003 CR4: 00000000001726e0 Call Trace: get_user_pages_fast_only+0x13/0x20 hva_to_pfn+0xa9/0x3e0 try_async_pf+0xa1/0x270 direct_page_fault+0x113/0xad0 kvm_mmu_page_fault+0x69/0x680 vmx_handle_exit+0xe1/0x5d0 kvm_arch_vcpu_ioctl_run+0xd81/0x1c70 kvm_vcpu_ioctl+0x267/0x670 __x64_sys_ioctl+0x83/0xa0 do_syscall_64+0x56/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lkml.kernel.org/r/[email protected] Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas") Signed-off-by: Sean Christopherson <[email protected]> Reported-by: Darrick J. Wong <[email protected]> Reported-by: Stephen <[email protected]> Tested-by: Darrick J. Wong <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	vfs: check fd has read access in kernel_read_file_from_fd()	Matthew Wilcox (Oracle)	1	-1/+1
	If we open a file without read access and then pass the fd to a syscall whose implementation calls kernel_read_file_from_fd(), we get a warning from __kernel_read(): if (WARN_ON_ONCE(!(file->f_mode & FMODE_READ))) This currently affects both finit_module() and kexec_file_load(), but it could affect other syscalls in the future. Link: https://lkml.kernel.org/r/[email protected] Fixes: b844f0ecbc56 ("vfs: define kernel_copy_file_from_fd()") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Reported-by: Hao Sun <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Christian Brauner <[email protected]> Cc: Al Viro <[email protected]> Cc: Mimi Zohar <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	elfcore: correct reference to CONFIG_UML	Lukas Bulwahn	1	-1/+1
	Commit 6e7b64b9dd6d ("elfcore: fix building with clang") introduces special handling for two architectures, ia64 and User Mode Linux. However, the wrong name, i.e., CONFIG_UM, for the intended Kconfig symbol for User-Mode Linux was used. Although the directory for User Mode Linux is ./arch/um; the Kconfig symbol for this architecture is called CONFIG_UML. Luckily, ./scripts/checkkconfigsymbols.py warns on non-existing configs: UM Referencing files: include/linux/elfcore.h Similar symbols: UML, NUMA Correct the name of the config to the intended one. [[email protected]: fix um/x86_64, per Catalin] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 6e7b64b9dd6d ("elfcore: fix building with clang") Signed-off-by: Lukas Bulwahn <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Barret Rhoden <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm, slub: fix incorrect memcg slab count for bulk free	Miaohe Lin	1	-1/+3
	kmem_cache_free_bulk() will call memcg_slab_free_hook() for all objects when doing bulk free. So we shouldn't call memcg_slab_free_hook() again for bulk free to avoid incorrect memcg slab count. Link: https://lkml.kernel.org/r/[email protected] Fixes: d1b2cf6cb84a ("mm: memcg/slab: uncharge during kmem_cache_free_bulk()") Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Faiyaz Mohammed <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm, slub: fix potential use-after-free in slab_debugfs_fops	Miaohe Lin	1	-2/+4
	When sysfs_slab_add failed, we shouldn't call debugfs_slab_add() for s because s will be freed soon. And slab_debugfs_fops will use s later leading to a use-after-free. Link: https://lkml.kernel.org/r/[email protected] Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs") Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Faiyaz Mohammed <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm, slub: fix potential memoryleak in kmem_cache_open()	Miaohe Lin	1	-1/+1
	In error path, the random_seq of slub cache might be leaked. Fix this by using __kmem_cache_release() to release all the relevant resources. Link: https://lkml.kernel.org/r/[email protected] Fixes: 210e7a43fa90 ("mm: SLUB freelist randomization") Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Faiyaz Mohammed <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm, slub: fix mismatch between reconstructed freelist depth and cnt	Miaohe Lin	1	-2/+9
	If object's reuse is delayed, it will be excluded from the reconstructed freelist. But we forgot to adjust the cnt accordingly. So there will be a mismatch between reconstructed freelist depth and cnt. This will lead to free_debug_processing() complaining about freelist count or a incorrect slub inuse count. Link: https://lkml.kernel.org/r/[email protected] Fixes: c3895391df38 ("kasan, slub: fix handling of kasan_slab_free hook") Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Faiyaz Mohammed <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Kees Cook <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm, slub: fix two bugs in slab_debug_trace_open()	Miaohe Lin	1	-1/+7
	Patch series "Fixups for slub". This series contains various bug fixes for slub. We fix memoryleak, use-afer-free, NULL pointer dereferencing and so on in slub. More details can be found in the respective changelogs. This patch (of 5): It's possible that __seq_open_private() will return NULL. So we should check it before using lest dereferencing NULL pointer. And in error paths, we forgot to release private buffer via seq_release_private(). Memory will leak in these paths. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 64dd68497be7 ("mm: slub: move sysfs slab alloc/free interfaces to debugfs") Signed-off-by: Miaohe Lin <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Faiyaz Mohammed <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Kees Cook <[email protected]> Cc: Bharata B Rao <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm/mempolicy: do not allow illegal MPOL_F_NUMA_BALANCING \| MPOL_LOCAL in mbind()	Eric Dumazet	1	-11/+5
	syzbot reported access to unitialized memory in mbind() [1] Issue came with commit bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes") This commit added a new bit in MPOL_MODE_FLAGS, but only checked valid combination (MPOL_F_NUMA_BALANCING can only be used with MPOL_BIND) in do_set_mempolicy() This patch moves the check in sanitize_mpol_flags() so that it is also used by mbind() [1] BUG: KMSAN: uninit-value in __mpol_equal+0x567/0x590 mm/mempolicy.c:2260 __mpol_equal+0x567/0x590 mm/mempolicy.c:2260 mpol_equal include/linux/mempolicy.h:105 [inline] vma_merge+0x4a1/0x1e60 mm/mmap.c:1190 mbind_range+0xcc8/0x1e80 mm/mempolicy.c:811 do_mbind+0xf42/0x15f0 mm/mempolicy.c:1333 kernel_mbind mm/mempolicy.c:1483 [inline] __do_sys_mbind mm/mempolicy.c:1490 [inline] __se_sys_mbind+0x437/0xb80 mm/mempolicy.c:1486 __x64_sys_mbind+0x19d/0x200 mm/mempolicy.c:1486 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Uninit was created at: slab_alloc_node mm/slub.c:3221 [inline] slab_alloc mm/slub.c:3230 [inline] kmem_cache_alloc+0x751/0xff0 mm/slub.c:3235 mpol_new mm/mempolicy.c:293 [inline] do_mbind+0x912/0x15f0 mm/mempolicy.c:1289 kernel_mbind mm/mempolicy.c:1483 [inline] __do_sys_mbind mm/mempolicy.c:1490 [inline] __se_sys_mbind+0x437/0xb80 mm/mempolicy.c:1486 __x64_sys_mbind+0x19d/0x200 mm/mempolicy.c:1486 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae ===================================================== Kernel panic - not syncing: panic_on_kmsan set ... CPU: 0 PID: 15049 Comm: syz-executor.0 Tainted: G B 5.15.0-rc2-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1ff/0x28e lib/dump_stack.c:106 dump_stack+0x25/0x28 lib/dump_stack.c:113 panic+0x44f/0xdeb kernel/panic.c:232 kmsan_report+0x2ee/0x300 mm/kmsan/report.c:186 __msan_warning+0xd7/0x150 mm/kmsan/instrumentation.c:208 __mpol_equal+0x567/0x590 mm/mempolicy.c:2260 mpol_equal include/linux/mempolicy.h:105 [inline] vma_merge+0x4a1/0x1e60 mm/mmap.c:1190 mbind_range+0xcc8/0x1e80 mm/mempolicy.c:811 do_mbind+0xf42/0x15f0 mm/mempolicy.c:1333 kernel_mbind mm/mempolicy.c:1483 [inline] __do_sys_mbind mm/mempolicy.c:1490 [inline] __se_sys_mbind+0x437/0xb80 mm/mempolicy.c:1486 __x64_sys_mbind+0x19d/0x200 mm/mempolicy.c:1486 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x54/0xd0 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lkml.kernel.org/r/[email protected] Fixes: bda420b98505 ("numa balancing: migrate on fault among multiple bound nodes") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	memblock: check memory total_size	Peng Fan	1	-1/+1
	mem=[X][G\|M] is broken on ARM64 platform, there are cases that even type.cnt is 1, but total_size is not 0 because regions are merged into 1. So only check 'cnt' is not enough, total_size should be used, othersize bootargs 'mem=[X][G\|B]' not work anymore. Link: https://lkml.kernel.org/r/[email protected] Fixes: e888fa7bb882 ("memblock: Check memory add/cap ordering") Signed-off-by: Peng Fan <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	ocfs2: mount fails with buffer overflow in strlen	Valentin Vidic	1	-4/+10
	Starting with kernel 5.11 built with CONFIG_FORTIFY_SOURCE mouting an ocfs2 filesystem with either o2cb or pcmk cluster stack fails with the trace below. Problem seems to be that strings for cluster stack and cluster name are not guaranteed to be null terminated in the disk representation, while strlcpy assumes that the source string is always null terminated. This causes a read outside of the source string triggering the buffer overflow detection. detected buffer overflow in strlen ------------[ cut here ]------------ kernel BUG at lib/string.c:1149! invalid opcode: 0000 [#1] SMP PTI CPU: 1 PID: 910 Comm: mount.ocfs2 Not tainted 5.14.0-1-amd64 #1 Debian 5.14.6-2 RIP: 0010:fortify_panic+0xf/0x11 ... Call Trace: ocfs2_initialize_super.isra.0.cold+0xc/0x18 [ocfs2] ocfs2_fill_super+0x359/0x19b0 [ocfs2] mount_bdev+0x185/0x1b0 legacy_get_tree+0x27/0x40 vfs_get_tree+0x25/0xb0 path_mount+0x454/0xa20 __x64_sys_mount+0x103/0x140 do_syscall_64+0x3b/0xc0 entry_SYSCALL_64_after_hwframe+0x44/0xae Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Valentin Vidic <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	ocfs2: fix data corruption after conversion from inline format	Jan Kara	1	-34/+12
	Commit 6dbf7bb55598 ("fs: Don't invalidate page buffers in block_write_full_page()") uncovered a latent bug in ocfs2 conversion from inline inode format to a normal inode format. The code in ocfs2_convert_inline_data_to_extents() attempts to zero out the whole cluster allocated for file data by grabbing, zeroing, and dirtying all pages covering this cluster. However these pages are beyond i_size, thus writeback code generally ignores these dirty pages and no blocks were ever actually zeroed on the disk. This oversight was fixed by commit 693c241a5f6a ("ocfs2: No need to zero pages past i_size.") for standard ocfs2 write path, inline conversion path was apparently forgotten; the commit log also has a reasoning why the zeroing actually is not needed. After commit 6dbf7bb55598, things became worse as writeback code stopped invalidating buffers on pages beyond i_size and thus these pages end up with clean PageDirty bit but with buffers attached to these pages being still dirty. So when a file is converted from inline format, then writeback triggers, and then the file is grown so that these pages become valid, the invalid dirtiness state is preserved, mark_buffer_dirty() does nothing on these pages (buffers are already dirty) but page is never written back because it is clean. So data written to these pages is lost once pages are reclaimed. Simple reproducer for the problem is: xfs_io -f -c "pwrite 0 2000" -c "pwrite 2000 2000" -c "fsync" \ -c "pwrite 4000 2000" ocfs2_file After unmounting and mounting the fs again, you can observe that end of 'ocfs2_file' has lost its contents. Fix the problem by not doing the pointless zeroing during conversion from inline format similarly as in the standard write path. [[email protected]: fix whitespace, per Joseph] Link: https://lkml.kernel.org/r/[email protected] Fixes: 6dbf7bb55598 ("fs: Don't invalidate page buffers in block_write_full_page()") Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Tested-by: Joseph Qi <[email protected]> Acked-by: Gang He <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Jun Piao <[email protected]> Cc: "Markov, Andrey" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm/migrate: fix CPUHP state to update node demotion order	Huang Ying	2	-3/+9
	The node demotion order needs to be updated during CPU hotplug. Because whether a NUMA node has CPU may influence the demotion order. The update function should be called during CPU online/offline after the node_states[N_CPU] has been updated. That is done in CPUHP_AP_ONLINE_DYN during CPU online and in CPUHP_MM_VMSTAT_DEAD during CPU offline. But in commit 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events"), the function to update node demotion order is called in CPUHP_AP_ONLINE_DYN during CPU online/offline. This doesn't satisfy the order requirement. For example, there are 4 CPUs (P0, P1, P2, P3) in 2 sockets (P0, P1 in S0 and P2, P3 in S1), the demotion order is - S0 -> NUMA_NO_NODE - S1 -> NUMA_NO_NODE After P2 and P3 is offlined, because S1 has no CPU now, the demotion order should have been changed to - S0 -> S1 - S1 -> NO_NODE but it isn't changed, because the order updating callback for CPU hotplug doesn't see the new nodemask. After that, if P1 is offlined, the demotion order is changed to the expected order as above. So in this patch, we added CPUHP_AP_MM_DEMOTION_ONLINE and CPUHP_MM_DEMOTION_DEAD to be called after CPUHP_AP_ONLINE_DYN and CPUHP_MM_VMSTAT_DEAD during CPU online and offline, and register the update function on them. Link: https://lkml.kernel.org/r/[email protected] Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: "Huang, Ying" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Yang Shi <[email protected]> Cc: Zi Yan <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Wei Xu <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Keith Busch <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm/migrate: add CPU hotplug to demotion #ifdef	Dave Hansen	4	-27/+28
	Once upon a time, the node demotion updates were driven solely by memory hotplug events. But now, there are handlers for both CPU and memory hotplug. However, the #ifdef around the code checks only memory hotplug. A system that has HOTPLUG_CPU=y but MEMORY_HOTPLUG=n would miss CPU hotplug events. Update the #ifdef around the common code. Add memory and CPU-specific #ifdefs for their handlers. These memory/CPU #ifdefs avoid unused function warnings when their Kconfig option is off. [[email protected]: rework hotplug_memory_notifier() stub] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: Dave Hansen <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Wei Xu <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm/migrate: optimize hotplug-time demotion order updates	Dave Hansen	1	-1/+11
	Patch series "mm/migrate: 5.15 fixes for automatic demotion", v2. This contains two fixes for the "automatic demotion" code which was merged into 5.15: * Fix memory hotplug performance regression by watching suppressing any real action on irrelevant hotplug events. * Ensure CPU hotplug handler is registered when memory hotplug is disabled. This patch (of 2): == tl;dr == Automatic demotion opted for a simple, lazy approach to handling hotplug events. This noticeably slows down memory hotplug[1]. Optimize away updates to the demotion order when memory hotplug events should have no effect. This has no effect on CPU hotplug. There is no known problem on the CPU side and any work there will be in a separate series. == Background == Automatic demotion is a memory migration strategy to ensure that new allocations have room in faster memory tiers on tiered memory systems. The kernel maintains an array (node_demotion[]) to drive these migrations. The node_demotion[] path is calculated by starting at nodes with CPUs and then "walking" to nodes with memory. Only hotplug events which online or offline a node with memory (N_ONLINE) or CPUs (N_CPU) will actually affect the migration order. == Problem == However, the current code is lazy. It completely regenerates the migration order on any CPU or memory hotplug event. The logic was that these events are extremely rare and that the overhead from indiscriminate order regeneration is minimal. Part of the update logic involves a synchronize_rcu(), which is a pretty big hammer. Its overhead was large enough to be detected by some 0day tests that watch memory hotplug performance[1]. == Solution == Add a new helper (node_demotion_topo_changed()) which can differentiate between superfluous and impactful hotplug events. Skip the expensive update operation for superfluous events. == Aside: Locking == It took me a few moments to declare the locking to be safe enough for node_demotion_topo_changed() to work. It all hinges on the memory hotplug lock: During memory hotplug events, 'mem_hotplug_lock' is held for write. This ensures that two memory hotplug events can not be called simultaneously. CPU hotplug has a similar lock (cpuhp_state_mutex) which also provides mutual exclusion between CPU hotplug events. In addition, the demotion code acquire and hold the mem_hotplug_lock for read during its CPU hotplug handlers. This provides mutual exclusion between the demotion memory hotplug callbacks and the CPU hotplug callbacks. This effectively allows treating the migration target generation code to act as if it is single-threaded. 1. https://lore.kernel.org/all/20210905135932.GE15026@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 884a6e5d1f93 ("mm/migrate: update node demotion order on hotplug events") Signed-off-by: Dave Hansen <[email protected]> Reported-by: kernel test robot <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Wei Xu <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: David Rientjes <[email protected]> Cc: Dan Williams <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Yang Shi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	userfaultfd: fix a race between writeprotect and exit_mmap()	Nadav Amit	1	-3/+9
	A race is possible when a process exits, its VMAs are removed by exit_mmap() and at the same time userfaultfd_writeprotect() is called. The race was detected by KASAN on a development kernel, but it appears to be possible on vanilla kernels as well. Use mmget_not_zero() to prevent the race as done in other userfaultfd operations. Link: https://lkml.kernel.org/r/[email protected] Fixes: 63b2d4174c4ad ("userfaultfd: wp: add the writeprotect API to userfaultfd ioctl") Signed-off-by: Nadav Amit <[email protected]> Tested-by: Li Wang <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-18	mm/userfaultfd: selftests: fix memory corruption with thp enabled	Peter Xu	1	-3/+20
	In RHEL's gating selftests we've encountered memory corruption in the uffd event test even with upstream kernel: # ./userfaultfd anon 128 4 nr_pages: 32768, nr_pages_per_cpu: 32768 bounces: 3, mode: rnd racing read, userfaults: 6240 missing (6240) 14729 wp (14729) bounces: 2, mode: racing read, userfaults: 1444 missing (1444) 28877 wp (28877) bounces: 1, mode: rnd read, userfaults: 6055 missing (6055) 14699 wp (14699) bounces: 0, mode: read, userfaults: 82 missing (82) 25196 wp (25196) testing uffd-wp with pagemap (pgsize=4096): done testing uffd-wp with pagemap (pgsize=2097152): done testing events (fork, remap, remove): ERROR: nr 32427 memory corruption 0 1 (errno=0, line=963) ERROR: faulting process failed (errno=0, line=1117) It can be easily reproduced when global thp enabled, which is the default for RHEL. It's also known as a side effect of commit 0db282ba2c12 ("selftest: use mmap instead of posix_memalign to allocate memory", 2021-07-23), which is imho right itself on using mmap() to make sure the addresses will be untagged even on arm. The problem is, for each test we allocate buffers using two allocate_area() calls. We assumed these two buffers won't affect each other, however they could, because mmap() could have found that the two buffers are near each other and having the same VMA flags, so they got merged into one VMA. It won't be a big problem if thp is not enabled, but when thp is agressively enabled it means when initializing the src buffer it could accidentally setup part of the dest buffer too when there's a shared THP that overlaps the two regions. Then some of the dest buffer won't be able to be trapped by userfaultfd missing mode, then it'll cause memory corruption as described. To fix it, do release_pages() after initializing the src buffer. Since the previous two release_pages() calls are after uffd_test_ctx_clear() which will unmap all the buffers anyway (which is stronger than release pages; as unmap() also tear town pgtables), drop them as they shouldn't really be anything useful. We can mark the Fixes tag upon 0db282ba2c12 as it's reported to only happen there, however the real "Fixes" IMHO should be 8ba6e8640844, as before that commit we'll always do explicit release_pages() before registration of uffd, and 8ba6e8640844 changed that logic by adding extra unmap/map and we didn't release the pages at the right place. Meanwhile I don't have a solid glue anyway on whether posix_memalign() could always avoid triggering this bug, hence it's safer to attach this fix to commit 8ba6e8640844. Link: https://lkml.kernel.org/r/[email protected] Fixes: 8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each test") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1994931 Signed-off-by: Peter Xu <[email protected]> Reported-by: Li Wang <[email protected]> Tested-by: Li Wang <[email protected]> Reviewed-by: Axel Rasmussen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Nadav Amit <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-10-19	ALSA: usb-audio: Fix microphone sound on Jieli webcam.	Marco Giunta	2	-0/+14
	When a Jieli Technology USB Webcam is connected, the video part works well, but the mic sound is speeded up. On dmesg there are messages about different rates from the runtime rates, warnings about volume resolution and lastly, the log is filled, every 5 seconds, with retire_capture_urb error messages. The mic works only when ep packet size is set to wMaxPacketSize (normal sound and no more retire_capture_urb error messages). Skipping reading sample rate, fixes the messages about different rates and forcing a volume resolution, fixes warnings about volume range. I have arbitrarily choosed the value (16): I read in a comment that there should be no more than 255 levels, so 4096 (max volume) / 16 = 0-255. Signed-off-by: Marco Giunta <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2021-10-18	scsi: ufs: ufs-pci: Force a full restore after suspend-to-disk	Adrian Hunter	1	-15/+18
	Implement the ->restore() PM operation and set the link to off, which will force a full reset and restore. This ensures that Host Performance Booster is reset after suspend-to-disk. The Host Performance Booster feature caches logical-to-physical mapping information in the host memory. After suspend-to-disk, such information is not valid, so a full reset and restore is needed. A full reset and restore is done if the SPM level is 5 or 6, but not for other SPM levels, so this change fixes those cases. A full reset and restore also restores base address registers, so that code is removed. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Avri Altman <[email protected]> Signed-off-by: Adrian Hunter <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2021-10-18	scsi: qla2xxx: Fix unmap of already freed sgl	Dmitry Bogdanov	1	-9/+5
	The sgl is freed in the target stack in target_release_cmd_kref() before calling qlt_free_cmd() but there is an unmap of sgl in qlt_free_cmd() that causes a panic if sgl is not yet DMA unmapped: NIP dma_direct_unmap_sg+0xdc/0x180 LR dma_direct_unmap_sg+0xc8/0x180 Call Trace: ql_dbg_prefix+0x68/0xc0 [qla2xxx] (unreliable) dma_unmap_sg_attrs+0x54/0xf0 qlt_unmap_sg.part.19+0x54/0x1c0 [qla2xxx] qlt_free_cmd+0x124/0x1d0 [qla2xxx] tcm_qla2xxx_release_cmd+0x4c/0xa0 [tcm_qla2xxx] target_put_sess_cmd+0x198/0x370 [target_core_mod] transport_generic_free_cmd+0x6c/0x1b0 [target_core_mod] tcm_qla2xxx_complete_free+0x6c/0x90 [tcm_qla2xxx] The sgl may be left unmapped in error cases of response sending. For instance, qlt_rdy_to_xfer() maps sgl and exits when session is being deleted keeping the sgl mapped. This patch removes use-after-free of the sgl and ensures that the sgl is unmapped for any command that was not sent to firmware. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Dmitry Bogdanov <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2021-10-18	scsi: qla2xxx: Fix a memory leak in an error path of qla2x00_process_els()	Joy Gu	1	-1/+1
	Commit 8c0eb596baa5 ("[SCSI] qla2xxx: Fix a memory leak in an error path of qla2x00_process_els()"), intended to change: bsg_job->request->msgcode == FC_BSG_HST_ELS_NOLOGIN to: bsg_job->request->msgcode != FC_BSG_RPT_ELS but changed it to: bsg_job->request->msgcode == FC_BSG_RPT_ELS instead. Change the == to a != to avoid leaking the fcport structure or freeing unallocated memory. Link: https://lore.kernel.org/r/[email protected] Fixes: 8c0eb596baa5 ("[SCSI] qla2xxx: Fix a memory leak in an error path of qla2x00_process_els()") Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Joy Gu <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2021-10-18	scsi: qla2xxx: Return -ENOMEM if kzalloc() fails	Zheyu Ma	1	-1/+1
	The driver probing function should return < 0 for failure, otherwise kernel will treat value > 0 as success. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Himanshu Madhani <[email protected]> Signed-off-by: Zheyu Ma <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>