aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-03-27drm/amd/swsmu: add smu 14.0.1 vcn and jpeg msglima10024-27/+82
add new vcn and jpeg msg v2: squash in updates (Alex) v3: rework code for better compat with other smu14.x variants (Alex) Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: lima1002 <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-03-27fs,block: yield devices earlyChristian Brauner13-55/+76
Currently a device is only really released once the umount returns to userspace due to how file closing works. That ultimately could cause an old umount assumption to be violated that concurrent umount and mount don't fail. So an exclusively held device with a temporary holder should be yielded before the filesystem is gone. Add a helper that allows callers to do that. This also allows us to remove the two holder ops that Linus wasn't excited about. Link: https://lore.kernel.org/r/[email protected] Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-03-27block: count BLK_OPEN_RESTRICT_WRITES openersChristian Brauner1-3/+3
The original changes in v6.8 do allow for a block device to be reopened with BLK_OPEN_RESTRICT_WRITES provided the same holder is used as per bdev_may_open(). I think this has a bug. The first opener @f1 of that block device will set bdev->bd_writers to -1. The second opener @f2 using the same holder will pass the check in bdev_may_open() that bdev->bd_writers must not be greater than zero. The first opener @f1 now closes the block device and in bdev_release() will end up calling bdev_yield_write_access() which calls bdev_writes_blocked() and sets bdev->bd_writers to 0 again. Now @f2 holds a file to that block device which was opened with exclusive write access but bdev->bd_writers has been reset to 0. So now @f3 comes along and succeeds in opening the block device with BLK_OPEN_WRITE betraying @f2's request to have exclusive write access. This isn't a practical issue yet because afaict there's no codepath inside the kernel that reopenes the same block device with BLK_OPEN_RESTRICT_WRITES but it will be if there is. Fix this by counting the number of BLK_OPEN_RESTRICT_WRITES openers. So we only allow writes again once all BLK_OPEN_RESTRICT_WRITES openers are done. Link: https://lore.kernel.org/r/20240323-abtauchen-klauen-c2953810082d@brauner Fixes: ed5cc702d311 ("block: Add config option to not allow writing to mounted devices") Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-03-27selftests: netdevsim: set test timeout to 10 minutesJakub Kicinski1-0/+1
The longest running netdevsim test, nexthop.sh, currently takes 5 min to finish. Around 260s to be exact, and 310s on a debug kernel. The default timeout in selftest is 45sec, so we need an explicit config. Give ourselves some headroom and use 10min. Commit under Fixes isn't really to "blame" but prior to that netdevsim tests weren't integrated with kselftest infra so blaming the tests themselves doesn't seem right, either. Fixes: 8ff25dac88f6 ("netdevsim: add Makefile for selftests") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-27net: wan: framer: Add missing static inline qualifiersHerve Codina1-2/+2
Compilation with CONFIG_GENERIC_FRAMER disabled lead to the following warnings: framer.h:184:16: warning: no previous prototype for function 'framer_get' [-Wmissing-prototypes] 184 | struct framer *framer_get(struct device *dev, const char *con_id) framer.h:184:1: note: declare 'static' if the function is not intended to be used outside of this translation unit 184 | struct framer *framer_get(struct device *dev, const char *con_id) framer.h:189:6: warning: no previous prototype for function 'framer_put' [-Wmissing-prototypes] 189 | void framer_put(struct device *dev, struct framer *framer) framer.h:189:1: note: declare 'static' if the function is not intended to be used outside of this translation unit 189 | void framer_put(struct device *dev, struct framer *framer) Add missing 'static inline' qualifiers for these functions. Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Fixes: 82c944d05b1a ("net: wan: Add framer framework support") Cc: [email protected] Signed-off-by: Herve Codina <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-27ALSA: hda/tas2781: remove useless dev_dbg from playback_hookGergo Koteles1-2/+0
The debug message "Playback action not supported: action" is not useful, because the action was previously printed, and the list of supported actions are intentional. Remove the debug statement from the default switch case. Signed-off-by: Gergo Koteles <[email protected]> Message-ID: <8b9546db6c92dea4476a7247a88d56248c2ba8c2.1711469583.git.soyer@irl.hu> Signed-off-by: Takashi Iwai <[email protected]>
2024-03-27ALSA: hda/tas2781: add debug statements to kcontrolsGergo Koteles1-4/+31
Sometimes it is useful to examine the timing of kcontrol events. Add debug statements to each kcontrol. Signed-off-by: Gergo Koteles <[email protected]> Message-ID: <18ff4b0caab90a2dacf907e62346fd5079a9eb1a.1711469583.git.soyer@irl.hu> Signed-off-by: Takashi Iwai <[email protected]>
2024-03-27ALSA: hda/tas2781: add locks to kcontrolsGergo Koteles1-2/+48
The rcabin.profile_cfg_id, cur_prog, cur_conf, force_fwload_status variables are acccessible from multiple threads and therefore require locking. Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") CC: [email protected] Signed-off-by: Gergo Koteles <[email protected]> Message-ID: <e35b867f6fe5fa1f869dd658a0a1f2118b737f57.1711469583.git.soyer@irl.hu> Signed-off-by: Takashi Iwai <[email protected]>
2024-03-27ALSA: hda/tas2781: remove digital gain kcontrolGergo Koteles1-36/+1
The "Speaker Digital Gain" kcontrol controls the TAS2781_DVC_LVL (0x1A) register. Unfortunately the tas2563 does not have DVC_LVL, but has INT_MASK0 in 0x1A, which has been misused so far. Since commit c1947ce61ff4 ("ALSA: hda/realtek: tas2781: enable subwoofer volume control") the volume of the tas2781 amplifiers can be controlled by the master volume, so this digital gain kcontrol is not needed. Remove it. Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") CC: [email protected] Signed-off-by: Gergo Koteles <[email protected]> Message-ID: <741fc21db994efd58f83e7aef38931204961e5b2.1711469583.git.soyer@irl.hu> Signed-off-by: Takashi Iwai <[email protected]>
2024-03-27ALSA: aoa: avoid false-positive format truncation warningArnd Bergmann1-1/+1
clang warns about what it interprets as a truncated snprintf: sound/aoa/soundbus/i2sbus/core.c:171:6: error: 'snprintf' will always be truncated; specified size is 6, but format string expands to at least 7 [-Werror,-Wformat-truncation-non-kprintf] The actual problem here is that it does not understand the special %pOFn format string and assumes that it is a pointer followed by the string "OFn", which would indeed not fit. Slightly increasing the size of the buffer to its natural alignment avoids the warning, as it is now long enough for the correct and the incorrect interprations. Fixes: b917d58dcfaa ("ALSA: aoa: Convert to using %pOFn instead of device_node.name") Signed-off-by: Arnd Bergmann <[email protected]> Message-ID: <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2024-03-27block: handle BLK_OPEN_RESTRICT_WRITES correctlyChristian Brauner2-7/+9
Last kernel release we introduce CONFIG_BLK_DEV_WRITE_MOUNTED. By default this option is set. When it is set the long-standing behavior of being able to write to mounted block devices is enabled. But in order to guard against unintended corruption by writing to the block device buffer cache CONFIG_BLK_DEV_WRITE_MOUNTED can be turned off. In that case it isn't possible to write to mounted block devices anymore. A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES which disallows concurrent BLK_OPEN_WRITE access. When we still had the bdev handle around we could recognize BLK_OPEN_RESTRICT_WRITES because the mode was passed around. Since we managed to get rid of the bdev handle we changed that logic to recognize BLK_OPEN_RESTRICT_WRITES based on whether the file was opened writable and writes to that block device are blocked. That logic doesn't work because we do allow BLK_OPEN_RESTRICT_WRITES to be specified without BLK_OPEN_WRITE. Fix the detection logic and use an FMODE_* bit. We could've also abused O_EXCL as an indicator that BLK_OPEN_RESTRICT_WRITES has been requested. For userspace open paths O_EXCL will never be retained but for internal opens where we open files that are never installed into a file descriptor table this is fine. But it would be a gamble that this doesn't cause bugs. Note that BLK_OPEN_RESTRICT_WRITES is an internal only flag that cannot directly be raised by userspace. It is implicitly raised during mounting. Passes xftests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED set and unset. Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/20240323-zielbereich-mittragen-6fdf14876c3e@brauner Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access") Reviewed-by: Yu Kuai <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reported-by: Matthew Wilcox <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-03-26Merge branch '100GbE' of ↵Jakub Kicinski7-36/+37
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-03-25 (ice, ixgbe, igc) This series contains updates to ice, ixgbe, and igc drivers. Steven fixes incorrect casting of bitmap type for ice driver. Jesse fixes memory corruption issue with suspend flow on ice. Przemek adds GFP_ATOMIC flag to avoid sleeping in IRQ context for ixgbe. Kurt Kanzenbach removes no longer valid comment on igc. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: igc: Remove stale comment about Tx timestamping ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa() ice: fix memory corruption bug with suspend and rebuild ice: Refactor FW data type and fix bitmap casting issue ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-26mlxbf_gige: call request_irq() after NAPI initializedDavid Thompson1-7/+11
The mlxbf_gige driver encounters a NULL pointer exception in mlxbf_gige_open() when kdump is enabled. The sequence to reproduce the exception is as follows: a) enable kdump b) trigger kdump via "echo c > /proc/sysrq-trigger" c) kdump kernel executes d) kdump kernel loads mlxbf_gige module e) the mlxbf_gige module runs its open() as the the "oob_net0" interface is brought up f) mlxbf_gige module will experience an exception during its open(), something like: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 Mem abort info: ESR = 0x0000000086000004 EC = 0x21: IABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x04: level 0 translation fault user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000 [0000000000000000] pgd=0000000000000000, p4d=0000000000000000 Internal error: Oops: 0000000086000004 [#1] SMP CPU: 0 PID: 812 Comm: NetworkManager Tainted: G OE 5.15.0-1035-bluefield #37-Ubuntu Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024 pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : 0x0 lr : __napi_poll+0x40/0x230 sp : ffff800008003e00 x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8 x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000 x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000 x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0 x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398 x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2 x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100 x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238 Call trace: 0x0 net_rx_action+0x178/0x360 __do_softirq+0x15c/0x428 __irq_exit_rcu+0xac/0xec irq_exit+0x18/0x2c handle_domain_irq+0x6c/0xa0 gic_handle_irq+0xec/0x1b0 call_on_irq_stack+0x20/0x2c do_interrupt_handler+0x5c/0x70 el1_interrupt+0x30/0x50 el1h_64_irq_handler+0x18/0x2c el1h_64_irq+0x7c/0x80 __setup_irq+0x4c0/0x950 request_threaded_irq+0xf4/0x1bc mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige] mlxbf_gige_open+0x5c/0x170 [mlxbf_gige] __dev_open+0x100/0x220 __dev_change_flags+0x16c/0x1f0 dev_change_flags+0x2c/0x70 do_setlink+0x220/0xa40 __rtnl_newlink+0x56c/0x8a0 rtnl_newlink+0x58/0x84 rtnetlink_rcv_msg+0x138/0x3c4 netlink_rcv_skb+0x64/0x130 rtnetlink_rcv+0x20/0x30 netlink_unicast+0x2ec/0x360 netlink_sendmsg+0x278/0x490 __sock_sendmsg+0x5c/0x6c ____sys_sendmsg+0x290/0x2d4 ___sys_sendmsg+0x84/0xd0 __sys_sendmsg+0x70/0xd0 __arm64_sys_sendmsg+0x2c/0x40 invoke_syscall+0x78/0x100 el0_svc_common.constprop.0+0x54/0x184 do_el0_svc+0x30/0xac el0_svc+0x48/0x160 el0t_64_sync_handler+0xa4/0x12c el0t_64_sync+0x1a4/0x1a8 Code: bad PC value ---[ end trace 7d1c3f3bf9d81885 ]--- Kernel panic - not syncing: Oops: Fatal exception in interrupt Kernel Offset: 0x2870a7a00000 from 0xffff800008000000 PHYS_OFFSET: 0x80000000 CPU features: 0x0,000005c1,a3332a5a Memory Limit: none ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]--- The exception happens because there is a pending RX interrupt before the call to request_irq(RX IRQ) executes. Then, the RX IRQ handler fires immediately after this request_irq() completes. The RX IRQ handler runs "napi_schedule()" before NAPI is fully initialized via "netif_napi_add()" and "napi_enable()", both which happen later in the open() logic. The logic in mlxbf_gige_open() must fully initialize NAPI before any calls to request_irq() execute. Fixes: f92e1869d74e ("Add Mellanox BlueField Gigabit Ethernet driver") Signed-off-by: David Thompson <[email protected]> Reviewed-by: Asmaa Mnebhi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-26Merge branch 'tls-recvmsg-fixes'Jakub Kicinski2-2/+39
Sabrina Dubroca says: ==================== tls: recvmsg fixes The first two fixes are again related to async decrypt. The last one is unrelated but I stumbled upon it while reading the code. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-26tls: get psock ref after taking rxlock to avoid leakSabrina Dubroca1-1/+1
At the start of tls_sw_recvmsg, we take a reference on the psock, and then call tls_rx_reader_lock. If that fails, we return directly without releasing the reference. Instead of adding a new label, just take the reference after locking has succeeded, since we don't need it before. Fixes: 4cbc325ed6b4 ("tls: rx: allow only one reader at a time") Signed-off-by: Sabrina Dubroca <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/fe2ade22d030051ce4c3638704ed58b67d0df643.1711120964.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-26selftests: tls: add test with a partially invalid iovSabrina Dubroca1-0/+34
Make sure that we don't return more bytes than we actually received if the userspace buffer was bogus. We expect to receive at least the rest of rec1, and possibly some of rec2 (currently, we don't, but that would be ok). Signed-off-by: Sabrina Dubroca <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/720e61b3d3eab40af198a58ce2cd1ee019f0ceb1.1711120964.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-26tls: adjust recv return with async crypto and failed copy to userspaceSabrina Dubroca1-0/+3
process_rx_list may not copy as many bytes as we want to the userspace buffer, for example in case we hit an EFAULT during the copy. If this happens, we should only count the bytes that were actually copied, which may be 0. Subtracting async_copy_bytes is correct in both peek and !peek cases, because decrypted == async_copy_bytes + peeked for the peek case: peek is always !ZC, and we can go through either the sync or async path. In the async case, we add chunk to both decrypted and async_copy_bytes. In the sync case, we add chunk to both decrypted and peeked. I missed that in commit 6caaf104423d ("tls: fix peeking with sync+async decryption"). Fixes: 4d42cd6bc2ac ("tls: rx: fix return value for async crypto") Signed-off-by: Sabrina Dubroca <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/1b5a1eaab3c088a9dd5d9f1059ceecd7afe888d1.1711120964.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-26tls: recv: process_rx_list shouldn't use an offset with kvecSabrina Dubroca1-1/+1
Only MSG_PEEK needs to copy from an offset during the final process_rx_list call, because the bytes we copied at the beginning of tls_sw_recvmsg were left on the rx_list. In the KVEC case, we removed data from the rx_list as we were copying it, so there's no need to use an offset, just like in the normal case. Fixes: 692d7b5d1f91 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Sabrina Dubroca <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/e5487514f828e0347d2b92ca40002c62b58af73d.1711120964.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-26riscv: Mark __se_sys_* functions __usedSami Tolvanen1-1/+2
Clang doesn't think ___se_sys_* functions used even though they are aliased to __se_sys_*, resulting in -Wunused-function warnings when building rv32. For example: mm/oom_kill.c:1195:1: warning: unused function '___se_sys_process_mrelease' [-Wunused-function] 1195 | SYSCALL_DEFINE2(process_mrelease, int, pidfd, unsigned int, flags) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/syscalls.h:221:36: note: expanded from macro 'SYSCALL_DEFINE2' 221 | #define SYSCALL_DEFINE2(name, ...) SYSCALL_DEFINEx(2, _##name, __VA_ARGS__) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/syscalls.h:231:2: note: expanded from macro 'SYSCALL_DEFINEx' 231 | __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/riscv/include/asm/syscall_wrapper.h:81:2: note: expanded from macro '__SYSCALL_DEFINEx' 81 | __SYSCALL_SE_DEFINEx(x, sys, name, __VA_ARGS__) \ | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ arch/riscv/include/asm/syscall_wrapper.h:40:14: note: expanded from macro '__SYSCALL_SE_DEFINEx' 40 | static long ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__)) | ^~~~~~~~~~~~~~~~~~~~ <scratch space>:30:1: note: expanded from here 30 | ___se_sys_process_mrelease | ^~~~~~~~~~~~~~~~~~~~~~~~~~ 1 warning generated. Mark the functions __used explicitly to fix the Clang warnings. Fixes: a9ad73295cc1 ("riscv: Fix syscall wrapper for >word-size arguments") Reported-by: Linux Kernel Functional Testing <[email protected]> Tested-by: Linux Kernel Functional Testing <[email protected]> Signed-off-by: Sami Tolvanen <[email protected]> Reviewed-by: Alexandre Ghiti <[email protected]> Tested-by: Conor Dooley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-03-26drivers/perf: riscv: Disable PERF_SAMPLE_BRANCH_* while not supportedPu Lehui1-0/+4
RISC-V perf driver does not yet support branch sampling. Although the specification is in the works [0], it is best to disable such events until support is available, otherwise we will get unexpected results. Due to this reason, two riscv bpf testcases get_branch_snapshot and perf_branches/perf_branches_hw fail. Link: https://github.com/riscv/riscv-control-transfer-records [0] Fixes: f5bfa23f576f ("RISC-V: Add a perf core library for pmu drivers") Signed-off-by: Pu Lehui <[email protected]> Reviewed-by: Atish Patra <[email protected]> Reviewed-by: Conor Dooley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-03-26riscv: compat_vdso: install compat_vdso.so.dbg to /lib/modules/*/vdso/Masahiro Yamada1-1/+1
'make vdso_install' installs debug vdso files to /lib/modules/*/vdso/. Only for the compat vdso on riscv, the installation destination differs; compat_vdso.so.dbg is installed to /lib/module/*/compat_vdso/. To follow the standard install destination and simplify the vdso_install logic, change the install destination to standard /lib/modules/*/vdso/. Signed-off-by: Masahiro Yamada <[email protected]> Reviewed-by: Alexandre Ghiti <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-03-26riscv: hwprobe: do not produce frtace relocationVladimir Isaev1-0/+1
Such relocation causes crash of android linker similar to one described in commit e05d57dcb8c7 ("riscv: Fixup __vdso_gettimeofday broke dynamic ftrace"). Looks like this relocation is added by CONFIG_DYNAMIC_FTRACE which is disabled in the default android kernel. Before: readelf -rW arch/riscv/kernel/vdso/vdso.so: Relocation section '.rela.dyn' at offset 0xd00 contains 1 entry: Offset Info Type 0000000000000d20 0000000000000003 R_RISCV_RELATIVE objdump: 0000000000000c86 <__vdso_riscv_hwprobe@@LINUX_4.15>: c86: 0001 nop c88: 0001 nop c8a: 0001 nop c8c: 0001 nop c8e: e211 bnez a2,c92 <__vdso_riscv_hwprobe... After: readelf -rW arch/riscv/kernel/vdso/vdso.so: There are no relocations in this file. objdump: 0000000000000c86 <__vdso_riscv_hwprobe@@LINUX_4.15>: c86: e211 bnez a2,c8a <__vdso_riscv_hwprobe... c88: c6b9 beqz a3,cd6 <__vdso_riscv_hwprobe... c8a: e739 bnez a4,cd8 <__vdso_riscv_hwprobe... c8c: ffffd797 auipc a5,0xffffd Also disable SCS since it also should not be available in vdso. Fixes: aa5af0aa90ba ("RISC-V: Add hwprobe vDSO function and data") Signed-off-by: Roman Artemev <[email protected]> Signed-off-by: Vladimir Isaev <[email protected]> Reviewed-by: Alexandre Ghiti <[email protected]> Reviewed-by: Guo Ren <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-03-26RAS: Avoid build errors when CONFIG_DEBUG_FS=nYazen Ghannam1-0/+4
A new helper was introduced for RAS modules to be able to get the RAS subsystem debugfs root directory. The helper is defined in debugfs.c which is only built when CONFIG_DEBUG_FS=y. However, it's possible that the modules would include debugfs support for optional functionality. One current example is the fmpm module. In this case, a build error will occur when CONFIG_RAS_FMPM is selected and CONFIG_DEBUG_FS=n. Add an inline helper function stub for the CONFIG_DEBUG_FS=n case as the fmpm module can function without the debugfs functionality too. Fixes: 9d2b6fa09d15 ("RAS: Export helper to get ras_debugfs_dir") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218640 Reported-by: anthony s. knowles <[email protected]> Signed-off-by: Yazen Ghannam <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Tested-by: anthony s. knowles <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-03-26of: dynamic: Synchronize of_changeset_destroy() with the devlink removalsHerve Codina1-0/+12
In the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove() During the step 1, devices are destroyed and devlinks are removed. During the step 2, OF nodes are destroyed but __of_changeset_entry_destroy() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 ... Indeed, during the devlink removals performed at step 1, the removal itself releasing the device (and the attached of_node) is done by a job queued in a workqueue and so, it is done asynchronously with respect to function calls. When the warning is present, of_node_put() will be called but wrongly too late from the workqueue job. In order to be sure that any ongoing devlink removals are done before the of_node destruction, synchronize the of_changeset_destroy() with the devlink removals. Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal") Cc: [email protected] Signed-off-by: Herve Codina <[email protected]> Reviewed-by: Saravana Kannan <[email protected]> Tested-by: Luca Ceresoli <[email protected]> Reviewed-by: Nuno Sa <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2024-03-26driver core: Introduce device_link_wait_removal()Herve Codina2-3/+24
The commit 80dd33cf72d1 ("drivers: base: Fix device link removal") introduces a workqueue to release the consumer and supplier devices used in the devlink. In the job queued, devices are release and in turn, when all the references to these devices are dropped, the release function of the device itself is called. Nothing is present to provide some synchronisation with this workqueue in order to ensure that all ongoing releasing operations are done and so, some other operations can be started safely. For instance, in the following sequence: 1) of_platform_depopulate() 2) of_overlay_remove() During the step 1, devices are released and related devlinks are removed (jobs pushed in the workqueue). During the step 2, OF nodes are destroyed but, without any synchronisation with devlink removal jobs, of_overlay_remove() can raise warnings related to missing of_node_put(): ERROR: memory leak, expected refcount 1 instead of 2 Indeed, the missing of_node_put() call is going to be done, too late, from the workqueue job execution. Introduce device_link_wait_removal() to offer a way to synchronize operations waiting for the end of devlink removals (i.e. end of workqueue jobs). Also, as a flushing operation is done on the workqueue, the workqueue used is moved from a system-wide workqueue to a local one. Cc: [email protected] Signed-off-by: Herve Codina <[email protected]> Tested-by: Luca Ceresoli <[email protected]> Reviewed-by: Nuno Sa <[email protected]> Reviewed-by: Saravana Kannan <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2024-03-26smb3: add trace event for mknodSteve French2-1/+10
Add trace points to help debug mknod and mkfifo: smb3_mknod_done smb3_mknod_enter smb3_mknod_err Example output: TASK-PID CPU# ||||| TIMESTAMP FUNCTION | | | ||||| | | mkfifo-6163 [003] ..... 960.425558: smb3_mknod_enter: xid=12 sid=0xb55130f6 tid=0x46e6241c path=\fifo1 mkfifo-6163 [003] ..... 960.432719: smb3_mknod_done: xid=12 sid=0xb55130f6 tid=0x46e6241c Reviewed-by: Bharath SM <[email protected]> Reviewed-by: Meetakshi Setiya <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-03-26crash: use macro to add crashk_res into iomem early for specific archBaoquan He2-0/+9
There are regression reports[1][2] that crashkernel region on x86_64 can't be added into iomem tree sometime. This causes the later failure of kdump loading. This happened after commit 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") was merged. Even though, these reported issues are proved to be related to other component, they are just exposed after above commmit applied, I still would like to keep crashk_res and crashk_low_res being added into iomem early as before because the early adding has been always there on x86_64 and working very well. For safety of kdump, Let's change it back. Here, add a macro HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY to limit that only ARCH defining the macro can have the early adding crashk_res/_low_res into iomem. Then define HAVE_ARCH_ADD_CRASH_RES_TO_IOMEM_EARLY on x86 to enable it. Note: In reserve_crashkernel_low(), there's a remnant of crashk_low_res handling which was mistakenly added back in commit 85fcde402db1 ("kexec: split crashkernel reservation code out from crash_core.c"). [1] [PATCH V2] x86/kexec: do not update E820 kexec table for setup_data https://lore.kernel.org/all/[email protected]/T/#u [2] Question about Address Range Validation in Crash Kernel Allocation https://lore.kernel.org/all/[email protected]/T/#u Link: https://lkml.kernel.org/r/ZgDYemRQ2jxjLkq+@MiWiFi-R3L-srv Fixes: 4a693ce65b18 ("kdump: defer the insertion of crashkernel resources") Signed-off-by: Baoquan He <[email protected]> Cc: Dave Young <[email protected]> Cc: Huacai Chen <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jiri Bohac <[email protected]> Cc: Li Huafei <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm: zswap: fix data loss on SWP_SYNCHRONOUS_IO devicesJohannes Weiner1-4/+19
Zhongkun He reports data corruption when combining zswap with zram. The issue is the exclusive loads we're doing in zswap. They assume that all reads are going into the swapcache, which can assume authoritative ownership of the data and so the zswap copy can go. However, zram files are marked SWP_SYNCHRONOUS_IO, and faults will try to bypass the swapcache. This results in an optimistic read of the swap data into a page that will be dismissed if the fault fails due to races. In this case, zswap mustn't drop its authoritative copy. Link: https://lore.kernel.org/all/CACSyD1N+dUvsu8=zV9P691B9bVq33erwOXNTmEaUbi9DrDeJzw@mail.gmail.com/ Fixes: b9c91c43412f ("mm: zswap: support exclusive loads") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Zhongkun He <[email protected]> Tested-by: Zhongkun He <[email protected]> Acked-by: Yosry Ahmed <[email protected]> Acked-by: Barry Song <[email protected]> Reviewed-by: Chengming Zhou <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Acked-by: Chris Li <[email protected]> Cc: <[email protected]> [6.5+] Signed-off-by: Andrew Morton <[email protected]>
2024-03-26selftests/mm: fix ARM related issue with fork after pthread_createEdward Liaw3-0/+15
Following issue was observed while running the uffd-unit-tests selftest on ARM devices. On x86_64 no issues were detected: pthread_create followed by fork caused deadlock in certain cases wherein fork required some work to be completed by the created thread. Used synchronization to ensure that created thread's start function has started before invoking fork. [[email protected]: refactored to use atomic_bool] Link: https://lkml.kernel.org/r/[email protected] Fixes: 760aee0b71e3 ("selftests/mm: add tests for RO pinning vs fork()") Signed-off-by: Lokesh Gidra <[email protected]> Signed-off-by: Edward Liaw <[email protected]> Cc: Peter Xu <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26hexagon: vmlinux.lds.S: handle attributes sectionNathan Chancellor1-0/+1
After the linked LLVM change, the build fails with CONFIG_LD_ORPHAN_WARN_LEVEL="error", which happens with allmodconfig: ld.lld: error: vmlinux.a(init/main.o):(.hexagon.attributes) is being placed in '.hexagon.attributes' Handle the attributes section in a similar manner as arm and riscv by adding it after the primary ELF_DETAILS grouping in vmlinux.lds.S, which fixes the error. Link: https://lkml.kernel.org/r/20240319-hexagon-handle-attributes-section-vmlinux-lds-s-v1-1-59855dab8872@kernel.org Fixes: 113616ec5b64 ("hexagon: select ARCH_WANT_LD_ORPHAN_WARN") Link: https://github.com/llvm/llvm-project/commit/31f4b329c8234fab9afa59494d7f8bdaeaefeaad Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Brian Cain <[email protected]> Cc: Bill Wendling <[email protected]> Cc: Justin Stitt <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26userfaultfd: fix deadlock warning when locking src and dst VMAsLokesh Gidra1-1/+2
Use down_read_nested() to avoid the warning. Link: https://lkml.kernel.org/r/[email protected] Fixes: 867a43a34ff8 ("userfaultfd: use per-vma locks in userfaultfd operations") Reported-by: [email protected] Signed-off-by: Lokesh Gidra <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Axel Rasmussen <[email protected]> Cc: Brian Geffon <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Jann Horn <[email protected]> [Bug #2] Cc: Kalesh Singh <[email protected]> Cc: Lokesh Gidra <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Nicolas Geoffray <[email protected]> Cc: Peter Xu <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26tmpfs: fix race on handling dquot rbtreeCarlos Maiolino1-3/+7
A syzkaller reproducer found a race while attempting to remove dquot information from the rb tree. Fetching the rb_tree root node must also be protected by the dqopt->dqio_sem, otherwise, giving the right timing, shmem_release_dquot() will trigger a warning because it couldn't find a node in the tree, when the real reason was the root node changing before the search starts: Thread 1 Thread 2 - shmem_release_dquot() - shmem_{acquire,release}_dquot() - fetch ROOT - Fetch ROOT - acquire dqio_sem - wait dqio_sem - do something, triger a tree rebalance - release dqio_sem - acquire dqio_sem - start searching for the node, but from the wrong location, missing the node, and triggering a warning. Link: https://lkml.kernel.org/r/[email protected] Fixes: eafc474e2029 ("shmem: prepare shmem quota infrastructure") Signed-off-by: Carlos Maiolino <[email protected]> Reported-by: Ubisectech Sirius <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26selftests/mm: sigbus-wp test requires UFFD_FEATURE_WP_HUGETLBFS_SHMEMEdward Liaw1-1/+2
The sigbus-wp test requires the UFFD_FEATURE_WP_HUGETLBFS_SHMEM flag for shmem and hugetlb targets. Otherwise it is not backwards compatible with kernels <5.19 and fails with EINVAL. Link: https://lkml.kernel.org/r/[email protected] Fixes: 73c1ea939b65 ("selftests/mm: move uffd sig/events tests into uffd unit tests") Signed-off-by: Edward Liaw <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Peter Xu <[email protected] Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm: zswap: fix writeback shinker GFP_NOIO/GFP_NOFS recursionJohannes Weiner1-0/+8
Kent forwards this bug report of zswap re-entering the block layer from an IO request allocation and locking up: [10264.128242] sysrq: Show Blocked State [10264.128268] task:kworker/20:0H state:D stack:0 pid:143 tgid:143 ppid:2 flags:0x00004000 [10264.128271] Workqueue: bcachefs_io btree_write_submit [bcachefs] [10264.128295] Call Trace: [10264.128295] <TASK> [10264.128297] __schedule+0x3e6/0x1520 [10264.128303] schedule+0x32/0xd0 [10264.128304] schedule_timeout+0x98/0x160 [10264.128308] io_schedule_timeout+0x50/0x80 [10264.128309] wait_for_completion_io_timeout+0x7f/0x180 [10264.128310] submit_bio_wait+0x78/0xb0 [10264.128313] swap_writepage_bdev_sync+0xf6/0x150 [10264.128317] zswap_writeback_entry+0xf2/0x180 [10264.128319] shrink_memcg_cb+0xe7/0x2f0 [10264.128322] __list_lru_walk_one+0xb9/0x1d0 [10264.128325] list_lru_walk_one+0x5d/0x90 [10264.128326] zswap_shrinker_scan+0xc4/0x130 [10264.128327] do_shrink_slab+0x13f/0x360 [10264.128328] shrink_slab+0x28e/0x3c0 [10264.128329] shrink_one+0x123/0x1b0 [10264.128331] shrink_node+0x97e/0xbc0 [10264.128332] do_try_to_free_pages+0xe7/0x5b0 [10264.128333] try_to_free_pages+0xe1/0x200 [10264.128334] __alloc_pages_slowpath.constprop.0+0x343/0xde0 [10264.128337] __alloc_pages+0x32d/0x350 [10264.128338] allocate_slab+0x400/0x460 [10264.128339] ___slab_alloc+0x40d/0xa40 [10264.128345] kmem_cache_alloc+0x2e7/0x330 [10264.128348] mempool_alloc+0x86/0x1b0 [10264.128349] bio_alloc_bioset+0x200/0x4f0 [10264.128352] bio_alloc_clone+0x23/0x60 [10264.128354] alloc_io+0x26/0xf0 [dm_mod 7e9e6b44df4927f93fb3e4b5c782767396f58382] [10264.128361] dm_submit_bio+0xb8/0x580 [dm_mod 7e9e6b44df4927f93fb3e4b5c782767396f58382] [10264.128366] __submit_bio+0xb0/0x170 [10264.128367] submit_bio_noacct_nocheck+0x159/0x370 [10264.128368] bch2_submit_wbio_replicas+0x21c/0x3a0 [bcachefs 85f1b9a7a824f272eff794653a06dde1a94439f2] [10264.128391] btree_write_submit+0x1cf/0x220 [bcachefs 85f1b9a7a824f272eff794653a06dde1a94439f2] [10264.128406] process_one_work+0x178/0x350 [10264.128408] worker_thread+0x30f/0x450 [10264.128409] kthread+0xe5/0x120 The zswap shrinker resumes the swap_writepage()s that were intercepted by the zswap store. This will enter the block layer, and may even enter the filesystem depending on the swap backing file. Make it respect GFP_NOIO and GFP_NOFS. Link: https://lore.kernel.org/linux-mm/rc4pk2r42oyvjo4dc62z6sovquyllq56i5cdgcaqbd7wy3hfzr@n4nbxido3fme/ Link: https://lkml.kernel.org/r/[email protected] Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Kent Overstreet <[email protected]> Acked-by: Yosry Ahmed <[email protected]> Reported-by: Jérôme Poulin <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Reviewed-by: Chengming Zhou <[email protected]> Cc: [email protected] [v6.8] Signed-off-by: Andrew Morton <[email protected]>
2024-03-26ARM: prctl: reject PR_SET_MDWE on pre-ARMv6Zev Weiss1-0/+14
On v5 and lower CPUs we can't provide MDWE protection, so ensure we fail any attempt to enable it via prctl(PR_SET_MDWE). Previously such an attempt would misleadingly succeed, leading to any subsequent mmap(PROT_READ|PROT_WRITE) or execve() failing unconditionally (the latter somewhat violently via force_fatal_sig(SIGSEGV) due to READ_IMPLIES_EXEC). Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zev Weiss <[email protected]> Cc: <[email protected]> [6.3+] Cc: Borislav Petkov <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Florent Revest <[email protected]> Cc: Helge Deller <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Kees Cook <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ondrej Mosnacek <[email protected]> Cc: Rick Edgecombe <[email protected]> Cc: Russell King (Oracle) <[email protected]> Cc: Sam James <[email protected]> Cc: Stefan Roesch <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yin Fengwei <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26prctl: generalize PR_SET_MDWE support check to be per-archZev Weiss3-2/+27
Patch series "ARM: prctl: Reject PR_SET_MDWE where not supported". I noticed after a recent kernel update that my ARM926 system started segfaulting on any execve() after calling prctl(PR_SET_MDWE). After some investigation it appears that ARMv5 is incapable of providing the appropriate protections for MDWE, since any readable memory is also implicitly executable. The prctl_set_mdwe() function already had some special-case logic added disabling it on PARISC (commit 793838138c15, "prctl: Disable prctl(PR_SET_MDWE) on parisc"); this patch series (1) generalizes that check to use an arch_*() function, and (2) adds a corresponding override for ARM to disable MDWE on pre-ARMv6 CPUs. With the series applied, prctl(PR_SET_MDWE) is rejected on ARMv5 and subsequent execve() calls (as well as mmap(PROT_READ|PROT_WRITE)) can succeed instead of unconditionally failing; on ARMv6 the prctl works as it did previously. [0] https://lore.kernel.org/all/2023112456-linked-nape-bf19@gregkh/ This patch (of 2): There exist systems other than PARISC where MDWE may not be feasible to support; rather than cluttering up the generic code with additional arch-specific logic let's add a generic function for checking MDWE support and allow each arch to override it as needed. Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Zev Weiss <[email protected]> Acked-by: Helge Deller <[email protected]> [parisc] Cc: Borislav Petkov <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Florent Revest <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Kees Cook <[email protected]> Cc: Miguel Ojeda <[email protected]> Cc: Mike Rapoport (IBM) <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Ondrej Mosnacek <[email protected]> Cc: Rick Edgecombe <[email protected]> Cc: Russell King (Oracle) <[email protected]> Cc: Sam James <[email protected]> Cc: Stefan Roesch <[email protected]> Cc: Yang Shi <[email protected]> Cc: Yin Fengwei <[email protected]> Cc: <[email protected]> [6.3+] Signed-off-by: Andrew Morton <[email protected]>
2024-03-26MAINTAINERS: remove incorrect M: tag for [email protected]Kuan-Wei Chiu1-1/+0
The [email protected] mailing list should only be listed under the L: (List) tag in the MAINTAINERS file. However, it was incorrectly listed under both L: and M: (Maintainers) tags, which is not accurate. Remove the M: tag for [email protected] in the MAINTAINERS file to reflect the correct categorization. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kuan-Wei Chiu <[email protected]> Cc: Ching-Chun (Jim) Huang <[email protected]> Cc: Matthew Sakai <[email protected]> Cc: Michael Sclafani <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm: zswap: fix kernel BUG in sg_init_oneBarry Song1-2/+12
sg_init_one() relies on linearly mapped low memory for the safe utilization of virt_to_page(). Otherwise, we trigger a kernel BUG, kernel BUG at include/linux/scatterlist.h:187! Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM Modules linked in: CPU: 0 PID: 2997 Comm: syz-executor198 Not tainted 6.8.0-syzkaller #0 Hardware name: ARM-Versatile Express PC is at sg_set_buf include/linux/scatterlist.h:187 [inline] PC is at sg_init_one+0x9c/0xa8 lib/scatterlist.c:143 LR is at sg_init_table+0x2c/0x40 lib/scatterlist.c:128 Backtrace: [<807e16ac>] (sg_init_one) from [<804c1824>] (zswap_decompress+0xbc/0x208 mm/zswap.c:1089) r7:83471c80 r6:def6d08c r5:844847d0 r4:ff7e7ef4 [<804c1768>] (zswap_decompress) from [<804c4468>] (zswap_load+0x15c/0x198 mm/zswap.c:1637) r9:8446eb80 r8:8446eb80 r7:8446eb84 r6:def6d08c r5:00000001 r4:844847d0 [<804c430c>] (zswap_load) from [<804b9644>] (swap_read_folio+0xa8/0x498 mm/page_io.c:518) r9:844ac800 r8:835e6c00 r7:00000000 r6:df955d4c r5:00000001 r4:def6d08c [<804b959c>] (swap_read_folio) from [<804bb064>] (swap_cluster_readahead+0x1c4/0x34c mm/swap_state.c:684) r10:00000000 r9:00000007 r8:df955d4b r7:00000000 r6:00000000 r5:00100cca r4:00000001 [<804baea0>] (swap_cluster_readahead) from [<804bb3b8>] (swapin_readahead+0x68/0x4a8 mm/swap_state.c:904) r10:df955eb8 r9:00000000 r8:00100cca r7:84476480 r6:00000001 r5:00000000 r4:00000001 [<804bb350>] (swapin_readahead) from [<8047cde0>] (do_swap_page+0x200/0xcc4 mm/memory.c:4046) r10:00000040 r9:00000000 r8:844ac800 r7:84476480 r6:00000001 r5:00000000 r4:df955eb8 [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (handle_pte_fault mm/memory.c:5301 [inline]) [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (__handle_mm_fault mm/memory.c:5439 [inline]) [<8047cbe0>] (do_swap_page) from [<8047e6c4>] (handle_mm_fault+0x3d8/0x12b8 mm/memory.c:5604) r10:00000040 r9:842b3900 r8:7eb0d000 r7:84476480 r6:7eb0d000 r5:835e6c00 r4:00000254 [<8047e2ec>] (handle_mm_fault) from [<80215d28>] (do_page_fault+0x148/0x3a8 arch/arm/mm/fault.c:326) r10:00000007 r9:842b3900 r8:7eb0d000 r7:00000207 r6:00000254 r5:7eb0d9b4 r4:df955fb0 [<80215be0>] (do_page_fault) from [<80216170>] (do_DataAbort+0x38/0xa8 arch/arm/mm/fault.c:558) r10:7eb0da7c r9:00000000 r8:80215be0 r7:df955fb0 r6:7eb0d9b4 r5:00000207 r4:8261d0e0 [<80216138>] (do_DataAbort) from [<80200e3c>] (__dabt_usr+0x5c/0x60 arch/arm/kernel/entry-armv.S:427) Exception stack(0xdf955fb0 to 0xdf955ff8) 5fa0: 00000000 00000000 22d5f800 0008d158 5fc0: 00000000 7eb0d9a4 00000000 00000109 00000000 00000000 7eb0da7c 7eb0da3c 5fe0: 00000000 7eb0d9a0 00000001 00066bd4 00000010 ffffffff r8:824a9044 r7:835e6c00 r6:ffffffff r5:00000010 r4:00066bd4 Code: 1a000004 e1822003 e8860094 e89da8f0 (e7f001f2) ---[ end trace 0000000000000000 ]--- ---------------- Code disassembly (best guess): 0: 1a000004 bne 0x18 4: e1822003 orr r2, r2, r3 8: e8860094 stm r6, {r2, r4, r7} c: e89da8f0 ldm sp, {r4, r5, r6, r7, fp, sp, pc} * 10: e7f001f2 udf #18 <-- trapping instruction Consequently, we have two choices: either employ kmap_to_page() alongside sg_set_page(), or resort to copying high memory contents to a temporary buffer residing in low memory. However, considering the introduction of the WARN_ON_ONCE in commit ef6e06b2ef870 ("highmem: fix kmap_to_page() for kmap_local_page() addresses"), which specifically addresses high memory concerns, it appears that memcpy remains the sole viable option. Link: https://lkml.kernel.org/r/[email protected] Fixes: 270700dd06ca ("mm/zswap: remove the memcpy if acomp is not sleepable") Signed-off-by: Barry Song <[email protected]> Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected]/ Tested-by: [email protected] Acked-by: Yosry Ahmed <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Chris Li <[email protected]> Cc: Ira Weiny <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26selftests: mm: restore settings from only parent processMuhammad Usama Anjum1-1/+5
The atexit() is called from parent process as well as forked processes. Hence the child restores the settings at exit while the parent is still executing. Fix this by checking pid of atexit() calling process and only restore THP number from parent process. Link: https://lkml.kernel.org/r/[email protected] Fixes: c23ea61726d5 ("selftests/mm: protection_keys: save/restore nr_hugepages settings") Signed-off-by: Muhammad Usama Anjum <[email protected]> Tested-by: Joey Gouly <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26tools/Makefile: remove cgroup targetCong Liu1-7/+6
The tools/cgroup directory no longer contains a Makefile. This patch updates the top-level tools/Makefile to remove references to building and installing cgroup components. This change reflects the current structure of the tools directory and fixes the build failure when building tools in the top-level directory. linux/tools$ make cgroup DESCEND cgroup make[1]: *** No targets specified and no makefile found. Stop. make: *** [Makefile:73: cgroup] Error 2 Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Cong Liu <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Reviewed-by: Dmitry Rokosov <[email protected]> Cc: Cong Liu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm: cachestat: fix two shmem bugsJohannes Weiner1-0/+16
When cachestat on shmem races with swapping and invalidation, there are two possible bugs: 1) A swapin error can have resulted in a poisoned swap entry in the shmem inode's xarray. Calling get_shadow_from_swap_cache() on it will result in an out-of-bounds access to swapper_spaces[]. Validate the entry with non_swap_entry() before going further. 2) When we find a valid swap entry in the shmem's inode, the shadow entry in the swapcache might not exist yet: swap IO is still in progress and we're before __remove_mapping; swapin, invalidation, or swapoff have removed the shadow from swapcache after we saw the shmem swap entry. This will send a NULL to workingset_test_recent(). The latter purely operates on pointer bits, so it won't crash - node 0, memcg ID 0, eviction timestamp 0, etc. are all valid inputs - but it's a bogus test. In theory that could result in a false "recently evicted" count. Such a false positive wouldn't be the end of the world. But for code clarity and (future) robustness, be explicit about this case. Bail on get_shadow_from_swap_cache() returning NULL. Link: https://lkml.kernel.org/r/[email protected] Fixes: cf264e1329fb ("cachestat: implement cachestat syscall") Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Chengming Zhou <[email protected]> [Bug #1] Reported-by: Jann Horn <[email protected]> [Bug #2] Reviewed-by: Chengming Zhou <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Cc: <[email protected]> [v6.5+] Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm: increase folio batch sizeMatthew Wilcox (Oracle)1-2/+2
On a 104 thread, 2 socket Skylake system, Intel report a 4.7% performance reduction with will-it-scale page_fault2. This was due to reducing the size of the batch from 32 to 15. Increasing the folio batch size from 15 to 31 gives a performance increase of 12.5% relative to the original, or 17.2% relative to the reduced performance commit. The penalty of this commit is an additional 128 bytes of stack usage. Six folio_batches are also allocated from percpu memory in cpu_fbatches so that will be an additional 768 bytes of percpu memory (per CPU). Tim Chen originally submitted a patch like this in 2020: https://lore.kernel.org/linux-mm/d1cc9f12a8ad6c2a52cb600d93b06b064f2bbc57.1593205965.git.tim.c.chen@linux.intel.com/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 99fbb6bfc16f ("mm: make folios_put() the basis of release_pages()") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]> Tested-by: Yujie Liu <[email protected]> Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm,page_owner: fix recursionOscar Salvador1-10/+23
Prior to 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count") the only place where page_owner could potentially go into recursion due to its need of allocating more memory was in save_stack(), which ends up calling into stackdepot code with the possibility of allocating memory. We made sure to guard against that by signaling that the current task was already in page_owner code, so in case a recursion attempt was made, we could catch that and return dummy_handle. After above commit, a new place in page_owner code was introduced where we could allocate memory, meaning we could go into recursion would we take that path. Make sure to signal that we are in page_owner in that codepath as well. Move the guard code into two helpers {un}set_current_in_page_owner() and use them prior to calling in the two functions that might allocate memory. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Oscar Salvador <[email protected]> Fixes: 217b2119b9e2 ("mm,page_owner: implement the tracking of the stacks count") Reviewed-by: Vlastimil Babka <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Andrey Konovalov <[email protected]> Cc: Marco Elver <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oscar Salvador <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mailmap: update entry for Leonard CrestezLeonard Crestez1-1/+2
Put my personal email first because NXP employment ended some time ago. Also add my old intel email address. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Leonard Crestez <[email protected]> Cc: Florian Fainelli <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26init: open /initrd.image with O_LARGEFILEJohn Sperbeck1-1/+1
If initrd data is larger than 2Gb, we'll eventually fail to write to the /initrd.image file when we hit that limit, unless O_LARGEFILE is set. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: John Sperbeck <[email protected]> Cc: Jens Axboe <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26selftests/mm: Fix build with _FORTIFY_SOURCEVitaly Chikunov3-3/+3
Add missing flags argument to open(2) call with O_CREAT. Some tests fail to compile if _FORTIFY_SOURCE is defined (to any valid value) (together with -O), resulting in similar error messages such as: In file included from /usr/include/fcntl.h:342, from gup_test.c:1: In function 'open', inlined from 'main' at gup_test.c:206:10: /usr/include/bits/fcntl2.h:50:11: error: call to '__open_missing_mode' declared with attribute error: open with O_CREAT or O_TMPFILE in second argument needs 3 arguments 50 | __open_missing_mode (); | ^~~~~~~~~~~~~~~~~~~~~~ _FORTIFY_SOURCE is enabled by default in some distributions, so the tests are not built by default and are skipped. open(2) man-page warns about missing flags argument: "if it is not supplied, some arbitrary bytes from the stack will be applied as the file mode." Link: https://lkml.kernel.org/r/[email protected] Fixes: aeb85ed4f41a ("tools/testing/selftests/vm/gup_benchmark.c: allow user specified file") Fixes: fbe37501b252 ("mm: huge_memory: debugfs for file-backed THP split") Fixes: c942f5bd17b3 ("selftests: soft-dirty: add test for mprotect") Signed-off-by: Vitaly Chikunov <[email protected]> Reviewed-by: Zi Yan <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Keith Busch <[email protected]> Cc: Peter Xu <[email protected]> Cc: Yang Shi <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Nadav Amit <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26mm/memory: fix missing pte marker for !page on pte zapsPeter Xu1-1/+3
Commit 0cf18e839f64 of large folio zap work broke uffd-wp. Now mm's uffd unit test "wp-unpopulated" will trigger this WARN_ON_ONCE(). The WARN_ON_ONCE() asserts that an VMA cannot be registered with userfaultfd-wp if it contains a !normal page, but it's actually possible. One example is an anonymous vma, register with uffd-wp, read anything will install a zero page. Then when zap on it, this should trigger. What's more, removing that WARN_ON_ONCE may not be enough either, because we should also not rely on "whether it's a normal page" to decide whether pte marker is needed. For example, one can register wr-protect over some DAX regions to track writes when UFFD_FEATURE_WP_ASYNC enabled, in which case it can have page==NULL for a devmap but we may want to keep the marker around. Link: https://lkml.kernel.org/r/[email protected] Fixes: 0cf18e839f64 ("mm/memory: handle !page case in zap_present_pte() separately") Signed-off-by: Peter Xu <[email protected]> Acked-by: David Hildenbrand <[email protected]> Cc: Muhammad Usama Anjum <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-03-26block: don't reject too large max_user_sectors in blk_validate_limitsChristoph Hellwig1-2/+1
We already cap down the actual max_sectors to the max of the hardware and user limit, so don't reject the configuration. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: John Garry <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-03-26block: Make blk_rq_set_mixed_merge() staticJohn Garry2-2/+1
Since commit 8e756373d7c8 ("block: Move bio merge related functions into blk-merge.c"), blk_rq_set_mixed_merge() has only been referenced in blk-merge.c, so make it static. Signed-off-by: John Garry <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-03-27MIPS: move unselectable FIT_IMAGE_FDT_EPM5 out of the "System type" choiceMasahiro Yamada1-9/+9
The reason is described in 5033ad566016 ("MIPS: move unselectable entries out of the "CPU type" choice"). At the same time, commit 101bd58fde10 ("MIPS: Add support for Mobileye EyeQ5") introduced another unselectable choice member. (In fact, 5033ad566016 and 101bd58fde10 have the same commit time.) Signed-off-by: Masahiro Yamada <[email protected]>