Age | Commit message | Author | Files | Lines
2023-06-20 | io_uring: remove io_req_ffs_set | Christoph Hellwig | 3 | -7/+2
Just checking the flag directly makes it a lot more obvious what is going on here. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-20 | io_uring: remove a confusing comment above io_file_get_flags | Christoph Hellwig | 1 | -5/+0
The SCM inflight mechanism has nothing to do with the fact that a file might be a regular file or not and if it supports non-blocking operations. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-20 | io_uring: remove the mode variable in io_file_get_flags | Christoph Hellwig | 1 | -2/+1
The variable is only used once now, so don't bother with it. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-20 | io_uring: remove __io_file_supports_nowait | Christoph Hellwig | 1 | -14/+1
Now that this only checks O_NONBLOCK and FMODE_NOWAIT, the helper is complete overkill, and the comments are confusing, bordering on wrong. Just inline the check into the caller. Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-20 | accel/qaic: Call DRM helper function to destroy prime GEM | Pranjal Ramajor Asha Kanojiya | 1 | -2/+2
smatch warning: drivers/accel/qaic/qaic_data.c:620 qaic_free_object() error: dereferencing freed memory 'obj->import_attach' obj->import_attach is detached and freed using dma_buf_detach(). But used after free to decrease the dmabuf ref count using dma_buf_put(). drm_prime_gem_destroy() handles this issue and performs the proper clean up instead of open coding it in the driver. Fixes: ff13be830333 ("accel/qaic: Add datapath") Reported-by: Sukrut Bellary <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Suggested-by: Christian König <[email protected]> Signed-off-by: Pranjal Ramajor Asha Kanojiya <[email protected]> Reviewed-by: Carl Vanderlip <[email protected]> Reviewed-by: Jeffrey Hugo <[email protected]> Signed-off-by: Jeffrey Hugo <[email protected]> Reviewed-by: Christian König <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-06-20 | reiserfs: fix blkdev_put() warning from release_journal_dev() | Yu Kuai | 1 | -1/+6
In journal_init_dev(), if super bdev is used as 'j_dev_bd', then blkdev_get_by_dev() is called with NULL holder, otherwise, holder will be journal. However, later in release_journal_dev(), blkdev_put() is called with journal unconditionally, cause following warning: WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 bd_end_claim block/bdev.c:617 [inline] WARNING: CPU: 1 PID: 5034 at block/bdev.c:617 blkdev_put+0x562/0x8a0 block/bdev.c:901 RIP: 0010:blkdev_put+0x562/0x8a0 block/bdev.c:901 Call Trace: <TASK> release_journal_dev fs/reiserfs/journal.c:2592 [inline] free_journal_ram+0x421/0x5c0 fs/reiserfs/journal.c:1896 do_journal_release fs/reiserfs/journal.c:1960 [inline] journal_release+0x276/0x630 fs/reiserfs/journal.c:1971 reiserfs_put_super+0xe4/0x5c0 fs/reiserfs/super.c:616 generic_shutdown_super+0x158/0x480 fs/super.c:499 kill_block_super+0x64/0xb0 fs/super.c:1422 deactivate_locked_super+0x98/0x160 fs/super.c:330 deactivate_super+0xb1/0xd0 fs/super.c:361 cleanup_mnt+0x2ae/0x3d0 fs/namespace.c:1247 task_work_run+0x16f/0x270 kernel/task_work.c:179 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0xadc/0x2a30 kernel/exit.c:874 do_group_exit+0xd4/0x2a0 kernel/exit.c:1024 __do_sys_exit_group kernel/exit.c:1035 [inline] __se_sys_exit_group kernel/exit.c:1033 [inline] __x64_sys_exit_group+0x3e/0x50 kernel/exit.c:1033 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Fix this problem by passing in NULL holder in this case. Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?extid=04625c80899f4555de39 Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens") Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-20 | block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions() | Yu Kuai | 1 | -1/+2
After commit 2736e8eeb0cc ("block: use the holder as indication for exclusive opens"), blkdev_get_by_dev() will warn if holder is NULL and mode contains 'FMODE_EXCL'. The holder passed to blkdev_get_by_dev() from disk_scan_partitions() is always NULL, hence it should not use 'FMODE_EXCL', which was broken by the commit. As a consequence, WARN_ON_ONCE() will be triggered from blkdev_get_by_dev() if users scan partitions with the device opened exclusively. Fix this problem by removing 'FMODE_EXCL' from disk_scan_partitions(), as it used to be. Reported-by: [email protected] Link: https://syzkaller.appspot.com/bug?extid=00cd27751f78817f167b Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens") Signed-off-by: Yu Kuai <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-20 | block: document the holder argument to blkdev_get_by_path | Christoph Hellwig | 1 | -0/+1
Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-20 | block: increment diskseq on all media change events | Demi Marie Obenour | 1 | -0/+1
Currently, associating a loop device with a different file descriptor does not increment its diskseq. This allows the following race condition: 1. Program X opens a loop device 2. Program X gets the diskseq of the loop device. 3. Program X associates a file with the loop device. 4. Program X passes the loop device major, minor, and diskseq to something. 5. Program X exits. 6. Program Y detaches the file from the loop device. 7. Program Y attaches a different file to the loop device. 8. The opener finally gets around to opening the loop device and checks that the diskseq is what it expects it to be. Even though the diskseq is the expected value, the result is that the opener is accessing the wrong file. From discussions with Christoph Hellwig, it appears that disk_force_media_change() was supposed to call inc_diskseq(), but in fact it does not. Adding a Fixes: tag to indicate this. Christoph's Reported-by is because he stated that disk_force_media_change() calls inc_diskseq(), which is what led me to discover that it should but does not. Reported-by: Christoph Hellwig <[email protected]> Signed-off-by: Demi Marie Obenour <[email protected]> Fixes: e6138dc12de9 ("block: add a helper to raise a media changed event") Cc: [email protected] # 5.15+ Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
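To make the opener's check in step 8 concrete, here is a small user-space sketch that reads the sequence number with the BLKGETDISKSEQ ioctl and compares it before trusting the device contents; the device path and expected value are placeholders for whatever Program X handed over.

    /* diskseq_check.c - verify a block device's diskseq before using it */
    #include <fcntl.h>
    #include <linux/fs.h>      /* BLKGETDISKSEQ */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        uint64_t seq, expected;
        int fd;

        if (argc != 3) {
            fprintf(stderr, "usage: %s <device> <expected-diskseq>\n", argv[0]);
            return 1;
        }
        expected = strtoull(argv[2], NULL, 0);

        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (ioctl(fd, BLKGETDISKSEQ, &seq) < 0) {
            perror("BLKGETDISKSEQ");
            return 1;
        }

        /* Without the fix, a re-attached loop device may still report the old diskseq. */
        if (seq != expected) {
            fprintf(stderr, "diskseq mismatch: got %llu, expected %llu\n",
                    (unsigned long long)seq, (unsigned long long)expected);
            return 1;
        }
        printf("diskseq %llu matches\n", (unsigned long long)seq);
        close(fd);
        return 0;
    }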
2023-06-20 | swim: fix a missing FMODE_ -> BLK_OPEN_ conversion in floppy_open | Christoph Hellwig | 1 | -1/+1
Fix a missing conversion to the new BLK_OPEN constant in swim. Fixes: 05bdb9965305 ("block: replace fmode_t with a block-specific type for block open flags") Reported-by: kernel test robot <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-20 | x86/smp: Put CPUs into INIT on shutdown if possible | Thomas Gleixner | 3 | -7/+53
Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can resume execution due to NMI, SMI and MCE, which has the same issue as the MWAIT loop. Kicking the secondary CPUs into INIT makes this safe against NMI and SMI. A broadcast MCE will take the machine down, but a broadcast MCE which makes HLT resume and execute overwritten text, pagetables or data will end up in a disaster too. So choose the lesser of two evils and kick the secondary CPUs into INIT unless the system has installed special wakeup mechanisms which are not using INIT. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-20 | x86/smp: Split sending INIT IPI out into a helper function | Thomas Gleixner | 1 | -29/+20
Putting CPUs into INIT is a safer place during kexec() to park CPUs. Split the INIT assert/deassert sequence out so it can be reused. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-20 | x86/smp: Cure kexec() vs. mwait_play_dead() breakage | Thomas Gleixner | 3 | -0/+66
TLDR: It's a mess. When kexec() is executed on a system with offline CPUs, which are parked in mwait_play_dead() it can end up in a triple fault during the bootup of the kexec kernel or cause hard to diagnose data corruption. The reason is that kexec() eventually overwrites the previous kernel's text, page tables, data and stack. If it writes to the cache line which is monitored by a previously offlined CPU, MWAIT resumes execution and ends up executing the wrong text, dereferencing overwritten page tables or corrupting the kexec kernels data. Cure this by bringing the offlined CPUs out of MWAIT into HLT. Write to the monitored cache line of each offline CPU, which makes MWAIT resume execution. The written control word tells the offlined CPUs to issue HLT, which does not have the MWAIT problem. That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as those make it come out of HLT. A follow up change will put them into INIT, which protects at least against NMI and SMI. Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case") Reported-by: Ashok Raj <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Ashok Raj <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
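A rough sketch of the handshake described above, assuming a per-CPU control word; the symbol names (mwait_cpu_dead, CPUDEAD_MWAIT_KEXEC_HLT) and the kexec-side loop are illustrative, not the literal code in smpboot.c.

    /* Hedged sketch; names are assumptions, not the exact kernel symbols. */
    #define CPUDEAD_MWAIT_WAIT       0      /* keep re-arming MWAIT */
    #define CPUDEAD_MWAIT_KEXEC_HLT  1      /* leave MWAIT, park in HLT */

    static DEFINE_PER_CPU_ALIGNED(unsigned int, mwait_cpu_dead);

    /* Offlined CPU, inside mwait_play_dead(): */
    unsigned int *control = this_cpu_ptr(&mwait_cpu_dead);
    unsigned long eax_hint = 0;     /* the real code derives the MWAIT hint from CPUID */

    for (;;) {
        __monitor(control, 0, 0);   /* arm monitoring of our own cache line */
        __mwait(eax_hint, 0);       /* resumes when that line is written */
        if (READ_ONCE(*control) == CPUDEAD_MWAIT_KEXEC_HLT) {
            while (1)
                native_halt();      /* HLT does not watch memory */
        }
    }

    /* kexec()/reboot path, before the old kernel's memory is overwritten: */
    unsigned int cpu;

    for_each_present_cpu(cpu) {
        if (!cpu_online(cpu))
            WRITE_ONCE(per_cpu(mwait_cpu_dead, cpu), CPUDEAD_MWAIT_KEXEC_HLT);
    }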
2023-06-20 | x86/smp: Use dedicated cache-line for mwait_play_dead() | Thomas Gleixner | 1 | -10/+14
Monitoring idletask::thread_info::flags in mwait_play_dead() has been an obvious choice as all that is needed is a cache line which is not written by other CPUs. But there is a use case where a "dead" CPU needs to be brought out of MWAIT: kexec(). This is required as kexec() can overwrite text, pagetables, stacks and the monitored cacheline of the original kernel. The latter causes MWAIT to resume execution which obviously causes havoc on the kexec kernel, which usually results in triple faults. Use a dedicated per CPU storage to prepare for that. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2023-06-20 | x86/smp: Remove pointless wmb()s from native_stop_other_cpus() | Thomas Gleixner | 1 | -6/+0
The wmb()s before sending the IPIs are not synchronizing anything. If at all then the apic IPI functions have to provide or act as appropriate barriers. Remove these cargo cult barriers which have no explanation of what they are synchronizing. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2023-06-20 | x86/smp: Dont access non-existing CPUID leaf | Tony Battersby | 1 | -2/+3
stop_this_cpu() tests CPUID leaf 0x8000001f::EAX unconditionally. Intel CPUs return the content of the highest supported leaf when a non-existing leaf is read, while AMD CPUs return all zeros for unsupported leafs. So the result of the test on Intel CPUs is a lottery. While harmless, it's incorrect and causes the conditional wbinvd() to be issued where not required. Check whether the leaf is supported before reading it. [ tglx: Adjusted changelog ] Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use") Signed-off-by: Tony Battersby <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Mario Limonciello <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected]
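A hedged sketch of the guard; the real check in stop_this_cpu() may be phrased via cpuinfo_x86::extended_cpuid_level, but the idea is to consult the maximum extended leaf before touching 0x8000001f (bit 0 of its EAX is the SME bit).

    /* Only read leaf 0x8000001f if the CPU reports that it exists. */
    if (cpuid_eax(0x80000000) >= 0x8000001f &&
        (cpuid_eax(0x8000001f) & BIT(0)))
        native_wbinvd();    /* flush caches only when SME is actually present */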
2023-06-20 | x86/smp: Make stop_other_cpus() more robust | Thomas Gleixner | 3 | -23/+64
Tony reported intermittent lockups on poweroff. His analysis identified the wbinvd() in stop_this_cpu() as the culprit. This was added to ensure that on SME enabled machines a kexec() does not leave any stale data in the caches when switching from encrypted to non-encrypted mode or vice versa. That wbinvd() is conditional on the SME feature bit which is read directly from CPUID. But that readout does not check whether the CPUID leaf is available or not. If it's not available the CPU will return the value of the highest supported leaf instead. Depending on the content the "SME" bit might be set or not. That's incorrect but harmless. Making the CPUID readout conditional makes the observed hangs go away, but it does not fix the underlying problem: CPU0 CPU1 stop_other_cpus() send_IPIs(REBOOT); stop_this_cpu() while (num_online_cpus() > 1); set_online(false); proceed... -> hang wbinvd() WBINVD is an expensive operation and if multiple CPUs issue it at the same time the resulting delays are even larger. But CPU0 already observed num_online_cpus() going down to 1 and proceeds which causes the system to hang. This issue exists independent of WBINVD, but the delays caused by WBINVD make it more prominent. Make this more robust by adding a cpumask which is initialized to the online CPU mask before sending the IPIs and CPUs clear their bit in stop_this_cpu() after the WBINVD completed. Check for that cpumask to become empty in stop_other_cpus() instead of watching num_online_cpus(). The cpumask cannot plug all holes either, but it's better than a raw counter and allows to restrict the NMI fallback IPI to be sent only the CPUs which have not reported within the timeout window. Fixes: 08f253ec3767 ("x86/cpu: Clear SME feature flag when not in use") Reported-by: Tony Battersby <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/all/[email protected] Link: https://lore.kernel.org/r/87h6r770bv.ffs@tglx
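A simplified sketch of the cpumask handshake described above; cpus_stop_mask reflects the idea from the changelog, while the IPI wrapper names below are placeholders, not real kernel functions.

    static cpumask_t cpus_stop_mask;

    /* CPU0 in stop_other_cpus(): */
    unsigned long timeout = USEC_PER_SEC;       /* illustrative timeout */

    cpumask_copy(&cpus_stop_mask, cpu_online_mask);
    cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);

    if (!cpumask_empty(&cpus_stop_mask)) {
        send_reboot_ipi(&cpus_stop_mask);       /* placeholder for the REBOOT vector IPI */

        /* Wait for every targeted CPU to check in, not for num_online_cpus(). */
        while (!cpumask_empty(&cpus_stop_mask) && timeout--)
            udelay(1);

        /* The NMI fallback goes only to CPUs that have not cleared their bit. */
        if (!cpumask_empty(&cpus_stop_mask))
            send_nmi_ipi(&cpus_stop_mask);      /* placeholder for the NMI fallback IPI */
    }

    /* Each stopping CPU, at the very end of stop_this_cpu(), after wbinvd(): */
    cpumask_clear_cpu(smp_processor_id(), &cpus_stop_mask);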
2023-06-20 | Merge tag 'ipsec-2023-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec | David S. Miller | 8 | -8/+79
ipsec-2023-06-20
2023-06-20 | fs: Provide helpers for manipulating sb->s_readonly_remount | Jan Kara | 4 | -21/+64
Provide helpers to set and clear sb->s_readonly_remount including appropriate memory barriers. Also use this opportunity to document what the barriers pair with and why they are needed. Suggested-by: Dave Chinner <[email protected]> Signed-off-by: Jan Kara <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Message-Id: <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2023-06-20 | Merge branch 'dsa-mt7530-fixes' | David S. Miller | 5 | -13/+78
Arınç ÜNAL says: ==================== net: dsa: mt7530: fix multiple CPU ports, BPDU and LLDP handling This patch series fixes all non-theoretical issues regarding multiple CPU ports and the handling of LLDP frames and BPDUs. I am adding me as a maintainer, I've got some code improvements on the way. I will keep an eye on this driver and the patches submitted for it in the future. Arınç v6: - Change a small portion of the comment in the diff on "net: dsa: mt7530: set all CPU ports in MT7531_CPU_PMAP" with Russell's suggestion. - Change the patch log of "net: dsa: mt7530: fix trapping frames on non-MT7621 SoC MT7530 switch" with Vladimir's suggestion. - Group the code for trapping frames into a common function and call that. - Add Vladimir and Russell's reviewed-by tags to where they're given. v5: - Change the comment in the diff on the first patch with Russell's words. - Change the patch log of the first patch to state that the patch is just preparatory work for change "net: dsa: introduce preferred_default_local_cpu_port and use on MT7530" and not a fix to an existing problem on the code base. - Remove the "net: dsa: mt7530: fix trapping frames with multiple CPU ports on MT7530" patch. It fixes a theoretical issue, therefore it is net-next material. - Remove unnecessary information from the patch logs. Remove the enum renaming change. - Strengthen the point of the "net: dsa: introduce preferred_default_local_cpu_port and use on MT7530" patch. v4: Make the patch logs and my comments in the code easier to understand. v3: Fix the from header on the patches. Write a cover letter. v2: Add patches to fix the handling of LLDP frames and BPDUs. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-06-20 | MAINTAINERS: add me as maintainer of MEDIATEK SWITCH DRIVER | Arınç ÜNAL | 1 | -2/+3
Add me as a maintainer of the MediaTek MT7530 DSA subdriver. List maintainers in alphabetical order by first name. Signed-off-by: Arınç ÜNAL <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Matthias Brugger <[email protected]> Reviewed-by: Matthias Brugger <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-20 | net: dsa: introduce preferred_default_local_cpu_port and use on MT7530 | Vladimir Oltean | 3 | -1/+46
Since the introduction of the OF bindings, DSA has always had a policy that in case multiple CPU ports are present in the device tree, the numerically smallest one is always chosen. The MT7530 switch family, except the switch on the MT7988 SoC, has 2 CPU ports, 5 and 6, where port 6 is preferable on the MT7531BE switch because it has higher bandwidth. The MT7530 driver developers had 3 options: - to modify DSA when the MT7531 switch support was introduced, such as to prefer the better port - to declare both CPU ports in device trees as CPU ports, and live with the sub-optimal performance resulting from not preferring the better port - to declare just port 6 in the device tree as a CPU port Of course they chose the path of least resistance (3rd option), kicking the can down the road. The hardware description in the device tree is supposed to be stable - developers are not supposed to adopt the strategy of piecemeal hardware description, where the device tree is updated in lockstep with the features that the kernel currently supports. Now, as a result of the fact that they did that, any attempts to modify the device tree and describe both CPU ports as CPU ports would make DSA change its default selection from port 6 to 5, effectively resulting in a performance degradation visible to users with the MT7531BE switch as can be seen below. Without preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 374 MBytes 157 Mbits/sec 734 sender [ 5][TX-C] 0.00-20.00 sec 373 MBytes 156 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 778 Mbits/sec 0 sender [ 7][RX-C] 0.00-20.00 sec 1.81 GBytes 777 Mbits/sec receiver With preferring port 6: [ ID][Role] Interval Transfer Bitrate Retr [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 856 Mbits/sec 273 sender [ 5][TX-C] 0.00-20.00 sec 1.99 GBytes 855 Mbits/sec receiver [ 7][RX-C] 0.00-20.00 sec 1.72 GBytes 737 Mbits/sec 15 sender [ 7][RX-C] 0.00-20.00 sec 1.71 GBytes 736 Mbits/sec receiver Using one port for WAN and the other ports for LAN is a very popular use case which is what this test emulates. As such, this change proposes that we retroactively modify stable kernels (which don't support the modification of the CPU port assignments, so as to let user space fix the problem and restore the throughput) to keep the mt7530 driver preferring port 6 even with device trees where the hardware is more fully described. Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Arınç ÜNAL <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
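As a rough illustration (not the upstream diff; the hook's exact signature is an assumption based on its name), the new operation lets the driver steer DSA's default choice toward port 6:

    /* Hedged sketch of wiring up the new DSA operation in mt7530. */
    static struct dsa_port *
    mt753x_preferred_default_local_cpu_port(struct dsa_switch *ds)
    {
        /* Port 6 is the higher-bandwidth CPU connection on MT7531BE. */
        return dsa_to_port(ds, 6);
    }

    static const struct dsa_switch_ops mt7530_switch_ops = {
        /* ... existing ops ... */
        .preferred_default_local_cpu_port = mt753x_preferred_default_local_cpu_port,
    };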
2023-06-20 | net: dsa: mt7530: fix handling of LLDP frames | Arınç ÜNAL | 2 | -0/+9
LLDP frames are link-local frames, therefore they must be trapped to the CPU port. Currently, the MT753X switches treat LLDP frames as regular multicast frames, therefore flooding them to user ports. To fix this, set LLDP frames to be trapped to the CPU port(s). Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-20 | net: dsa: mt7530: fix handling of BPDUs on MT7530 switch | Arınç ÜNAL | 1 | -3/+11
BPDUs are link-local frames, therefore they must be trapped to the CPU port. Currently, the MT7530 switch treats BPDUs as regular multicast frames, therefore flooding them to user ports. To fix this, set BPDUs to be trapped to the CPU port. Group this on mt7530_setup() and mt7531_setup_common() into mt753x_trap_frames() and call that. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Signed-off-by: Arınç ÜNAL <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-20 | net: dsa: mt7530: fix trapping frames on non-MT7621 SoC MT7530 switch | Arınç ÜNAL | 1 | -1/+1
All MT7530 switch IP variants share the MT7530_MFC register, but the current driver only writes it for the switch variant that is integrated in the MT7621 SoC. Modify the code to include all MT7530 derivatives. Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch") Suggested-by: Vladimir Oltean <[email protected]> Signed-off-by: Arınç ÜNAL <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-20 | net: dsa: mt7530: set all CPU ports in MT7531_CPU_PMAP | Arınç ÜNAL | 2 | -7/+9
MT7531_CPU_PMAP represents the destination port mask for trapped-to-CPU frames (further restricted by PCR_MATRIX). Currently the driver sets the first CPU port as the single port in this bit mask, which works fine regardless of whether the device tree defines port 5, 6 or 5+6 as CPU ports. This is because the logic coincides with DSA's logic of picking the first CPU port as the CPU port that all user ports are affine to, by default. An upcoming change would like to influence DSA's selection of the default CPU port to no longer be the first one, and in that case, this logic needs adaptation. Since there is no observed leakage or duplication of frames if all CPU ports are defined in this bit mask, simply include them all. Suggested-by: Russell King (Oracle) <[email protected]> Suggested-by: Vladimir Oltean <[email protected]> Signed-off-by: Arınç ÜNAL <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
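A hedged sketch of building the mask over every CPU port instead of only the first one; the register and field names (MT7531_CFC, MT7531_CPU_PMAP) and the mt7530_rmw() call are assumptions based on the changelog, not copied from the driver.

    u32 cpu_port_mask = 0;
    struct dsa_port *cpu_dp;

    dsa_switch_for_each_cpu_port(cpu_dp, ds)
        cpu_port_mask |= BIT(cpu_dp->index);    /* port 5, 6 or both, per the device tree */

    mt7530_rmw(priv, MT7531_CFC, MT7531_CPU_PMAP_MASK,
               MT7531_CPU_PMAP(cpu_port_mask));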
2023-06-20 | net: dpaa2-mac: add 25gbase-r support | Josua Mayer | 1 | -1/+6
Layerscape MACs support 25Gbps network speed with dpmac "CAUI" mode. Add the mappings between DPMAC_ETH_IF_* and PHY_INTERFACE_MODE_*, as well as the 25000 mac capability. Tested on SolidRun LX2162a Clearfog, serdes 1 protocol 18. Signed-off-by: Josua Mayer <[email protected]> Reviewed-by: Russell King (Oracle) <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-20 | Merge tag 'ieee802154-for-net-2023-06-19' of git://git.kernel.org/pub/scm/linux/kernel/git/wpan/wpan | David S. Miller | 4 | -6/+9
Stefan Schmidt says: ==================== An update from ieee802154 for your *net* tree: Two small fixes and MAINTAINERS update this time. Azeem Shaikh ensured consistent use of strscpy through the tree and fixed the usage in our trace.h. Chen Aotian fixed a potential memory leak in the hwsim simulator for ieee802154. Miquel Raynal updated the MAINTAINERS file with the new team git tree locations and patchwork URLs. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-06-19 | Merge tag 'hyperv-fixes-signed-20230619' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux | Linus Torvalds | 8 | -88/+129
Pull hyperv fixes from Wei Liu: - Fix races in Hyper-V PCI controller (Dexuan Cui) - Fix handling of hyperv_pcpu_input_arg (Michael Kelley) - Fix vmbus_wait_for_unload to scan present CPUs (Michael Kelley) - Call hv_synic_free in the failure path of hv_synic_alloc (Dexuan Cui) - Add noop for real mode handlers for virtual trust level code (Saurabh Sengar) * tag 'hyperv-fixes-signed-20230619' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: PCI: hv: Add a per-bus mutex state_lock Revert "PCI: hv: Fix a timing issue which causes kdump to fail occasionally" PCI: hv: Remove the useless hv_pcichild_state from struct hv_pci_dev PCI: hv: Fix a race condition in hv_irq_unmask() that can cause panic PCI: hv: Fix a race condition bug in hv_pci_query_relations() arm64/hyperv: Use CPUHP_AP_HYPERV_ONLINE state to fix CPU online sequencing x86/hyperv: Fix hyperv_pcpu_input_arg handling when CPUs go online/offline Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs Drivers: hv: vmbus: Call hv_synic_free() if hv_synic_alloc() fails x86/hyperv/vtl: Add noop for realmode pointers
2023-06-19 | selftests/mm: fix cross compilation with LLVM | Mark Brown | 1 | -5/+8
Currently the MM selftests attempt to work out the target architecture by using CROSS_COMPILE or otherwise querying the host machine, storing the target architecture in a variable called MACHINE rather than the usual ARCH though as far as I can tell (including for x86_64) the value is the same as we would use for architecture. When cross compiling with LLVM we don't need a CROSS_COMPILE as LLVM can support many target architectures in a single build so this logic does not work, CROSS_COMPILE is not set and we end up selecting tests for the host rather than target architecture. Fix this by using the more standard ARCH to describe the architecture, taking it from the environment if specified. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Albert Ou <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Tom Rix <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19 | mailmap: add entries for Ben Dooks | Ben Dooks | 1 | -0/+2
I am going to be losing my sifive.com address soon, and I also realised my old Simtec address (from >10 years ago) has not been updated either, so update .mailmap for both. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ben Dooks <[email protected]> Signed-off-by: Ben Dooks <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19 | nilfs2: prevent general protection fault in nilfs_clear_dirty_page() | Ryusuke Konishi | 1 | -1/+9
In a syzbot stress test that deliberately causes file system errors on nilfs2 with a corrupted disk image, it has been reported that nilfs_clear_dirty_page() called from nilfs_clear_dirty_pages() can cause a general protection fault. In nilfs_clear_dirty_pages(), when looking up dirty pages from the page cache and calling nilfs_clear_dirty_page() for each dirty page/folio retrieved, the back reference from the argument page to "mapping" may have been changed to NULL (and possibly others). It is necessary to check this after locking the page/folio. So, fix this issue by not calling nilfs_clear_dirty_page() on a page/folio after locking it in nilfs_clear_dirty_pages() if the back reference "mapping" from the page/folio is different from the "mapping" that held the page/folio just before. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ryusuke Konishi <[email protected]> Reported-by: [email protected] Closes: https://lkml.kernel.org/r/[email protected] Tested-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
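A simplified sketch of the per-page re-check described above, inside the loop over the looked-up dirty pages; the surrounding nilfs2 details are omitted.

    /* Hedged sketch, not the literal nilfs2 diff. */
    lock_page(page);

    /*
     * The page was found in the cache before it was locked, so it may have
     * been truncated or re-mapped in the meantime.  Only clear it if it
     * still belongs to the mapping being cleaned.
     */
    if (page->mapping == mapping)
        nilfs_clear_dirty_page(page, silent);

    unlock_page(page);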
2023-06-19Revert "mm: vmscan: make global slab shrink lockless"Qi Zheng1-12/+16
This reverts commit f95bdb700bc6bb74e1199b1f5f90c613e152cfa7. Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec test case [1], which is caused by commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). The root cause is that SRCU has to be careful to not frequently check for SRCU read-side critical section exits. Therefore, even if no one is currently in the SRCU read-side critical section, synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has become slower. After discussion, we will try to use the refcount+RCU method [2] proposed by Dave Chinner to continue to re-implement the lockless slab shrink. So revert the shrinker_srcu related changes first. [1]. https://lore.kernel.org/lkml/[email protected]/ [2]. https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Qi Zheng <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19Revert "mm: vmscan: make memcg slab shrink lockless"Qi Zheng1-26/+19
This reverts commit caa05325c9126c77ebf114edce51536a0d0a9a08. Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec test case [1], which is caused by commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). The root cause is that SRCU has to be careful to not frequently check for SRCU read-side critical section exits. Therefore, even if no one is currently in the SRCU read-side critical section, synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has become slower. After discussion, we will try to use the refcount+RCU method [2] proposed by Dave Chinner to continue to re-implement the lockless slab shrink. So revert the shrinker_srcu related changes first. [1]. https://lore.kernel.org/lkml/[email protected]/ [2]. https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Qi Zheng <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19Revert "mm: vmscan: add shrinker_srcu_generation"Qi Zheng1-20/+4
This reverts commit 475733dda5aedba9e086379aafe6b5ffd53e8f5e. Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec test case [1], which is caused by commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). The root cause is that SRCU has to be careful to not frequently check for SRCU read-side critical section exits. Therefore, even if no one is currently in the SRCU read-side critical section, synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has become slower. We will try to use the refcount+RCU method [2] proposed by Dave Chinner to continue to re-implement the lockless slab shrink. So revert the shrinker_srcu related changes first. [1]. https://lore.kernel.org/lkml/[email protected]/ [2]. https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Qi Zheng <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19Revert "mm: shrinkers: make count and scan in shrinker debugfs lockless"Qi Zheng1-8/+17
This reverts commit 20cd1892fcc3efc10a7ac327cc3790494bec46b5. Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec test case [1], which is caused by commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). The root cause is that SRCU has to be careful to not frequently check for SRCU read-side critical section exits. Therefore, even if no one is currently in the SRCU read-side critical section, synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has become slower. We will try to use the refcount+RCU method [2] proposed by Dave Chinner to continue to re-implement the lockless slab shrink. So revert the shrinker_srcu related changes first. [1]. https://lore.kernel.org/lkml/[email protected]/ [2]. https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Qi Zheng <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19Revert "mm: vmscan: hold write lock to reparent shrinker nr_deferred"Qi Zheng1-2/+2
This reverts commit b3cabea3c9153fd42fe5cb851ac58b51ea2b32b8. Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec test case [1], which is caused by commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). The root cause is that SRCU has to be careful to not frequently check for SRCU read-side critical section exits. Therefore, even if no one is currently in the SRCU read-side critical section, synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has become slower. We will try to use the refcount+RCU method [2] proposed by Dave Chinner to continue to re-implement the lockless slab shrink. Because there will be other readers after reverting the shrinker_srcu related changes, so it is better to restore to hold read lock to reparent shrinker nr_deferred. [1]. https://lore.kernel.org/lkml/[email protected]/ [2]. https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Qi Zheng <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19Revert "mm: vmscan: remove shrinker_rwsem from synchronize_shrinkers()"Qi Zheng1-2/+6
This reverts commit 1643db98d9b314e0a592d152603094fbf7ab906e. Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec test case [1], which is caused by commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). The root cause is that SRCU has to be careful to not frequently check for SRCU read-side critical section exits. Therefore, even if no one is currently in the SRCU read-side critical section, synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has become slower. We will try to use the refcount+RCU method [2] proposed by Dave Chinner to continue to re-implement the lockless slab shrink. So we still need shrinker_rwsem in synchronize_shrinkers() after reverting the shrinker_srcu related changes. [1]. https://lore.kernel.org/lkml/[email protected]/ [2]. https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Qi Zheng <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19Revert "mm: shrinkers: convert shrinker_rwsem to mutex"Qi Zheng5-27/+27
Patch series "revert shrinker_srcu related changes". This patch (of 7): This reverts commit cf2e309ebca7bb0916771839f9b580b06c778530. Kernel test robot reports -88.8% regression in stress-ng.ramfs.ops_per_sec test case [1], which is caused by commit f95bdb700bc6 ("mm: vmscan: make global slab shrink lockless"). The root cause is that SRCU has to be careful to not frequently check for SRCU read-side critical section exits. Therefore, even if no one is currently in the SRCU read-side critical section, synchronize_srcu() cannot return quickly. That's why unregister_shrinker() has become slower. After discussion, we will try to use the refcount+RCU method [2] proposed by Dave Chinner to continue to re-implement the lockless slab shrink. So revert the shrinker_mutex back to shrinker_rwsem first. [1]. https://lore.kernel.org/lkml/[email protected]/ [2]. https://lore.kernel.org/lkml/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Qi Zheng <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Muchun Song <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Yujie Liu <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19 | nilfs2: fix buffer corruption due to concurrent device reads | Ryusuke Konishi | 3 | -1/+35
As a result of analysis of a syzbot report, it turned out that in three cases where nilfs2 allocates block device buffers directly via sb_getblk, concurrent reads to the device can corrupt the allocated buffers. Nilfs2 uses sb_getblk for segment summary blocks, that make up a log header, and the super root block, that is the trailer, and when moving and writing the second super block after fs resize. In any of these, since the uptodate flag is not set when storing metadata to be written in the allocated buffers, the stored metadata will be overwritten if a device read of the same block occurs concurrently before the write. This causes metadata corruption and misbehavior in the log write itself, causing warnings in nilfs_btree_assign() as reported. Fix these issues by setting an uptodate flag on the buffer head on the first or before modifying each buffer obtained with sb_getblk, and clearing the flag on failure. When setting the uptodate flag, the lock_buffer/unlock_buffer pair is used to perform necessary exclusive control, and the buffer is filled to ensure that uninitialized bytes are not mixed into the data read from others. As for buffers for segment summary blocks, they are filled incrementally, so if the uptodate flag was unset on their allocation, set the flag and zero fill the buffer once at that point. Also, regarding the superblock move routine, the starting point of the memset call to zerofill the block is incorrectly specified, which can cause a buffer overflow on file systems with block sizes greater than 4KiB. In addition, if the superblock is moved within a large block, it is necessary to assume the possibility that the data in the superblock will be destroyed by zero-filling before copying. So fix these potential issues as well. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ryusuke Konishi <[email protected]> Reported-by: [email protected] Closes: https://lkml.kernel.org/r/[email protected] Tested-by: Ryusuke Konishi <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
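A hedged sketch of the pattern described above for buffers obtained with sb_getblk(): take the buffer lock, zero-fill, and mark the buffer uptodate before storing metadata, so a concurrent read of the same block cannot overwrite it.

    struct buffer_head *bh = sb_getblk(sb, blocknr);

    if (unlikely(!bh))
        return -ENOMEM;

    lock_buffer(bh);
    if (!buffer_uptodate(bh)) {
        memset(bh->b_data, 0, bh->b_size);  /* no uninitialized bytes reach readers */
        set_buffer_uptodate(bh);            /* a concurrent read now skips the device I/O */
    }
    unlock_buffer(bh);

    /* ... store the segment summary / super root metadata into bh->b_data ... */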
2023-06-19 | scripts/gdb: fix SB_* constants parsing | Florian Fainelli | 1 | -6/+6
After f15afbd34d8f ("fs: fix undefined behavior in bit shift for SB_NOUSER") the constants were changed from plain integers which LX_VALUE() can parse to constants using the BIT() macro which causes the following: Reading symbols from build/linux-custom/vmlinux...done. Traceback (most recent call last): File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/vmlinux-gdb.py", line 25, in <module> import linux.constants File "/home/fainelli/work/buildroot/output/arm64/build/linux-custom/scripts/gdb/linux/constants.py", line 5 LX_SB_RDONLY = ((((1UL))) << (0)) Use LX_GDBPARSED() which does not suffer from that issue. Fixes: f15afbd34d8f ("fs: fix undefined behavior in bit shift for SB_NOUSER") Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Florian Fainelli <[email protected]> Acked-by: Christian Brauner <[email protected]> Cc: Hao Ge <[email protected]> Cc: Jan Kiszka <[email protected]> Cc: Kieran Bingham <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: Pankaj Raghav <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19 | scripts: fix the gfp flags header path in gfp-translate | Prathu Baronia | 1 | -3/+3
Since gfp flags have been shifted to gfp_types.h, update the path in the gfp-translate script. Link: https://lkml.kernel.org/r/[email protected] Fixes: cb5a065b4ea9c ("headers/deps: mm: Split <linux/gfp_types.h> out of <linux/gfp.h>") Signed-off-by: Prathu Baronia <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Nicolas Schier <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Yury Norov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19 | udmabuf: revert 'Add support for mapping hugepages (v4)' | Mike Kravetz | 1 | -41/+6
This effectively reverts commit 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)"). Recently, Junxiao Chang found a BUG with page map counting as described here [1]. This issue pointed out that the udmabuf driver was making direct use of subpages of hugetlb pages. This is not a good idea, and no other mm code attempts such use. In addition to the mapcount issue, this also causes issues with hugetlb vmemmap optimization and page poisoning. For now, remove hugetlb support. If udmabuf wants to be used on hugetlb mappings, it should be changed to only use complete hugetlb pages. This will require different alignment and size requirements on the UDMABUF_CREATE API. [1] https://lore.kernel.org/linux-mm/[email protected]/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 16c243e99d33 ("udmabuf: Add support for mapping hugepages (v4)") Signed-off-by: Mike Kravetz <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Vivek Kasireddy <[email protected]> Acked-by: Gerd Hoffmann <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Dongwon Kim <[email protected]> Cc: James Houghton <[email protected]> Cc: Jerome Marchand <[email protected]> Cc: Junxiao Chang <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Muchun Song <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19 | mm/khugepaged: fix iteration in collapse_file | David Stevens | 1 | -1/+0
Remove an unnecessary call to xas_set(index) when iterating over the target range in collapse_file. The extra call to xas_set reset the xas cursor to the top of the tree, causing the xas_next call on the next iteration to walk the tree to index instead of advancing to index+1. This returned the same page again, which would cause collapse_file to fail because the page is already locked. This bug was hidden when CONFIG_DEBUG_VM was set. When that config was used, the xas_load in a subsequent VM_BUG_ON assert would walk xas from the top of the tree to index, causing the xas_next call on the next loop iteration to advance the cursor as expected. Link: https://lkml.kernel.org/r/[email protected] Fixes: a2e17cc2efc7 ("mm/khugepaged: maintain page cache uptodate flag") Signed-off-by: David Stevens <[email protected]> Reviewed-by: Peter Xu <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Jiaqi Yan <[email protected]> Cc: Kirill A . Shutemov <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Yang Shi <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
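A simplified sketch of the loop in collapse_file, showing why the extra xas_set() made xas_next() hand back the same entry twice; the actual collapse logic is omitted.

    XA_STATE(xas, &mapping->i_pages, start);
    struct page *page;
    pgoff_t index;

    for (index = start; index < end; index++) {
        /*
         * The buggy variant reset the cursor to the top of the tree on
         * every pass:
         *
         *      xas_set(&xas, index);
         *      page = xas_next(&xas);
         *
         * so xas_next() walked back down to 'index' and returned the page
         * that was just processed (and is still locked).  Dropping the
         * xas_set() lets xas_next() simply advance to index + 1:
         */
        page = xas_next(&xas);

        /* ... lock and collapse 'page' as before ... */
    }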
2023-06-19 | memfd: check for non-NULL file_seals in memfd_create() syscall | Roberto Sassu | 1 | -3/+6
Ensure that file_seals is non-NULL before using it in the memfd_create() syscall. One situation in which memfd_file_seals_ptr() could return a NULL pointer is when CONFIG_SHMEM=n, oopsing the kernel. Link: https://lkml.kernel.org/r/[email protected] Fixes: 47b9012ecdc7 ("shmem: add sealing support to hugetlb-backed memfd") Signed-off-by: Roberto Sassu <[email protected]> Cc: Marc-André Lureau <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
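A hedged sketch of the guard in the memfd_create() path that applies the initial seals; which seals are set here is illustrative only.

    unsigned int *file_seals = memfd_file_seals_ptr(file);

    if (file_seals) {   /* may be NULL, e.g. with CONFIG_SHMEM=n */
        *file_seals &= ~F_SEAL_SEAL;
        *file_seals |= F_SEAL_EXEC;
    }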
2023-06-19 | mm/vmalloc: do not output a spurious warning when huge vmalloc() fails | Lorenzo Stoakes | 1 | -4/+13
In __vmalloc_area_node() we always warn_alloc() when an allocation performed by vm_area_alloc_pages() fails unless it was due to a pending fatal signal. However, huge page allocations instigated either by vmalloc_huge() or __vmalloc_node_range() (or a caller that invokes this like kvmalloc() or kvmalloc_node()) always falls back to order-0 allocations if the huge page allocation fails. This renders the warning useless and noisy, especially as all callers appear to be aware that this may fallback. This has already resulted in at least one bug report from a user who was confused by this (see link). Therefore, simply update the code to only output this warning for order-0 pages when no fatal signal is pending. Link: https://bugzilla.suse.com/show_bug.cgi?id=1211410 Link: https://lkml.kernel.org/r/[email protected] Fixes: 80b1d8fdfad1 ("mm: vmalloc: correct use of __GFP_NOWARN mask in __vmalloc_area_node()") Signed-off-by: Lorenzo Stoakes <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Baoquan He <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Uladzislau Rezki (Sony) <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19 | mm/mprotect: fix do_mprotect_pkey() limit check | Liam R. Howlett | 1 | -1/+1
The return of do_mprotect_pkey() can still be incorrectly returned as success if there is a gap that spans to or beyond the end address passed in. Update the check to ensure that the end address has indeed been seen. Link: https://lore.kernel.org/all/CABi2SkXjN+5iFoBhxk71t3cmunTk-s=rB4T7qo0UQRh17s49PQ@mail.gmail.com/ Link: https://lkml.kernel.org/r/[email protected] Fixes: 82f951340f25 ("mm/mprotect: fix do_mprotect_pkey() return on error") Signed-off-by: Liam R. Howlett <[email protected]> Reported-by: Jeff Xu <[email protected]> Reviewed-by: Lorenzo Stoakes <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-06-19 | writeback: fix dereferencing NULL mapping->host on writeback_page_template | Rafael Aquini | 1 | -1/+1
When commit 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()") repurposed the writeback_dirty_page trace event as a template to create its new wait_on_page_writeback trace event, it ended up opening a window to NULL pointer dereference crashes due to the (infrequent) occurrence of a race where an access to a page in the swap-cache happens concurrently with the moment this page is being written to disk and the tracepoint is enabled: BUG: kernel NULL pointer dereference, address: 0000000000000040 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 800000010ec0a067 P4D 800000010ec0a067 PUD 102353067 PMD 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 PID: 1320 Comm: shmem-worker Kdump: loaded Not tainted 6.4.0-rc5+ #13 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS edk2-20230301gitf80f052277c8-1.fc37 03/01/2023 RIP: 0010:trace_event_raw_event_writeback_folio_template+0x76/0xf0 Code: 4d 85 e4 74 5c 49 8b 3c 24 e8 06 98 ee ff 48 89 c7 e8 9e 8b ee ff ba 20 00 00 00 48 89 ef 48 89 c6 e8 fe d4 1a 00 49 8b 04 24 <48> 8b 40 40 48 89 43 28 49 8b 45 20 48 89 e7 48 89 43 30 e8 a2 4d RSP: 0000:ffffaad580b6fb60 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff90e38035c01c RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff90e38035c044 RBP: ffff90e38035c024 R08: 0000000000000002 R09: 0000000000000006 R10: ffff90e38035c02e R11: 0000000000000020 R12: ffff90e380bac000 R13: ffffe3a7456d9200 R14: 0000000000001b81 R15: ffffe3a7456d9200 FS: 00007f2e4e8a15c0(0000) GS:ffff90e3fbc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000040 CR3: 00000001150c6003 CR4: 0000000000170ee0 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x76/0x170 ? kernelmode_fixup_or_oops+0x84/0x110 ? exc_page_fault+0x65/0x150 ? asm_exc_page_fault+0x22/0x30 ? trace_event_raw_event_writeback_folio_template+0x76/0xf0 folio_wait_writeback+0x6b/0x80 shmem_swapin_folio+0x24a/0x500 ? filemap_get_entry+0xe3/0x140 shmem_get_folio_gfp+0x36e/0x7c0 ? find_busiest_group+0x43/0x1a0 shmem_fault+0x76/0x2a0 ? __update_load_avg_cfs_rq+0x281/0x2f0 __do_fault+0x33/0x130 do_read_fault+0x118/0x160 do_pte_missing+0x1ed/0x2a0 __handle_mm_fault+0x566/0x630 handle_mm_fault+0x91/0x210 do_user_addr_fault+0x22c/0x740 exc_page_fault+0x65/0x150 asm_exc_page_fault+0x22/0x30 This problem arises from the fact that the repurposed writeback_dirty_page trace event code was written assuming that every pointer to mapping (struct address_space) would come from a file-mapped page-cache object, thus mapping->host would always be populated, and that was a valid case before commit 19343b5bdd16. The swap-cache address space (swapper_spaces), however, doesn't populate its ->host (struct inode) pointer, thus leading to the crashes in the corner-case aforementioned. commit 19343b5bdd16 ended up breaking the assignment of __entry->name and __entry->ino for the wait_on_page_writeback tracepoint -- both dependent on mapping->host carrying a pointer to a valid inode. The assignment of __entry->name was fixed by commit 68f23b89067f ("memcg: fix a crash in wb_workfn when a device disappears"), and this commit fixes the remaining case, for __entry->ino. 
Link: https://lkml.kernel.org/r/[email protected] Fixes: 19343b5bdd16 ("mm/page-writeback: introduce tracepoint for wait_on_page_writeback()") Signed-off-by: Rafael Aquini <[email protected]> Reviewed-by: Yafang Shao <[email protected]> Cc: Aristeu Rozanski <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
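A hedged sketch of the NULL-safe assignment in the tracepoint's fast-assign block; the earlier commit 68f23b89067f handled __entry->name in a similar way.

    /* Not the literal TP_fast_assign() body; shape of the fix only. */
    struct address_space *mapping = folio_mapping(folio);
    struct inode *inode = mapping ? mapping->host : NULL;

    /* swap-cache address spaces have no ->host, so fall back to 0 */
    __entry->ino = inode ? inode->i_ino : 0;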
2023-06-19 | x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys | Dheeraj Kumar Srivastava | 1 | -1/+4
When booting with "intremap=off" and "x2apic_phys" on the kernel command line, the physical x2APIC driver ends up being used even when x2APIC mode is disabled ("intremap=off" disables x2APIC mode). This happens because the first compound condition check in x2apic_phys_probe() is false due to x2apic_mode == 0 and so the following one returns true after default_acpi_madt_oem_check() having already selected the physical x2APIC driver. This results in the following panic: kernel BUG at arch/x86/kernel/apic/io_apic.c:2409! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.4.0-rc2-ver4.1rc2 #2 Hardware name: Dell Inc. PowerEdge R6515/07PXPY, BIOS 2.3.6 07/06/2021 RIP: 0010:setup_IO_APIC+0x9c/0xaf0 Call Trace: <TASK> ? native_read_msr apic_intr_mode_init x86_late_time_init start_kernel x86_64_start_reservations x86_64_start_kernel secondary_startup_64_no_verify </TASK> which is: setup_IO_APIC: apic_printk(APIC_VERBOSE, "ENABLING IO-APIC IRQs\n"); for_each_ioapic(ioapic) BUG_ON(mp_irqdomain_create(ioapic)); Return 0 to denote that x2APIC has not been enabled when probing the physical x2APIC driver. [ bp: Massage commit message heavily. ] Fixes: 9ebd680bd029 ("x86, apic: Use probe routines to simplify apic selection") Signed-off-by: Dheeraj Kumar Srivastava <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Kishon Vijay Abraham I <[email protected]> Reviewed-by: Vasant Hegde <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-06-20 | btrfs: fix race between quota disable and relocation | Filipe Manana | 1 | -3/+15
If we disable quotas while we have a relocation of a metadata block group that has extents belonging to the quota root, we can cause the relocation to fail with -ENOENT. This is because relocation builds backref nodes for extents of the quota root and later needs to walk the backrefs and access the quota root - however if in between a task disables quotas, it results in deleting the quota root from the root tree (with btrfs_del_root(), called from btrfs_quota_disable(). This can be sporadically triggered by test case btrfs/255 from fstests: $ ./check btrfs/255 FSTYP -- btrfs PLATFORM -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023 MKFS_OPTIONS -- /dev/sdc MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1 btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg) - output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad) --- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000 +++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100 @@ -1,2 +1,4 @@ QA output created by 255 +ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory +There may be more info in syslog - try dmesg | tail Silence is golden ... (Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff) Ran: btrfs/255 Failures: btrfs/255 Failed 1 of 1 tests To fix this make the quota disable operation take the cleaner mutex, as relocation of a block group also takes this mutex. This is also what we do when deleting a subvolume/snapshot, we take the cleaner mutex in the cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root() at btrfs_drop_snapshot() while under the protection of the cleaner mutex. Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes") CC: [email protected] # 5.4+ Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
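A hedged sketch of the locking change; the ordering relative to the existing qgroup locking is an assumption, and the body of the disable logic is elided.

    int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
    {
        int ret = 0;

        /*
         * Relocation of a metadata block group runs under the cleaner
         * mutex, so holding it here prevents the quota root from being
         * deleted while relocation is still walking its backrefs.
         */
        mutex_lock(&fs_info->cleaner_mutex);
        mutex_lock(&fs_info->qgroup_ioctl_lock);

        /* ... existing disable logic, including btrfs_del_root(), runs here ... */

        mutex_unlock(&fs_info->qgroup_ioctl_lock);
        mutex_unlock(&fs_info->cleaner_mutex);
        return ret;
    }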