aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-02-20mm/zswap: invalidate duplicate entry when !zswap_enabledChengming Zhou1-1/+5
We have to invalidate any duplicate entry even when !zswap_enabled since zswap can be disabled anytime. If the folio store success before, then got dirtied again but zswap disabled, we won't invalidate the old duplicate entry in the zswap_store(). So later lru writeback may overwrite the new data in swapfile. Link: https://lkml.kernel.org/r/[email protected] Fixes: 42c06a0e8ebe ("mm: kill frontswap") Signed-off-by: Chengming Zhou <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Nhat Pham <[email protected]> Cc: Yosry Ahmed <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-02-20lib/Kconfig.debug: TEST_IOV_ITER depends on MMUGuenter Roeck1-0/+1
Trying to run the iov_iter unit test on a nommu system such as the qemu kc705-nommu emulation results in a crash. KTAP version 1 # Subtest: iov_iter # module: kunit_iov_iter 1..9 BUG: failure at mm/nommu.c:318/vmap()! Kernel panic - not syncing: BUG! The test calls vmap() directly, but vmap() is not supported on nommu systems, causing the crash. TEST_IOV_ITER therefore needs to depend on MMU. Link: https://lkml.kernel.org/r/[email protected] Fixes: 2d71340ff1d4 ("iov_iter: Kunit tests for copying to/from an iterator") Signed-off-by: Guenter Roeck <[email protected]> Cc: David Howells <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-02-20mm/swap: fix race when skipping swapcacheKairui Song4-0/+43
When skipping swapcache for SWP_SYNCHRONOUS_IO, if two or more threads swapin the same entry at the same time, they get different pages (A, B). Before one thread (T0) finishes the swapin and installs page (A) to the PTE, another thread (T1) could finish swapin of page (B), swap_free the entry, then swap out the possibly modified page reusing the same entry. It breaks the pte_same check in (T0) because PTE value is unchanged, causing ABA problem. Thread (T0) will install a stalled page (A) into the PTE and cause data corruption. One possible callstack is like this: CPU0 CPU1 ---- ---- do_swap_page() do_swap_page() with same entry <direct swapin path> <direct swapin path> <alloc page A> <alloc page B> swap_read_folio() <- read to page A swap_read_folio() <- read to page B <slow on later locks or interrupt> <finished swapin first> ... set_pte_at() swap_free() <- entry is free <write to page B, now page A stalled> <swap out page B to same swap entry> pte_same() <- Check pass, PTE seems unchanged, but page A is stalled! swap_free() <- page B content lost! set_pte_at() <- staled page A installed! And besides, for ZRAM, swap_free() allows the swap device to discard the entry content, so even if page (B) is not modified, if swap_read_folio() on CPU0 happens later than swap_free() on CPU1, it may also cause data loss. To fix this, reuse swapcache_prepare which will pin the swap entry using the cache flag, and allow only one thread to swap it in, also prevent any parallel code from putting the entry in the cache. Release the pin after PT unlocked. Racers just loop and wait since it's a rare and very short event. A schedule_timeout_uninterruptible(1) call is added to avoid repeated page faults wasting too much CPU, causing livelock or adding too much noise to perf statistics. A similar livelock issue was described in commit 029c4628b2eb ("mm: swap: get rid of livelock in swapin readahead") Reproducer: This race issue can be triggered easily using a well constructed reproducer and patched brd (with a delay in read path) [1]: With latest 6.8 mainline, race caused data loss can be observed easily: $ gcc -g -lpthread test-thread-swap-race.c && ./a.out Polulating 32MB of memory region... Keep swapping out... Starting round 0... Spawning 65536 workers... 32746 workers spawned, wait for done... Round 0: Error on 0x5aa00, expected 32746, got 32743, 3 data loss! Round 0: Error on 0x395200, expected 32746, got 32743, 3 data loss! Round 0: Error on 0x3fd000, expected 32746, got 32737, 9 data loss! Round 0 Failed, 15 data loss! This reproducer spawns multiple threads sharing the same memory region using a small swap device. Every two threads updates mapped pages one by one in opposite direction trying to create a race, with one dedicated thread keep swapping out the data out using madvise. The reproducer created a reproduce rate of about once every 5 minutes, so the race should be totally possible in production. After this patch, I ran the reproducer for over a few hundred rounds and no data loss observed. Performance overhead is minimal, microbenchmark swapin 10G from 32G zram: Before: 10934698 us After: 11157121 us Cached: 13155355 us (Dropping SWP_SYNCHRONOUS_IO flag) [[email protected]: v4] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: 0bcac06f27d7 ("mm, swap: skip swapcache for swapin of synchronous device") Reported-by: "Huang, Ying" <[email protected]> Closes: https://lore.kernel.org/lkml/[email protected]/ Link: https://github.com/ryncsn/emm-test-project/tree/master/swap-stress-race [1] Signed-off-by: Kairui Song <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Acked-by: Yu Zhao <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Chris Li <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Yosry Ahmed <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Barry Song <[email protected]> Cc: SeongJae Park <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-02-20mm/swap_state: update zswap LRU's protection range with the folio lockedNhat Pham2-8/+9
When a folio is swapped in, the protection size of the corresponding zswap LRU is incremented, so that the zswap shrinker is more conservative with its reclaiming action. This field is embedded within the struct lruvec, so updating it requires looking up the folio's memcg and lruvec. However, currently this lookup can happen after the folio is unlocked, for instance if a new folio is allocated, and swap_read_folio() unlocks the folio before returning. In this scenario, there is no stability guarantee for the binding between a folio and its memcg and lruvec: * A folio's memcg and lruvec can be freed between the lookup and the update, leading to a UAF. * Folio migration can clear the now-unlocked folio's memcg_data, which directs the zswap LRU protection size update towards the root memcg instead of the original memcg. This was recently picked up by the syzbot thanks to a warning in the inlined folio_lruvec() call. Move the zswap LRU protection range update above the swap_read_folio() call, and only when a new page is allocated, to prevent this. [[email protected]: add VM_WARN_ON_ONCE() to zswap_folio_swapin()] Link: https://lkml.kernel.org/r/[email protected] [[email protected]: remove unneeded if (folio) checks] Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected] Fixes: b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure") Reported-by: [email protected] Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Nhat Pham <[email protected]> Reviewed-by: Chengming Zhou <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Yosry Ahmed <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-02-20selftests/mm: uffd-unit-test check if huge page size is 0Terry Tritton1-0/+6
If HUGETLBFS is not enabled then the default_huge_page_size function will return 0 and cause a divide by 0 error. Add a check to see if the huge page size is 0 and skip the hugetlb tests if it is. Link: https://lkml.kernel.org/r/[email protected] Fixes: 16a45b57cbf2 ("selftests/mm: add framework for uffd-unit-test") Signed-off-by: Terry Tritton <[email protected]> Cc: Peter Griffin <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Peter Xu <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-02-20mm/damon/core: check apply interval in damon_do_apply_schemes()SeongJae Park1-4/+11
kdamond_apply_schemes() checks apply intervals of schemes and avoid further applying any schemes if no scheme passed its apply interval. However, the following schemes applying function, damon_do_apply_schemes() iterates all schemes without the apply interval check. As a result, the shortest apply interval is applied to all schemes. Fix the problem by checking the apply interval in damon_do_apply_schemes(). Link: https://lkml.kernel.org/r/[email protected] Fixes: 42f994b71404 ("mm/damon/core: implement scheme-specific apply interval") Signed-off-by: SeongJae Park <[email protected]> Cc: <[email protected]> [6.7.x] Signed-off-by: Andrew Morton <[email protected]>
2024-02-20mm: zswap: fix missing folio cleanup in writeback race pathYosry Ahmed1-0/+2
In zswap_writeback_entry(), after we get a folio from __read_swap_cache_async(), we grab the tree lock again to check that the swap entry was not invalidated and recycled. If it was, we delete the folio we just added to the swap cache and exit. However, __read_swap_cache_async() returns the folio locked when it is newly allocated, which is always true for this path, and the folio is ref'd. Make sure to unlock and put the folio before returning. This was discovered by code inspection, probably because this path handles a race condition that should not happen often, and the bug would not crash the system, it will only strand the folio indefinitely. Link: https://lkml.kernel.org/r/[email protected] Fixes: 04fc7816089c ("mm: fix zswap writeback race condition") Signed-off-by: Yosry Ahmed <[email protected]> Reviewed-by: Chengming Zhou <[email protected]> Acked-by: Johannes Weiner <[email protected]> Reviewed-by: Nhat Pham <[email protected]> Cc: Domenico Cerasuolo <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2024-02-20docs: Instruct LaTeX to cope with deeper nestingJonathan Corbet1-0/+6
The addition of the XFS online fsck documentation starting with commit a8f6c2e54ddc ("xfs: document the motivation for online fsck design") added a deeper level of nesting than LaTeX is prepared to deal with. That caused a pdfdocs build failure with the helpful "Too deeply nested" error message buried deeply in Documentation/output/filesystems.log. Increase the "maxlistdepth" parameter to instruct LaTeX that it needs to deal with the deeper nesting whether it wants to or not. Suggested-by: Akira Yokosawa <[email protected]> Tested-by: Akira Yokosawa <[email protected]> Cc: [email protected] # v6.4+ Link: https://lore.kernel.org/linux-doc/[email protected]/ Signed-off-by: Jonathan Corbet <[email protected]>
2024-02-20arm64: dts: qcom: Fix interrupt-map cell sizesRob Herring2-12/+12
The PCI node interrupt-map properties have the wrong size as #address-cells in the interrupt parent are not accounted for. The dtc interrupt_map check catches this, but the warning is off because its dependency, interrupt_provider, is off by default. Signed-off-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2024-02-20arm: dts: Fix dtc interrupt_map warningsRob Herring3-4/+8
The dtc interrupt_map warning is off because its dependency, interrupt_provider, is off by default. Fix all the warnings so it can be enabled. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2024-02-20arm64: dts: Fix dtc interrupt_provider warningsRob Herring9-5/+7
The dtc interrupt_provider warning is off by default. Fix all the warnings so it can be enabled. Signed-off-by: Rob Herring <[email protected]> Reviewed-By: AngeloGioacchino Del Regno <[email protected]> # Reviewed-by: Geert Uytterhoeven <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: Florian Fainelli <[email protected]> #Broadcom Acked-by: Chanho Min <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2024-02-20arm: dts: Fix dtc interrupt_provider warningsRob Herring24-58/+18
The dtc interrupt_provider warning is off by default. Fix all the warnings so it can be enabled. Signed-off-by: Rob Herring <[email protected]> Reviewed-by: Andrew Jeffery <[email protected]> Reviewed-by: Alexandre Torgue <[email protected]> Acked-by: Florian Fainelli <[email protected]> #Broadcom Acked-by: Thierry Reding <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2024-02-20arm64: dts: freescale: Disable interrupt_map checkRob Herring1-0/+19
Several Freescale Layerscape platforms extirq binding use a malformed interrupt-map property missing parent address cells. These are documented in of_irq_imap_abusers list in drivers/of/irq.c. In order to enable dtc interrupt_map check tree wide, we need to disable it for these platforms which will not be fixed (as that would break compatibility). Signed-off-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2024-02-20Merge tag 'v6.8-rockchip-dtsfixes1' of ↵Arnd Bergmann10-26/+21
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into arm/fixes Some fixes to make devicetrees conform to bindings better (pwm irqs), dt styling fixes (unneeded jaguar status, whitespaces, Cool Pi regulator naming) and functionality fixes (px30 spi chipselect number, allowing rk3588-evb1 to turn off, pcie lane numbers on CoolPi, wrong gpio-names on Indidroid Nova and some CoolPi sdmmc aliases to match what uboot uses). * tag 'v6.8-rockchip-dtsfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: arm64: dts: rockchip: Correct Indiedroid Nova GPIO Names arm64: dts: rockchip: Drop interrupts property from rk3328 pwm-rockchip node arm64: dts: rockchip: set num-cs property for spi on px30 arm64: dts: rockchip: minor rk3588 whitespace cleanup arm64: dts: rockchip: drop unneeded status from rk3588-jaguar gpio-leds ARM: dts: rockchip: Drop interrupts property from pwm-rockchip nodes arm64: dts: rockchip: Fix the num-lanes of pcie3x4 on Cool Pi CM5 EVB arm64: dts: rockchip: rename vcc5v0_usb30_host regulator for Cool Pi CM5 EVB arm64: dts: rockchip: aliase sdmmc as mmc1 for Cool Pi CM5 EVB arm64: dts: rockchip: aliase sdmmc as mmc1 for Cool Pi 4B arm64: dts: rockchip: mark system power controller on rk3588-evb1 Link: https://lore.kernel.org/r/2450634.jE0xQCEvom@phil Signed-off-by: Arnd Bergmann <[email protected]>
2024-02-20drm/tests/drm_buddy: fix build failure on 32-bit targetsLinus Torvalds1-3/+2
Guenter Roeck reports that commit a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") causes build failures on 32-bit targets: "This patch breaks the build on all 32-bit systems since it introduces an unhandled direct 64-bit divide operation. ERROR: modpost: "__umoddi3" [drivers/gpu/drm/tests/drm_buddy_test.ko] undefined! ERROR: modpost: "__moddi3" [drivers/gpu/drm/tests/drm_buddy_test.ko] undefined!" and the uses of 'u64' are all entirely pointless. Yes, the arguments to drm_buddy_init() and drm_buddy_alloc_blocks() are in fact of type 'u64', but none of the values here are remotely relevant, and the compiler will happily just do the type expansion. Of course, in a perfect world the compiler would also have just noticed that all the values in question are tiny, and range analysis would have shown that doing a 64-bit divide is pointless, but that is admittedly expecting a fair amount of the compiler. IOW, we shouldn't write code that the compiler then has to notice is unnecessarily complicated just to avoid extra work. We do have fairly high expectations of compilers, but kernel code should be reasonable to begin with. It turns out that there are also other issues with this code: the KUnit assertion messages have incorrect types in the format strings, but that's a widely spread issue caused by the KUnit infrastructure not having enabled format string verification. We'll get that sorted out separately. Reported-by: Guenter Roeck <[email protected]> Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Link: https://lore.kernel.org/all/[email protected]/ Cc: Matthew Auld <[email protected]> Cc: Arunpravin Paneer Selvam <[email protected]> Cc: Christian König <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2024-02-20dm-crypt, dm-integrity, dm-verity: bump target versionMike Snitzer3-3/+3
Signed-off-by: Mike Snitzer <[email protected]>
2024-02-20dm-verity, dm-crypt: align "struct bvec_iter" correctlyMikulas Patocka2-4/+4
"struct bvec_iter" is defined with the __packed attribute, so it is aligned on a single byte. On X86 (and on other architectures that support unaligned addresses in hardware), "struct bvec_iter" is accessed using the 8-byte and 4-byte memory instructions, however these instructions are less efficient if they operate on unaligned addresses. (on RISC machines that don't have unaligned access in hardware, GCC generates byte-by-byte accesses that are very inefficient - see [1]) This commit reorders the entries in "struct dm_verity_io" and "struct convert_context", so that "struct bvec_iter" is aligned on 8 bytes. [1] https://lore.kernel.org/all/ZcLuWUNRZadJr0tQ@fedora/T/ Signed-off-by: Mikulas Patocka <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2024-02-20dm-crypt: recheck the integrity tag after a failureMikulas Patocka1-16/+73
If a userspace process reads (with O_DIRECT) multiple blocks into the same buffer, dm-crypt reports an authentication error [1]. The error is reported in a log and it may cause RAID leg being kicked out of the array. This commit fixes dm-crypt, so that if integrity verification fails, the data is read again into a kernel buffer (where userspace can't modify it) and the integrity tag is rechecked. If the recheck succeeds, the content of the kernel buffer is copied into the user buffer; if the recheck fails, an integrity error is reported. [1] https://people.redhat.com/~mpatocka/testcases/blk-auth-modify/read2.c Signed-off-by: Mikulas Patocka <[email protected]> Cc: [email protected] Signed-off-by: Mike Snitzer <[email protected]>
2024-02-20dm-crypt: don't modify the data when using authenticated encryptionMikulas Patocka1-0/+6
It was said that authenticated encryption could produce invalid tag when the data that is being encrypted is modified [1]. So, fix this problem by copying the data into the clone bio first and then encrypt them inside the clone bio. This may reduce performance, but it is needed to prevent the user from corrupting the device by writing data with O_DIRECT and modifying them at the same time. [1] https://lore.kernel.org/all/[email protected]/T/ Signed-off-by: Mikulas Patocka <[email protected]> Cc: [email protected] Signed-off-by: Mike Snitzer <[email protected]>
2024-02-20dm-verity: recheck the hash after a failureMikulas Patocka2-6/+86
If a userspace process reads (with O_DIRECT) multiple blocks into the same buffer, dm-verity reports an error [1]. This commit fixes dm-verity, so that if hash verification fails, the data is read again into a kernel buffer (where userspace can't modify it) and the hash is rechecked. If the recheck succeeds, the content of the kernel buffer is copied into the user buffer; if the recheck fails, an error is reported. [1] https://people.redhat.com/~mpatocka/testcases/blk-auth-modify/read2.c Signed-off-by: Mikulas Patocka <[email protected]> Cc: [email protected] Signed-off-by: Mike Snitzer <[email protected]>
2024-02-20dm-integrity: recheck the integrity tag after a failureMikulas Patocka1-9/+84
If a userspace process reads (with O_DIRECT) multiple blocks into the same buffer, dm-integrity reports an error [1]. The error is reported in a log and it may cause RAID leg being kicked out of the array. This commit fixes dm-integrity, so that if integrity verification fails, the data is read again into a kernel buffer (where userspace can't modify it) and the integrity tag is rechecked. If the recheck succeeds, the content of the kernel buffer is copied into the user buffer; if the recheck fails, an integrity error is reported. [1] https://people.redhat.com/~mpatocka/testcases/blk-auth-modify/read2.c Signed-off-by: Mikulas Patocka <[email protected]> Cc: [email protected] Signed-off-by: Mike Snitzer <[email protected]>
2024-02-20sched/membarrier: reduce the ability to hammer on sys_membarrierLinus Torvalds1-0/+6
On some systems, sys_membarrier can be very expensive, causing overall slowdowns for everything. So put a lock on the path in order to serialize the accesses to prevent the ability for this to be called at too high of a frequency and saturate the machine. Signed-off-by: Greg Kroah-Hartman <[email protected]> Reviewed-and-tested-by: Mathieu Desnoyers <[email protected]> Acked-by: Borislav Petkov <[email protected]> Fixes: 22e4ebb97582 ("membarrier: Provide expedited private command") Fixes: c5f58bd58f43 ("membarrier: Provide GLOBAL_EXPEDITED command") Signed-off-by: Linus Torvalds <[email protected]>
2024-02-20Merge tag 'imx-fixes-6.8' of ↵Arnd Bergmann5-19/+17
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 6.8: - A tqma8mpql device tree fix to correct audio codec iov-supply. - A couple of USB-C connector DT description revert to fix regression on imx8mp-dhcom-pdk3 and imx8mn-var-som-symphony board. - Fix valid range check for imx-weim bus driver. - Disable UART4 on Data Modul i.MX8M Plus eDM SBC to avoid boot hang in case that RDC protection is in place. * tag 'imx-fixes-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: bus: imx-weim: fix valid range check Revert "arm64: dts: imx8mn-var-som-symphony: Describe the USB-C connector" Revert "arm64: dts: imx8mp-dhcom-pdk3: Describe the USB-C connector" arm64: dts: tqma8mpql: fix audio codec iov-supply arm64: dts: imx8mp: Disable UART4 by default on Data Modul i.MX8M Plus eDM SBC Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2024-02-20ARM: ep93xx: Add terminator to gpiod_lookup_tableNikita Shubin1-0/+1
Without the terminator, if a con_id is passed to gpio_find() that does not exist in the lookup table the function will not stop looping correctly, and eventually cause an oops. Cc: [email protected] Fixes: b2e63555592f ("i2c: gpio: Convert to use descriptors") Reported-by: Andy Shevchenko <[email protected]> Signed-off-by: Nikita Shubin <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Acked-by: Alexander Sverdlin <[email protected]> Signed-off-by: Alexander Sverdlin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2024-02-20accel/ivpu: Don't enable any tiles by default on VPU40xxAndrzej Kacprowski1-1/+1
There is no point in requesting 1 tile on VPU40xx as the FW will probably need more tiles to run workloads, so it will have to reconfigure PLL anyway. Don't enable any tiles and allow the FW to perform initial tile configuration. This improves NPU boot stability as the tiles are always enabled only by the FW from the same initial state. Fixes: 79cdc56c4a54 ("accel/ivpu: Add initial support for VPU 4") Cc: [email protected] Signed-off-by: Andrzej Kacprowski <[email protected]> Signed-off-by: Jacek Lawrynowicz <[email protected]> Reviewed-by: Jeffrey Hugo <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-02-20platform/x86: thinkpad_acpi: Only update profile if successfully convertedMario Limonciello1-2/+3
Randomly a Lenovo Z13 will trigger a kernel warning traceback from this condition: ``` if (WARN_ON((profile < 0) || (profile >= ARRAY_SIZE(profile_names)))) ``` This happens because thinkpad-acpi always assumes that convert_dytc_to_profile() successfully updated the profile. On the contrary a condition can occur that when dytc_profile_refresh() is called the profile doesn't get updated as there is a -EOPNOTSUPP branch. Catch this situation and avoid updating the profile. Also log this into dynamic debugging in case any other modes should be added in the future. Fixes: c3bfcd4c6762 ("platform/x86: thinkpad_acpi: Add platform profile support") Signed-off-by: Mario Limonciello <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Hans de Goede <[email protected]>
2024-02-20platform/x86: intel-vbtn: Stop calling "VBDL" from notify_handlerHans de Goede1-3/+0
Commit 14c200b7ca46 ("platform/x86: intel-vbtn: Fix missing tablet-mode-switch events") causes 2 issues on the ThinkPad X1 Tablet Gen2: 1. The ThinkPad will wake up immediately from suspend 2. When put in tablet mode SW_TABLET_MODE reverts to 0 after about 1 second Both these issues are caused by the "VBDL" ACPI method call added at the end of the notify_handler. And it never became entirely clear if this call is even necessary to fix the issue of missing tablet-mode-switch events on the Dell Inspiron 7352. Drop the "VBDL" ACPI method call again to fix the 2 issues this is causing on the ThinkPad X1 Tablet Gen2. Fixes: 14c200b7ca46 ("platform/x86: intel-vbtn: Fix missing tablet-mode-switch events") Reported-by: Alexander Kobel <[email protected]> Closes: https://lore.kernel.org/platform-driver-x86/[email protected]/ Cc: [email protected] Cc: Arnold Gozum <[email protected]> Cc: [email protected] Signed-off-by: Hans de Goede <[email protected]> Tested-by: Alexander Kobel <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-20platform/x86: x86-android-tablets: Fix acer_b1_750_goodix_gpios nameHans de Goede1-2/+2
The Acer B1 750 tablet used a Novatek NVT-ts touchscreen, not a Goodix touchscreen. Rename acer_b1_750_goodix_gpios to acer_b1_750_nvt_ts_gpios to correctly reflect this. Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-20platform/x86: x86-android-tablets: Fix serdev instantiation no longer workingHans de Goede1-26/+9
After commit b286f4e87e32 ("serial: core: Move tty and serdev to be children of serial core port device") x86_instantiate_serdev() no longer works due to the serdev-controller-device moving in the device hierarchy from (e.g.) /sys/devices/pci0000:00/8086228A:00/serial0 to /sys/devices/pci0000:00/8086228A:00/8086228A:00:0/8086228A:00:0.0/serial0 Use the new get_serdev_controller() helper function to fix this. Fixes: b286f4e87e32 ("serial: core: Move tty and serdev to be children of serial core port device") Cc: Tony Lindgren <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-20platform/x86: Add new get_serdev_controller() helperHans de Goede1-0/+80
In some cases UART attached devices which require an in kernel driver, e.g. UART attached Bluetooth HCIs are described in the ACPI tables by an ACPI device with a broken or missing UartSerialBusV2() resource. This causes the kernel to create a /dev/ttyS# char-device for the UART instead of creating an in kernel serdev-controller + serdev-device pair for the in kernel driver. The quirk handling in acpi_quirk_skip_serdev_enumeration() makes the kernel create a serdev-controller device for these UARTs instead of a /dev/ttyS#. Instantiating the actual serdev-device to bind to is up to pdx86 code, so far this was handled by the x86-android-tablets code. But since commit b286f4e87e32 ("serial: core: Move tty and serdev to be children of serial core port device") the serdev-controller device has moved in the device hierarchy from (e.g.) /sys/devices/pci0000:00/8086228A:00/serial0 to /sys/devices/pci0000:00/8086228A:00/8086228A:00:0/8086228A:00:0.0/serial0 . This makes this a bit trickier to do and another driver is in the works which will also need this functionality. Add a new helper to get the serdev-controller device, so that the new code for this can be shared. Fixes: b286f4e87e32 ("serial: core: Move tty and serdev to be children of serial core port device") Cc: Tony Lindgren <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-20platform/x86: x86-android-tablets: Fix keyboard touchscreen on Lenovo ↵Hans de Goede3-0/+5
Yogabook1 X90 After commit 4014ae236b1d ("platform/x86: x86-android-tablets: Stop using gpiolib private APIs") the touchscreen in the keyboard half of the Lenovo Yogabook1 X90 stopped working with the following error: Goodix-TS i2c-goodix_ts: error -EBUSY: Failed to get irq GPIO The problem is that when getting the IRQ for instantiated i2c_client-s from a GPIO (rather then using an IRQ directly from the IOAPIC), x86_acpi_irq_helper_get() now properly requests the GPIO, which disallows other drivers from requesting it. Normally this is a good thing, but the goodix touchscreen also uses the IRQ as an output during reset to select which of its 2 possible I2C addresses should be used. Add a new free_gpio flag to struct x86_acpi_irq_data to deal with this and release the GPIO after getting the IRQ in this special case. Fixes: 4014ae236b1d ("platform/x86: x86-android-tablets: Stop using gpiolib private APIs") Cc: [email protected] Signed-off-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-02-20arm64/sme: Restore SMCR_EL1.EZT0 on exit from suspendMark Brown1-0/+2
The fields in SMCR_EL1 reset to an architecturally UNKNOWN value. Since we do not otherwise manage the traps configured in this register at runtime we need to reconfigure them after a suspend in case nothing else was kind enough to preserve them for us. Do so for SMCR_EL1.EZT0. Fixes: d4913eee152d ("arm64/sme: Add basic enumeration for SME2") Reported-by: Jackson Cooper-Driver <[email protected]> Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-02-20arm64/sme: Restore SME registers on exit from suspendMark Brown3-0/+19
The fields in SMCR_EL1 and SMPRI_EL1 reset to an architecturally UNKNOWN value. Since we do not otherwise manage the traps configured in this register at runtime we need to reconfigure them after a suspend in case nothing else was kind enough to preserve them for us. The vector length will be restored as part of restoring the SME state for the next SME using task. Fixes: a1f4ccd25cc2 ("arm64/sme: Provide Kconfig for SME") Reported-by: Jackson Cooper-Driver <[email protected]> Signed-off-by: Mark Brown <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-02-20Revert "arm64: jump_label: use constraints "Si" instead of "i""Will Deacon1-8/+4
This reverts commit f9daab0ad01cf9d165dbbbf106ca4e61d06e7fe8. Geert reports that his particular GCC 5.5 vintage toolchain fails to build an arm64 defconfig because of this change: | arch/arm64/include/asm/jump_label.h:25:2: error: invalid 'asm': | invalid operand | asm goto( ^ Aopparently, this is something we claim to support, so let's revert back to the old jump label constraint for now while discussions about raising the minimum GCC version are ongoing. Reported-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/CAMuHMdX+6fnAf8Hm6EqYJPAjrrLO9T7c=Gu3S8V_pqjSDowJ6g@mail.gmail.com Signed-off-by: Will Deacon <[email protected]>
2024-02-20perf: CXL: fix CPMU filter value mask lengthHojin Nam1-5/+5
CPMU filter value is described as 4B length in CXL r3.0 8.2.7.2.2. However, it is used as 2B length in code and comments. Reviewed-by: Jonathan Cameron <[email protected]> Signed-off-by: Hojin Nam <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2024-02-20gpiolib: Handle no pin_ranges in gpiochip_generic_config()Emil Renner Berthing1-0/+5
Similar to gpiochip_generic_request() and gpiochip_generic_free() the gpiochip_generic_config() function needs to handle the case where there are no pinctrl pins mapped to the GPIOs, usually through the gpio-ranges device tree property. Commit f34fd6ee1be8 ("gpio: dwapb: Use generic request, free and set_config") set the .set_config callback to gpiochip_generic_config() in the dwapb GPIO driver so the GPIO API can set pinctrl configuration for the corresponding pins. Most boards using the dwapb driver do not set the gpio-ranges device tree property though, and in this case gpiochip_generic_config() would return -EPROPE_DEFER rather than the previous -ENOTSUPP return value. This in turn makes gpio_set_config_with_argument_optional() fail and propagate the error to any driver requesting GPIOs. Fixes: 2956b5d94a76 ("pinctrl / gpio: Introduce .set_config() callback for GPIO chips") Reported-by: Jisheng Zhang <[email protected]> Closes: https://lore.kernel.org/linux-gpio/ZdC_g3U4l0CJIWzh@xhacker/ Tested-by: Jisheng Zhang <[email protected]> Signed-off-by: Emil Renner Berthing <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2024-02-20KVM: PPC: Book3S HV: Fix L2 guest reboot failure due to empty 'arch_compat'Amit Machhiwal2-4/+42
Currently, rebooting a pseries nested qemu-kvm guest (L2) results in below error as L1 qemu sends PVR value 'arch_compat' == 0 via ppc_set_compat ioctl. This triggers a condition failure in kvmppc_set_arch_compat() resulting in an EINVAL. qemu-system-ppc64: Unable to set CPU compatibility mode in KVM: Invalid argument Also, a value of 0 for arch_compat generally refers the default compatibility of the host. But, arch_compat, being a Guest Wide Element in nested API v2, cannot be set to 0 in GSB as PowerVM (L0) expects a non-zero value. A value of 0 triggers a kernel trap during a reboot and consequently causes it to fail: [ 22.106360] reboot: Restarting system KVM: unknown exit, hardware reason ffffffffffffffea NIP 0000000000000100 LR 000000000000fe44 CTR 0000000000000000 XER 0000000020040092 CPU#0 MSR 0000000000001000 HID0 0000000000000000 HF 6c000000 iidx 3 didx 3 TB 00000000 00000000 DECR 0 GPR00 0000000000000000 0000000000000000 c000000002a8c300 000000007fe00000 GPR04 0000000000000000 0000000000000000 0000000000001002 8000000002803033 GPR08 000000000a000000 0000000000000000 0000000000000004 000000002fff0000 GPR12 0000000000000000 c000000002e10000 0000000105639200 0000000000000004 GPR16 0000000000000000 000000010563a090 0000000000000000 0000000000000000 GPR20 0000000105639e20 00000001056399c8 00007fffe54abab0 0000000105639288 GPR24 0000000000000000 0000000000000001 0000000000000001 0000000000000000 GPR28 0000000000000000 0000000000000000 c000000002b30840 0000000000000000 CR 00000000 [ - - - - - - - - ] RES 000@ffffffffffffffff SRR0 0000000000000000 SRR1 0000000000000000 PVR 0000000000800200 VRSAVE 0000000000000000 SPRG0 0000000000000000 SPRG1 0000000000000000 SPRG2 0000000000000000 SPRG3 0000000000000000 SPRG4 0000000000000000 SPRG5 0000000000000000 SPRG6 0000000000000000 SPRG7 0000000000000000 HSRR0 0000000000000000 HSRR1 0000000000000000 CFAR 0000000000000000 LPCR 0000000000020400 PTCR 0000000000000000 DAR 0000000000000000 DSISR 0000000000000000 kernel:trap=0xffffffea | pc=0x100 | msr=0x1000 This patch updates kvmppc_set_arch_compat() to use the host PVR value if 'compat_pvr' == 0 indicating that qemu doesn't want to enforce any specific PVR compat mode. The relevant part of the code might need a rework if PowerVM implements a support for `arch_compat == 0` in nestedv2 API. Fixes: 19d31c5f1157 ("KVM: PPC: Add support for nestedv2 guests") Reviewed-by: "Aneesh Kumar K.V (IBM)" <[email protected]> Reviewed-by: Vaibhav Jain <[email protected]> Signed-off-by: Amit Machhiwal <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://msgid.link/[email protected]
2024-02-20docs: netdev: update the link to the CI repoJakub Kicinski1-1/+1
Netronome graciously transferred the original NIPA repo to our new netdev umbrella org. Link to that instead of my private fork. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-20arp: Prevent overflow in arp_req_get().Kuniyuki Iwashima1-1/+2
syzkaller reported an overflown write in arp_req_get(). [0] When ioctl(SIOCGARP) is issued, arp_req_get() looks up an neighbour entry and copies neigh->ha to struct arpreq.arp_ha.sa_data. The arp_ha here is struct sockaddr, not struct sockaddr_storage, so the sa_data buffer is just 14 bytes. In the splat below, 2 bytes are overflown to the next int field, arp_flags. We initialise the field just after the memcpy(), so it's not a problem. However, when dev->addr_len is greater than 22 (e.g. MAX_ADDR_LEN), arp_netmask is overwritten, which could be set as htonl(0xFFFFFFFFUL) in arp_ioctl() before calling arp_req_get(). To avoid the overflow, let's limit the max length of memcpy(). Note that commit b5f0de6df6dc ("net: dev: Convert sa_data to flexible array in struct sockaddr") just silenced syzkaller. [0]: memcpy: detected field-spanning write (size 16) of single field "r->arp_ha.sa_data" at net/ipv4/arp.c:1128 (size 14) WARNING: CPU: 0 PID: 144638 at net/ipv4/arp.c:1128 arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128 Modules linked in: CPU: 0 PID: 144638 Comm: syz-executor.4 Not tainted 6.1.74 #31 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014 RIP: 0010:arp_req_get+0x411/0x4a0 net/ipv4/arp.c:1128 Code: fd ff ff e8 41 42 de fb b9 0e 00 00 00 4c 89 fe 48 c7 c2 20 6d ab 87 48 c7 c7 80 6d ab 87 c6 05 25 af 72 04 01 e8 5f 8d ad fb <0f> 0b e9 6c fd ff ff e8 13 42 de fb be 03 00 00 00 4c 89 e7 e8 a6 RSP: 0018:ffffc900050b7998 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ffff88803a815000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: ffffffff8641a44a RDI: 0000000000000001 RBP: ffffc900050b7a98 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 203a7970636d656d R12: ffff888039c54000 R13: 1ffff92000a16f37 R14: ffff88803a815084 R15: 0000000000000010 FS: 00007f172bf306c0(0000) GS:ffff88805aa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f172b3569f0 CR3: 0000000057f12005 CR4: 0000000000770ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> arp_ioctl+0x33f/0x4b0 net/ipv4/arp.c:1261 inet_ioctl+0x314/0x3a0 net/ipv4/af_inet.c:981 sock_do_ioctl+0xdf/0x260 net/socket.c:1204 sock_ioctl+0x3ef/0x650 net/socket.c:1321 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:870 [inline] __se_sys_ioctl fs/ioctl.c:856 [inline] __x64_sys_ioctl+0x18e/0x220 fs/ioctl.c:856 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x37/0x90 arch/x86/entry/common.c:81 entry_SYSCALL_64_after_hwframe+0x64/0xce RIP: 0033:0x7f172b262b8d Code: 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f172bf300b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 00007f172b3abf80 RCX: 00007f172b262b8d RDX: 0000000020000000 RSI: 0000000000008954 RDI: 0000000000000003 RBP: 00007f172b2d3493 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 000000000000000b R14: 00007f172b3abf80 R15: 00007f172bf10000 </TASK> Reported-by: syzkaller <[email protected]> Reported-by: Bjoern Doebel <[email protected]> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-20devlink: fix possible use-after-free and memory leaks in devlink_init()Vasiliy Kovalev1-3/+9
The pernet operations structure for the subsystem must be registered before registering the generic netlink family. Make an unregister in case of unsuccessful registration. Fixes: 687125b5799c ("devlink: split out core code") Signed-off-by: Vasiliy Kovalev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-20ipv6: sr: fix possible use-after-free and null-ptr-derefVasiliy Kovalev1-9/+11
The pernet operations structure for the subsystem must be registered before registering the generic netlink family. Fixes: 915d7e5e5930 ("ipv6: sr: add code base for control plane support of SR-IPv6") Signed-off-by: Vasiliy Kovalev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-20afs: Increase buffer size in afs_update_volume_status()Daniil Dulov1-2/+2
The max length of volume->vid value is 20 characters. So increase idbuf[] size up to 24 to avoid overflow. Found by Linux Verification Center (linuxtesting.org) with SVACE. [DH: Actually, it's 20 + NUL, so increase it to 24 and use snprintf()] Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Daniil Dulov <[email protected]> Signed-off-by: David Howells <[email protected]> Link: https://lore.kernel.org/r/[email protected]/ # v1 Link: https://lore.kernel.org/r/[email protected]/ # v2 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-02-20afs: Fix ignored callbacks over ipv4Marc Dionne3-15/+8
When searching for a matching peer, all addresses need to be searched, not just the ipv6 ones in the fs_addresses6 list. Given that the lists no longer contain addresses, there is little reason to splitting things between separate lists, so unify them into a single list. When processing an incoming callback from an ipv4 address, this would lead to a failure to set call->server, resulting in the callback being ignored and the client seeing stale contents. Fixes: 72904d7b9bfb ("rxrpc, afs: Allow afs to pin rxrpc_peer objects") Reported-by: Markus Suvanto <[email protected]> Link: https://lists.infradead.org/pipermail/linux-afs/2024-February/008035.html Signed-off-by: Marc Dionne <[email protected]> Signed-off-by: David Howells <[email protected]> Link: https://lists.infradead.org/pipermail/linux-afs/2024-February/008037.html # v1 Link: https://lists.infradead.org/pipermail/linux-afs/2024-February/008066.html # v2 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2024-02-20cachefiles: fix memory leak in cachefiles_add_cache()Baokun Li2-0/+3
The following memory leak was reported after unbinding /dev/cachefiles: ================================================================== unreferenced object 0xffff9b674176e3c0 (size 192): comm "cachefilesd2", pid 680, jiffies 4294881224 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc ea38a44b): [<ffffffff8eb8a1a5>] kmem_cache_alloc+0x2d5/0x370 [<ffffffff8e917f86>] prepare_creds+0x26/0x2e0 [<ffffffffc002eeef>] cachefiles_determine_cache_security+0x1f/0x120 [<ffffffffc00243ec>] cachefiles_add_cache+0x13c/0x3a0 [<ffffffffc0025216>] cachefiles_daemon_write+0x146/0x1c0 [<ffffffff8ebc4a3b>] vfs_write+0xcb/0x520 [<ffffffff8ebc5069>] ksys_write+0x69/0xf0 [<ffffffff8f6d4662>] do_syscall_64+0x72/0x140 [<ffffffff8f8000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0x76 ================================================================== Put the reference count of cache_cred in cachefiles_daemon_unbind() to fix the problem. And also put cache_cred in cachefiles_add_cache() error branch to avoid memory leaks. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") CC: [email protected] Signed-off-by: Baokun Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Acked-by: David Howells <[email protected]> Reviewed-by: Jingbo Xu <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
2024-02-20usb: typec: tpcm: Fix issues with power being removed during resetMark Brown1-1/+2
Since the merge of b717dfbf73e8 ("Revert "usb: typec: tcpm: fix cc role at port reset"") into mainline the LibreTech Renegade Elite/Firefly has died during boot, the main symptom observed in testing is a sudden stop in console output. Gábor Stefanik identified in review that the patch would cause power to be removed from devices without batteries (like this board), observing that while the patch is correct according to the spec this appears to be an oversight in the spec. Given that the change makes previously working systems unusable let's revert it, there was some discussion of identifying systems that have alternative power and implementing the standards conforming behaviour in only that case. Fixes: b717dfbf73e8 ("Revert "usb: typec: tcpm: fix cc role at port reset"") Cc: stable <[email protected]> Cc: Badhri Jagan Sridharan <[email protected]> Signed-off-by: Mark Brown <[email protected]> Acked-by: Heikki Krogerus <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-02-20erofs: fix handling kern_mount() failureAl Viro1-3/+4
if you have a variable that holds NULL or a pointer to live struct mount, do not shove ERR_PTR() into it - not if you later treat "not NULL" as "holds a pointer to object". Signed-off-by: Al Viro <[email protected]>
2024-02-19KVM/VMX: Move VERW closer to VMentry for MDS mitigationPawan Gupta2-4/+19
During VMentry VERW is executed to mitigate MDS. After VERW, any memory access like register push onto stack may put host data in MDS affected CPU buffers. A guest can then use MDS to sample host data. Although likelihood of secrets surviving in registers at current VERW callsite is less, but it can't be ruled out. Harden the MDS mitigation by moving the VERW mitigation late in VMentry path. Note that VERW for MMIO Stale Data mitigation is unchanged because of the complexity of per-guest conditional VERW which is not easy to handle that late in asm with no GPRs available. If the CPU is also affected by MDS, VERW is unconditionally executed late in asm regardless of guest having MMIO access. Signed-off-by: Pawan Gupta <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Acked-by: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-6-a6216d83edb7%40linux.intel.com
2024-02-19KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCHSean Christopherson2-5/+8
Use EFLAGS.CF instead of EFLAGS.ZF to track whether to use VMRESUME versus VMLAUNCH. Freeing up EFLAGS.ZF will allow doing VERW, which clobbers ZF, for MDS mitigations as late as possible without needing to duplicate VERW for both paths. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Pawan Gupta <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Nikolay Borisov <[email protected]> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-5-a6216d83edb7%40linux.intel.com
2024-02-19x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static keyPawan Gupta6-37/+34
The VERW mitigation at exit-to-user is enabled via a static branch mds_user_clear. This static branch is never toggled after boot, and can be safely replaced with an ALTERNATIVE() which is convenient to use in asm. Switch to ALTERNATIVE() to use the VERW mitigation late in exit-to-user path. Also remove the now redundant VERW in exc_nmi() and arch_exit_to_user_mode(). Signed-off-by: Pawan Gupta <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-4-a6216d83edb7%40linux.intel.com
2024-02-19x86/entry_32: Add VERW just before userspace transitionPawan Gupta1-0/+3
As done for entry_64, add support for executing VERW late in exit to user path for 32-bit mode. Signed-off-by: Pawan Gupta <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Link: https://lore.kernel.org/all/20240213-delay-verw-v8-3-a6216d83edb7%40linux.intel.com