aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-06-09mm/huge_memory: Fix xarray node memory leakMatthew Wilcox (Oracle)3-4/+5
If xas_split_alloc() fails to allocate the necessary nodes to complete the xarray entry split, it sets the xa_state to -ENOMEM, which xas_nomem() then interprets as "Please allocate more memory", not as "Please free any unnecessary memory" (which was the intended outcome). It's confusing to use xas_nomem() to free memory in this context, so call xas_destroy() instead. Reported-by: [email protected] Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Cc: [email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2022-06-09filemap: Cache the value of vm_flagsMatthew Wilcox (Oracle)1-4/+5
After we have unlocked the mmap_lock for I/O, the file is pinned, but the VMA is not. Checking this flag after that can be a use-after-free. It's not a terribly interesting use-after-free as it can only read one bit, and it's used to decide whether to read 2MB or 4MB. But it upsets the automated tools and it's generally bad practice anyway, so let's fix it. Reported-by: [email protected] Fixes: 4687fdbb805a ("mm/filemap: Support VM_HUGEPAGE for file mappings") Cc: [email protected] Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2022-06-09filemap: Don't release a locked folioMatthew Wilcox (Oracle)1-0/+2
We must hold a reference over the call to filemap_release_folio(), otherwise the page cache will put the last reference to the folio before we unlock it, leading to splats like this: BUG: Bad page state in process u8:5 pfn:1ab1f4 page:ffffea0006ac7d00 refcount:0 mapcount:0 mapping:0000000000000000 index:0x28b1de pfn:0x1ab1f4 flags: 0x17ff80000040001(locked|reclaim|node=0|zone=2|lastcpupid=0xfff) raw: 017ff80000040001 dead000000000100 dead000000000122 0000000000000000 raw: 000000000028b1de 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set It's an error path, so it doesn't see much testing. Reported-by: Darrick J. Wong <[email protected]> Fixes: a42634a6c07d ("readahead: Use a folio in read_pages()") Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2022-06-09MIPS: Loongson-3: fix compile mips cpu_hwmon as module build error.Yupeng Li1-1/+1
set cpu_hwmon as a module build with loongson_sysconf, loongson_chiptemp undefined error,fix cpu_hwmon compile options to be bool.Some kernel compilation error information is as follows: Checking missing-syscalls for N32 CALL scripts/checksyscalls.sh Checking missing-syscalls for O32 CALL scripts/checksyscalls.sh CALL scripts/checksyscalls.sh CHK include/generated/compile.h CC [M] drivers/platform/mips/cpu_hwmon.o Building modules, stage 2. MODPOST 200 modules ERROR: "loongson_sysconf" [drivers/platform/mips/cpu_hwmon.ko] undefined! ERROR: "loongson_chiptemp" [drivers/platform/mips/cpu_hwmon.ko] undefined! make[1]: *** [scripts/Makefile.modpost:92:__modpost] 错误 1 make: *** [Makefile:1261:modules] 错误 2 Signed-off-by: Yupeng Li <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Reviewed-by: Huacai Chen <[email protected]> Signed-off-by: Thomas Bogendoerfer <[email protected]>
2022-06-09Merge tag 'fs_for_v5.19-rc2' of ↵Linus Torvalds4-11/+40
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, writeback, and quota fixes and cleanups from Jan Kara: "A fix for race in writeback code and two cleanups in quota and ext2" * tag 'fs_for_v5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: Prevent memory allocation recursion while holding dq_lock writeback: Fix inode->i_io_list not be protected by inode->i_lock error fs: Fix syntax errors in comments
2022-06-09Merge tag 'powerpc-5.19-2' of ↵Linus Torvalds11-21/+38
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - On 32-bit fix overread/overwrite of thread_struct via ptrace PEEK/POKE. - Fix softirqs not switching to the softirq stack since we moved irq_exit(). - Force thread size increase when KASAN is enabled to avoid stack overflows. - On Book3s 64 mark more code as not to be instrumented by KASAN to avoid crashes. - Exempt __get_wchan() from KASAN checking, as it's inherently racy. - Fix a recently introduced crash in the papr_scm driver in some configurations. - Remove include of <generated/compile.h> which is forbidden. Thanks to Ariel Miculas, Chen Jingwen, Christophe Leroy, Erhard Furtner, He Ying, Kees Cook, Masahiro Yamada, Nageswara R Sastry, Paul Mackerras, Sachin Sant, Vaibhav Jain, and Wanming Hu. * tag 'powerpc-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/32: Fix overread/overwrite of thread_struct via ptrace powerpc/book3e: get rid of #include <generated/compile.h> powerpc/kasan: Force thread size increase with KASAN powerpc/papr_scm: don't requests stats with '0' sized stats buffer powerpc: Don't select HAVE_IRQ_EXIT_ON_IRQ_STACK powerpc/kasan: Silence KASAN warnings in __get_wchan() powerpc/kasan: Mark more real-mode code as not to be instrumented
2022-06-09Merge tag 'net-5.19-rc2' of ↵Linus Torvalds44-189/+367
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - eth: amt: fix possible null-ptr-deref in amt_rcv() Previous releases - regressions: - tcp: use alloc_large_system_hash() to allocate table_perturb - af_unix: fix a data-race in unix_dgram_peer_wake_me() - nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling - eth: ixgbe: fix unexpected VLAN rx in promisc mode on VF Previous releases - always broken: - ipv6: fix signed integer overflow in __ip6_append_data - netfilter: - nat: really support inet nat without l3 address - nf_tables: memleak flow rule from commit path - bpf: fix calling global functions from BPF_PROG_TYPE_EXT programs - openvswitch: fix misuse of the cached connection on tuple changes - nfc: nfcmrvl: fix memory leak in nfcmrvl_play_deferred - eth: altera: fix refcount leak in altera_tse_mdio_create Misc: - add Quentin Monnet to bpftool maintainers" * tag 'net-5.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (45 commits) net: amd-xgbe: fix clang -Wformat warning tcp: use alloc_large_system_hash() to allocate table_perturb net: dsa: realtek: rtl8365mb: fix GMII caps for ports with internal PHY net: dsa: mv88e6xxx: correctly report serdes link failure net: dsa: mv88e6xxx: fix BMSR error to be consistent with others net: dsa: mv88e6xxx: use BMSR_ANEGCOMPLETE bit for filling an_complete net: altera: Fix refcount leak in altera_tse_mdio_create net: openvswitch: fix misuse of the cached connection on tuple changes net: ethernet: mtk_eth_soc: fix misuse of mem alloc interface netdev[napi]_alloc_frag ip_gre: test csum_start instead of transport header au1000_eth: stop using virt_to_bus() ipv6: Fix signed integer overflow in l2tp_ip6_sendmsg ipv6: Fix signed integer overflow in __ip6_append_data nfc: nfcmrvl: Fix memory leak in nfcmrvl_play_deferred nfc: st21nfca: fix incorrect sizing calculations in EVT_TRANSACTION nfc: st21nfca: fix memory leaks in EVT_TRANSACTION handling nfc: st21nfca: fix incorrect validating logic in EVT_TRANSACTION net: ipv6: unexport __init-annotated seg6_hmac_init() net: xfrm: unexport __init-annotated xfrm4_protocol_init() net: mdio: unexport __init-annotated mdio_bus_init() ...
2022-06-09docs: arm: tcm: Fix typo in description of TCM and MMU usageSimon Horman1-1/+1
Correct a typo in the description of interaction between the TCM and MMU. Found by inspection. Signed-off-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2022-06-10scripts/check-local-export: avoid 'wait $!' for process substitutionMasahiro Yamada2-15/+33
Bash 4.4, released in 2016, supports 'wait $!' to check the exit status of a process substitution, but it seems too new. Some people using older bash versions (on CentOS 7, Ubuntu 16.04, etc.) reported an error like this: ./scripts/check-local-export: line 54: wait: pid 17328 is not a child of this shell I used the process substitution to avoid a pipeline, which executes each command in a subshell. If the while-loop is executed in the subshell context, variable changes within are lost after the subshell terminates. Fortunately, Bash 4.2, released in 2011, supports the 'lastpipe' option, which makes the last element of a pipeline run in the current shell process. Switch to the pipeline with 'lastpipe' solution, and also set 'pipefail' to catch errors from ${NM}. Add the bash requirement to Documentation/process/changes.rst. Fixes: 31cb50b5590f ("kbuild: check static EXPORT_SYMBOL* by script instead of modpost") Reported-by: Tetsuo Handa <[email protected]> Reported-by: Michael Ellerman <[email protected]> Reported-by: Wang Yugui <[email protected]> Tested-by: Tetsuo Handa <[email protected]> Tested-by: Jon Hunter <[email protected]> Acked-by: Nick Desaulniers <[email protected]> Tested-by: Sedat Dilek <[email protected]> # LLVM-14 (x86-64) Signed-off-by: Masahiro Yamada <[email protected]>
2022-06-09netfs: gcc-12: temporarily disable '-Wattribute-warning' for nowLinus Torvalds2-0/+6
This is a pure band-aid so that I can continue merging stuff from people while some of the gcc-12 fallout gets sorted out. In particular, gcc-12 is very unhappy about the kinds of pointer arithmetic tricks that netfs does, and that makes the fortify checks trigger in afs and ceph: In function ‘fortify_memset_chk’, inlined from ‘netfs_i_context_init’ at include/linux/netfs.h:327:2, inlined from ‘afs_set_netfs_context’ at fs/afs/inode.c:61:2, inlined from ‘afs_root_iget’ at fs/afs/inode.c:543:2: include/linux/fortify-string.h:258:25: warning: call to ‘__write_overflow_field’ declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 258 | __write_overflow_field(p_size_field, size); | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ and the reason is that netfs_i_context_init() is passed a 'struct inode' pointer, and then it does struct netfs_i_context *ctx = netfs_i_context(inode); memset(ctx, 0, sizeof(*ctx)); where that netfs_i_context() function just does pointer arithmetic on the inode pointer, knowing that the netfs_i_context is laid out immediately after it in memory. This is all truly disgusting, since the whole "netfs_i_context is laid out immediately after it in memory" is not actually remotely true in general, but is just made to be that way for afs and ceph. See for example fs/cifs/cifsglob.h: struct cifsInodeInfo { struct { /* These must be contiguous */ struct inode vfs_inode; /* the VFS's inode record */ struct netfs_i_context netfs_ctx; /* Netfslib context */ }; [...] and realize that this is all entirely wrong, and the pointer arithmetic that netfs_i_context() is doing is also very very wrong and wouldn't give the right answer if netfs_ctx had different alignment rules from a 'struct inode', for example). Anyway, that's just a long-winded way to say "the gcc-12 warning is actually quite reasonable, and our code happens to work but is pretty disgusting". This is getting fixed properly, but for now I made the mistake of thinking "the week right after the merge window tends to be calm for me as people take a breather" and I did a sustem upgrade. And I got gcc-12 as a result, so to continue merging fixes from people and not have the end result drown in warnings, I am fixing all these gcc-12 issues I hit. Including with these kinds of temporary fixes. Cc: Kees Cook <[email protected]> Cc: David Howells <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Linus Torvalds <[email protected]>
2022-06-09gcc-12: disable '-Warray-bounds' universally for nowLinus Torvalds4-9/+12
In commit 8b202ee21839 ("s390: disable -Warray-bounds") the s390 people disabled the '-Warray-bounds' warning for gcc-12, because the new logic in gcc would cause warnings for their use of the S390_lowcore macro, which accesses absolute pointers. It turns out gcc-12 has many other issues in this area, so this takes that s390 warning disable logic, and turns it into a kernel build config entry instead. Part of the intent is that we can make this all much more targeted, and use this conflig flag to disable it in only particular configurations that cause problems, with the s390 case as an example: select GCC12_NO_ARRAY_BOUNDS and we could do that for other configuration cases that cause issues. Or we could possibly use the CONFIG_CC_NO_ARRAY_BOUNDS thing in a more targeted way, and disable the warning only for particular uses: again the s390 case as an example: KBUILD_CFLAGS_DECOMPRESSOR += $(if $(CONFIG_CC_NO_ARRAY_BOUNDS),-Wno-array-bounds) but this ends up just doing it globally in the top-level Makefile, since the current issues are spread fairly widely all over: KBUILD_CFLAGS-$(CONFIG_CC_NO_ARRAY_BOUNDS) += -Wno-array-bounds We'll try to limit this later, since the gcc-12 problems are rare enough that *much* of the kernel can be built with it without disabling this warning. Cc: Kees Cook <[email protected]> Cc: Nathan Chancellor <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-06-09mellanox: mlx5: avoid uninitialized variable warning with gcc-12Linus Torvalds1-1/+1
gcc-12 started warning about 'tracker' being used uninitialized: drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c: In function ‘mlx5_do_bond’: drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c:786:28: warning: ‘tracker’ is used uninitialized [-Wuninitialized] 786 | struct lag_tracker tracker; | ^~~~~~~ which seems to be because it doesn't track how the use (and initialization) is bound by the 'do_bond' flag. But admittedly that 'do_bond' usage is fairly complicated, and involves passing it around as an argument to helper functions, so it's somewhat understandable that gcc doesn't see how that all works. This function could be rewritten to make the use of that tracker variable more obviously safe, but for now I'm just adding the forced initialization of it. Signed-off-by: Linus Torvalds <[email protected]>
2022-06-09irqchip/uniphier-aidet: Add compatible string for NX1 SoCKunihiko Hayashi1-0/+1
Add the compatible string to support UniPhier NX1 SoC, which has the same kinds of controls as the other UniPhier SoCs. Signed-off-by: Kunihiko Hayashi <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-09dt-bindings: interrupt-controller/uniphier-aidet: Add bindings for NX1 SoCKunihiko Hayashi1-0/+1
Update uniphier-aidet binding document for UniPhier NX1 SoC. Signed-off-by: Kunihiko Hayashi <[email protected]> Acked-by: Rob Herring <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-09gcc-12: disable '-Wdangling-pointer' warning for nowLinus Torvalds1-0/+3
While the concept of checking for dangling pointers to local variables at function exit is really interesting, the gcc-12 implementation is not compatible with reality, and results in false positives. For example, gcc sees us putting things on a local list head allocated on the stack, which involves exactly those kinds of pointers to the local stack entry: In function ‘__list_add’, inlined from ‘list_add_tail’ at include/linux/list.h:102:2, inlined from ‘rebuild_snap_realms’ at fs/ceph/snap.c:434:2: include/linux/list.h:74:19: warning: storing the address of local variable ‘realm_queue’ in ‘*&realm_27(D)->rebuild_item.prev’ [-Wdangling-pointer=] 74 | new->prev = prev; | ~~~~~~~~~~^~~~~~ But then gcc - understandably - doesn't really understand the big picture how the doubly linked list works, so doesn't see how we then end up emptying said list head in a loop and the pointer we added has been removed. Gcc also complains about us (intentionally) using this as a way to store a kind of fake stack trace, eg drivers/acpi/acpica/utdebug.c:40:38: warning: storing the address of local variable ‘current_sp’ in ‘acpi_gbl_entry_stack_pointer’ [-Wdangling-pointer=] 40 | acpi_gbl_entry_stack_pointer = &current_sp; | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~ which is entirely reasonable from a compiler standpoint, and we may want to change those kinds of patterns, but not not. So this is one of those "it would be lovely if the compiler were to complain about us leaving dangling pointers to the stack", but not this way. Signed-off-by: Linus Torvalds <[email protected]>
2022-06-09drm: imx: fix compiler warning with gcc-12Linus Torvalds1-1/+1
Gcc-12 correctly warned about this code using a non-NULL pointer as a truth value: drivers/gpu/drm/imx/ipuv3-crtc.c: In function ‘ipu_crtc_disable_planes’: drivers/gpu/drm/imx/ipuv3-crtc.c:72:21: error: the comparison will always evaluate as ‘true’ for the address of ‘plane’ will never be NULL [-Werror=address] 72 | if (&ipu_crtc->plane[1] && plane == &ipu_crtc->plane[1]->base) | ^ due to the extraneous '&' address-of operator. Philipp Zabel points out that The mistake had no adverse effect since the following condition doesn't actually dereference the NULL pointer, but the intent of the code was obviously to check for it, not to take the address of the member. Fixes: eb8c88808c83 ("drm/imx: add deferred plane disabling") Acked-by: Philipp Zabel <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-06-09irqchip/realtek-rtl: Fix refcount leak in map_interruptsMiaoqian Lin1-1/+1
of_find_node_by_phandle() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. This function doesn't call of_node_put() in error path. Call of_node_put() directly after of_property_read_u32() to cover both normal path and error path. Fixes: 9f3a0f34b84a ("irqchip: Add support for Realtek RTL838x/RTL839x interrupt controller") Signed-off-by: Miaoqian Lin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-09irqchip/gic-v3: Fix refcount leak in gic_populate_ppi_partitionsMiaoqian Lin1-1/+4
of_find_node_by_phandle() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. Add missing of_node_put() to avoid refcount leak. Fixes: e3825ba1af3a ("irqchip/gic-v3: Add support for partitioned PPIs") Signed-off-by: Miaoqian Lin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-09irqchip/gic-v3: Fix error handling in gic_populate_ppi_partitionsMiaoqian Lin1-1/+1
of_get_child_by_name() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. When kcalloc fails, it missing of_node_put() and results in refcount leak. Fix this by goto out_put_node label. Fixes: 52085d3f2028 ("irqchip/gic-v3: Dynamically allocate PPI partition descriptors") Signed-off-by: Miaoqian Lin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-09irqchip/apple-aic: Fix refcount leak in aic_of_ic_initMiaoqian Lin1-0/+1
of_get_child_by_name() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. Add missing of_node_put() to avoid refcount leak. Fixes: a5e8801202b3 ("irqchip/apple-aic: Parse FIQ affinities from device-tree") Signed-off-by: Miaoqian Lin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-09irqchip/apple-aic: Fix refcount leak in build_fiq_affinityMiaoqian Lin1-0/+1
of_find_node_by_phandle() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. Add missing of_node_put() to avoid refcount leak. Fixes: a5e8801202b3 ("irqchip/apple-aic: Parse FIQ affinities from device-tree") Signed-off-by: Miaoqian Lin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-09irqchip/gic/realview: Fix refcount leak in realview_gic_of_initMiaoqian Lin1-0/+1
of_find_matching_node_and_match() returns a node pointer with refcount incremented, we should use of_node_put() on it when not need anymore. Add missing of_node_put() to avoid refcount leak. Fixes: 82b0a434b436 ("irqchip/gic/realview: Support more RealView DCC variants") Signed-off-by: Miaoqian Lin <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-09irqchip/xilinx: Remove microblaze+zynq dependencyJamie Iles1-1/+1
The Xilinx IRQ controller doesn't really have any architecture dependencies - it's a generic AXI component that can be used for any FPGA core from Zynq hard processor systems to microblaze+riscv soft cores and more. Signed-off-by: Jamie Iles <[email protected]> Acked-by: Michal Simek <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-09docs: Move the HTE documentation to driver-api/Jonathan Corbet6-2/+2
The hardware timestamp engine documentation is driver API material, and really belongs in the driver-API book; move it there. Cc: Thierry Reding <[email protected]> Acked-by: Dipen Patel <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2022-06-09iavf: Fix issue with MAC address of VF shown as zeroMichal Wilczynski1-1/+1
After reinitialization of iavf, ice driver gets VIRTCHNL_OP_ADD_ETH_ADDR message with incorrectly set type of MAC address. Hardware address should have is_primary flag set as true. This way ice driver knows what it has to set as a MAC address. Check if the address is primary in iavf_add_filter function and set flag accordingly. To test set all-zero MAC on a VF. This triggers iavf re-initialization and VIRTCHNL_OP_ADD_ETH_ADDR message gets sent to PF. For example: ip link set dev ens785 vf 0 mac 00:00:00:00:00:00 This triggers re-initialization of iavf. New MAC should be assigned. Now check if MAC is non-zero: ip link show dev ens785 Fixes: a3e839d539e0 ("iavf: Add usage of new virtchnl format to set default MAC") Signed-off-by: Michal Wilczynski <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-09i40e: Fix call trace in setup_tx_descriptorsAleksandr Loktionov1-8/+17
After PF reset and ethtool -t there was call trace in dmesg sometimes leading to panic. When there was some time, around 5 seconds, between reset and test there were no errors. Problem was that pf reset calls i40e_vsi_close in prep_for_reset and ethtool -t calls i40e_vsi_close in diag_test. If there was not enough time between those commands the second i40e_vsi_close starts before previous i40e_vsi_close was done which leads to crash. Add check to diag_test if pf is in reset and don't start offline tests if it is true. Add netif_info("testing failed") into unhappy path of i40e_diag_test() Fixes: e17bc411aea8 ("i40e: Disable offline diagnostics if VFs are enabled") Fixes: 510efb2682b3 ("i40e: Fix ethtool offline diagnostic with netqueues") Signed-off-by: Michal Jaron <[email protected]> Signed-off-by: Aleksandr Loktionov <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-06-09i40e: Fix calculating the number of queue pairsGrzegorz Szczurek1-1/+1
If ADQ is enabled for a VF, then actual number of queue pair is a number of currently available traffic classes for this VF. Without this change the configuration of the Rx/Tx queues fails with error. Fixes: d29e0d233e0d ("i40e: missing input validation on VF message handling by the PF") Signed-off-by: Grzegorz Szczurek <[email protected]> Signed-off-by: Jedrzej Jagielski <[email protected]> Tested-by: Bharathi Sreenivas <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-09i40e: Fix adding ADQ filter to TC0Grzegorz Szczurek1-0/+5
Procedure of configure tc flower filters erroneously allows to create filters on TC0 where unfiltered packets are also directed by default. Issue was caused by insufficient checks of hw_tc parameter specifying the hardware traffic class to pass matching packets to. Fix checking hw_tc parameter which blocks creation of filters on TC0. Fixes: 2f4b411a3d67 ("i40e: Enable cloud filters via tc-flower") Signed-off-by: Grzegorz Szczurek <[email protected]> Signed-off-by: Jedrzej Jagielski <[email protected]> Tested-by: Bharathi Sreenivas <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-06-09docs: usb: fix literal block marker in usbmon verification exampleJustin Swartz1-1/+1
The "Verify that bus sockets are present" example was not properly formatted due to a typo in the literal block marker. Signed-off-by: Justin Swartz <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2022-06-09Documentation/features: Update the arch support status filesZheng Zengkai42-5/+47
The arch support status files don't match reality as of v5.19-rc1, use the features-refresh.sh to refresh all the arch-support.txt files in place. The main effect is to add entries for the new loong architecture. Signed-off-by: Zheng Zengkai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jonathan Corbet <[email protected]>
2022-06-09genirq: PM: Use runtime PM for chained interruptsMarc Zyngier1-1/+4
When requesting an interrupt, we correctly call into the runtime PM framework to guarantee that the underlying interrupt controller is up and running. However, we fail to do so for chained interrupt controllers, as the mux interrupt is not requested along the same path. Augment __irq_do_set_handler() to call into the runtime PM code in this case, making sure the PM flow is the same for all interrupts. Reported-by: Lucas Stach <[email protected]> Tested-by: Liu Ying <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-06-09KVM: selftests: Restrict test region to 48-bit physical addresses when using ↵David Matlack1-3/+15
nested The selftests nested code only supports 4-level paging at the moment. This means it cannot map nested guest physical addresses with more than 48 bits. Allow perf_test_util nested mode to work on hosts with more than 48 physical addresses by restricting the guest test region to 48-bits. While here, opportunistically fix an off-by-one error when dealing with vm_get_max_gfn(). perf_test_util.c was treating this as the maximum number of GFNs, rather than the maximum allowed GFN. This didn't result in any correctness issues, but it did end up shifting the test region down slightly when using huge pages. Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: selftests: Add option to run dirty_log_perf_test vCPUs in L2David Matlack8-8/+182
Add an option to dirty_log_perf_test that configures the vCPUs to run in L2 instead of L1. This makes it possible to benchmark the dirty logging performance of nested virtualization, which is particularly interesting because KVM must shadow L1's EPT/NPT tables. For now this support only works on x86_64 CPUs with VMX. Otherwise passing -n results in the test being skipped. Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: selftests: Clean up LIBKVM files in MakefileDavid Matlack1-5/+31
Break up the long lines for LIBKVM and alphabetize each architecture. This makes reading the Makefile easier, and will make reading diffs to LIBKVM easier. No functional change intended. Reviewed-by: Peter Xu <[email protected]> Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: selftests: Link selftests directly with lib object filesDavid Matlack1-7/+4
The linker does obey strong/weak symbols when linking static libraries, it simply resolves an undefined symbol to the first-encountered symbol. This means that defining __weak arch-generic functions and then defining arch-specific strong functions to override them in libkvm will not always work. More specifically, if we have: lib/generic.c: void __weak foo(void) { pr_info("weak\n"); } void bar(void) { foo(); } lib/x86_64/arch.c: void foo(void) { pr_info("strong\n"); } And a selftest that calls bar(), it will print "weak". Now if you make generic.o explicitly depend on arch.o (e.g. add function to arch.c that is called directly from generic.c) it will print "strong". In other words, it seems that the linker is free to throw out arch.o when linking because generic.o does not explicitly depend on it, which causes the linker to lose the strong symbol. One solution is to link libkvm.a with --whole-archive so that the linker doesn't throw away object files it thinks are unnecessary. However that is a bit difficult to plumb since we are using the common selftests makefile rules. An easier solution is to drop libkvm.a just link selftests with all the .o files that were originally in libkvm.a. Reviewed-by: Peter Xu <[email protected]> Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: selftests: Drop unnecessary rule for STATIC_LIBSDavid Matlack1-1/+0
Drop the "all: $(STATIC_LIBS)" rule. The KVM selftests already depend on $(STATIC_LIBS), so there is no reason to have an extra "all" rule. Suggested-by: Peter Xu <[email protected]> Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: selftests: Add a helper to check EPT/VPID capabilitiesDavid Matlack1-1/+6
Create a small helper function to check if a given EPT/VPID capability is supported. This will be re-used in a follow-up commit to check for 1G page support. No functional change intended. Reviewed-by: Peter Xu <[email protected]> Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: selftests: Move VMX_EPT_VPID_CAP_AD_BITS to vmx.hDavid Matlack2-3/+2
This is a VMX-related macro so move it to vmx.h. While here, open code the mask like the rest of the VMX bitmask macros. No functional change intended. Reviewed-by: Peter Xu <[email protected]> Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: selftests: Refactor nested_map() to specify target levelDavid Matlack1-4/+12
Refactor nested_map() to specify that it explicityl wants 4K mappings (the existing behavior) and push the implementation down into __nested_map(), which can be used in subsequent commits to create huge page mappings. No function change intended. Reviewed-by: Peter Xu <[email protected]> Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: selftests: Drop stale function parameter comment for nested_map()David Matlack1-1/+0
nested_map() does not take a parameter named eptp_memslot. Drop the comment referring to it. Reviewed-by: Peter Xu <[email protected]> Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: selftests: Add option to create 2M and 1G EPT mappingsDavid Matlack1-50/+60
The current EPT mapping code in the selftests only supports mapping 4K pages. This commit extends that support with an option to map at 2M or 1G. This will be used in a future commit to create large page mappings to test eager page splitting. No functional change intended. Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: selftests: Replace x86_page_size with PG_LEVEL_XXDavid Matlack4-24/+29
x86_page_size is an enum used to communicate the desired page size with which to map a range of memory. Under the hood they just encode the desired level at which to map the page. This ends up being clunky in a few ways: - The name suggests it encodes the size of the page rather than the level. - In other places in x86_64/processor.c we just use a raw int to encode the level. Simplify this by adopting the kernel style of PG_LEVEL_XX enums and pass around raw ints when referring to the level. This makes the code easier to understand since these macros are very common in KVM MMU code. Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: x86: SVM: fix nested PAUSE filtering when L0 intercepts PAUSEPaolo Bonzini2-20/+23
Commit 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") introduced passthrough support for nested pause filtering, (when the host doesn't intercept PAUSE) (either disabled with kvm module param, or disabled with '-overcommit cpu-pm=on') Before this commit, L1 KVM didn't intercept PAUSE at all; afterwards, the feature was exposed as supported by KVM cpuid unconditionally, thus if L1 could try to use it even when the L0 KVM can't really support it. In this case the fallback caused KVM to intercept each PAUSE instruction; in some cases, such intercept can slow down the nested guest so much that it can fail to boot. Instead, before the problematic commit KVM was already setting both thresholds to 0 in vmcb02, but after the first userspace VM exit shrink_ple_window was called and would reset the pause_filter_count to the default value. To fix this, change the fallback strategy - ignore the guest threshold values, but use/update the host threshold values unless the guest specifically requests disabling PAUSE filtering (either simple or advanced). Also fix a minor bug: on nested VM exit, when PAUSE filter counter were copied back to vmcb01, a dirty bit was not set. Thanks a lot to Suravee Suthikulpanit for debugging this! Fixes: 74fd41ed16fd ("KVM: x86: nSVM: support PAUSE filtering when L0 doesn't intercept PAUSE") Reported-by: Suravee Suthikulpanit <[email protected]> Tested-by: Suravee Suthikulpanit <[email protected]> Co-developed-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: x86: SVM: drop preempt-safe wrappers for avic_vcpu_load/putMaxim Levitsky3-27/+8
Now that these functions are always called with preemption disabled, remove the preempt_disable()/preempt_enable() pair inside them. No functional change intended. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: x86: disable preemption around the call to kvm_arch_vcpu_{un|}blockingMaxim Levitsky1-2/+6
On SVM, if preemption happens right after the call to finish_rcuwait but before call to kvm_arch_vcpu_unblocking on SVM/AVIC, it itself will re-enable AVIC, and then we will try to re-enable it again in kvm_arch_vcpu_unblocking which will lead to a warning in __avic_vcpu_load. The same problem can happen if the vCPU is preempted right after the call to kvm_arch_vcpu_blocking but before the call to prepare_to_rcuwait and in this case, we will end up with AVIC enabled during sleep - Ooops. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: x86: disable preemption while updating apicv inhibitionMaxim Levitsky1-0/+2
Currently nothing prevents preemption in kvm_vcpu_update_apicv. On SVM, If the preemption happens after we update the vcpu->arch.apicv_active, the preemption itself will 'update' the inhibition since the AVIC will be first disabled on vCPU unload and then enabled, when the current task is loaded again. Then we will try to update it again, which will lead to a warning in __avic_vcpu_load, that the AVIC is already enabled. Fix this by disabling preemption in this code. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: x86: SVM: fix avic_kick_target_vcpus_fastMaxim Levitsky1-36/+69
There are two issues in avic_kick_target_vcpus_fast 1. It is legal to issue an IPI request with APIC_DEST_NOSHORT and a physical destination of 0xFF (or 0xFFFFFFFF in case of x2apic), which must be treated as a broadcast destination. Fix this by explicitly checking for it. Also don’t use ‘index’ in this case as it gives no new information. 2. It is legal to issue a logical IPI request to more than one target. Index field only provides index in physical id table of first such target and therefore can't be used before we are sure that only a single target was addressed. Instead, parse the ICRL/ICRH, double check that a unicast interrupt was requested, and use that info to figure out the physical id of the target vCPU. At that point there is no need to use the index field as well. In addition to fixing the above issues, also skip the call to kvm_apic_match_dest. It is possible to do this now, because now as long as AVIC is not inhibited, it is guaranteed that none of the vCPUs changed their apic id from its default value. This fixes boot of windows guest with AVIC enabled because it uses IPI with 0xFF destination and no destination shorthand. Fixes: 7223fd2d5338 ("KVM: SVM: Use target APIC ID to complete AVIC IRQs when possible") Cc: [email protected] Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: x86: SVM: remove avic's broken code that updated APIC IDMaxim Levitsky1-35/+0
AVIC is now inhibited if the guest changes the apic id, and therefore this code is no longer needed. There are several ways this code was broken, including: 1. a vCPU was only allowed to change its apic id to an apic id of an existing vCPU. 2. After such change, the vCPU whose apic id entry was overwritten, could not correctly change its own apic id, because its own entry is already overwritten. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: x86: inhibit APICv/AVIC on changes to APIC ID or APIC baseMaxim Levitsky4-6/+37
Neither of these settings should be changed by the guest and it is a burden to support it in the acceleration code, so just inhibit this code instead. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-06-09KVM: x86: document AVIC/APICv inhibit reasonsMaxim Levitsky1-2/+57
These days there are too many AVIC/APICv inhibit reasons, and it doesn't hurt to have some documentation for them. Signed-off-by: Maxim Levitsky <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>