aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-08-01selftests: kvm: set rax before vmcallAndrei Vagin1-1/+1
kvm_hypercall has to place the hypercall number in rax. Trace events show that kvm_pv_test doesn't work properly: kvm_pv_test-53132: kvm_hypercall: nr 0x0 a0 0x0 a1 0x0 a2 0x0 a3 0x0 kvm_pv_test-53132: kvm_hypercall: nr 0x0 a0 0x0 a1 0x0 a2 0x0 a3 0x0 kvm_pv_test-53132: kvm_hypercall: nr 0x0 a0 0x0 a1 0x0 a2 0x0 a3 0x0 With this change, it starts working as expected: kvm_pv_test-54285: kvm_hypercall: nr 0x5 a0 0x0 a1 0x0 a2 0x0 a3 0x0 kvm_pv_test-54285: kvm_hypercall: nr 0xa a0 0x0 a1 0x0 a2 0x0 a3 0x0 kvm_pv_test-54285: kvm_hypercall: nr 0xb a0 0x0 a1 0x0 a2 0x0 a3 0x0 Signed-off-by: Andrei Vagin <[email protected]> Message-Id: <[email protected]> Fixes: ac4a4d6de22e ("selftests: kvm: test enforcement of paravirtual cpuid features") Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-01selftests: KVM: Add exponent check for boolean statsOliver Upton1-0/+6
The only sensible exponent for a boolean stat is 0. Add a test assertion requiring all boolean statistics to have an exponent of 0. Signed-off-by: Oliver Upton <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-01selftests: KVM: Provide descriptive assertions in kvm_binary_stats_testOliver Upton1-8/+16
As it turns out, tests sometimes fail. When that is the case, packing the test assertion with as much relevant information helps track down the problem more quickly. Sharpen up the stat descriptor assertions in kvm_binary_stats_test to more precisely describe the reason for the test assertion and which stat is to blame. Signed-off-by: Oliver Upton <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-01selftests: KVM: Check stat name before other fieldsOliver Upton1-3/+5
In order to provide more useful test assertions that describe the broken stats descriptor, perform sanity check on the stat name before any other descriptor field. While at it, avoid dereferencing the name field if the sanity check fails as it is more likely to contain garbage. Signed-off-by: Oliver Upton <[email protected]> Reviewed-by: Andrew Jones <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-01Merge branch 'funeth-tx-xdp-frags'David S. Miller1-58/+77
Dimitris Michailidis says: ==================== net/funeth: Tx support for XDP with frags Support XDP with fragments for XDP_TX and ndo_xdp_xmit. The first three patches rework existing code used by the skb path to make it suitable also for XDP. With these all the callees of the main Tx XDP function, fun_xdp_tx(), are fragment-capable. The last patch updates fun_xdp_tx() to handle fragments. ==================== Tested-by: Ido Schimmel <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01net/funeth: Tx handling of XDP with fragments.Dimitris Michailidis1-10/+20
By now all the functions fun_xdp_tx() calls are shared with the skb path and thus are fragment-capable. Update fun_xdp_tx(), that up to now has been passing just one buffer, to check for fragments and call accordingly. This makes XDP_TX and ndo_xdp_xmit fragment-capable. Signed-off-by: Dimitris Michailidis <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01net/funeth: Unify skb/XDP packet mapping.Dimitris Michailidis1-15/+17
Instead of passing an skb to the mapping function pass an skb_shared_info plus an additional address/length pair. This makes it usable for both skbs and XDP. Call it from the XDP path and adjust the skb path. Signed-off-by: Dimitris Michailidis <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01net/funeth: Unify skb/XDP gather list writing.Dimitris Michailidis1-15/+31
Extract the Tx gather list writing code that skbs use into a utility function and use it also for XDP. Signed-off-by: Dimitris Michailidis <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01net/funeth: Unify skb/XDP Tx packet unmapping.Dimitris Michailidis1-21/+12
Current XDP unmapping is a subset of its skb analog, dealing with only one buffer. In preparation for multi-frag XDP rename the skb function and use it also for XDP. The XDP version is removed. Signed-off-by: Dimitris Michailidis <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01KVM: x86/mmu: remove unused variablePaolo Bonzini1-2/+0
The last use of 'pfn' went away with the same-named argument to host_pfn_mapping_level; now that the hugepage level is obtained exclusively from the host page tables, kvm_mmu_zap_collapsible_spte does not need to know host pfns at all. Fixes: a8ac499bb6ab ("KVM: x86/mmu: Don't require refcounted "struct page" to create huge SPTEs") Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-01Merge branch 'devlink-parallel-commands'David S. Miller5-107/+21
Jiri Pirko says: ==================== net: devlink: allow parallel commands on multiple devlinks Aim of this patchset is to remove devlink_mutex and eventually to enable parallel ops on devlink netlink interface. ==================== Tested-by: Ido Schimmel <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01net: devlink: enable parallel ops on netlink interfaceJiri Pirko1-0/+1
As the devlink_mutex was removed and all devlink instances are protected individually by devlink->lock mutex, allow the netlink ops to run in parallel and therefore allow user to execute commands on multiple devlink instances simultaneously. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01net: devlink: remove devlink_mutexJiri Pirko1-76/+4
All accesses to devlink structure from userspace and drivers are locked with devlink->lock instance mutex. Also, devlinks xa_array iteration is taken care of by iteration helpers taking devlink reference. Therefore, remove devlink_mutex as it is no longer needed. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01net: devlink: convert reload command to take implicit devlink->lockJiri Pirko5-31/+5
Convert reload command to behave the same way as the rest of the commands and let if be called with devlink->lock held. Remove the temporary devl_lock taking from drivers. As the DEVLINK_NL_FLAG_NO_LOCK flag is no longer used, remove it alongside. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01net: devlink: introduce "unregistering" mark and use it during devlinks ↵Jiri Pirko1-0/+11
iteration Add new mark called "unregistering" to be set at the beginning of devlink_unregister() function. Check this mark during devlinks iteration in order to prevent getting a reference of devlink which is being currently unregistered. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01udp: Remove redundant __udp_sysctl_init() call from udp_init().Kuniyuki Iwashima1-7/+1
__udp_sysctl_init() is called for init_net via udp_sysctl_ops. While at it, we can rename __udp_sysctl_init() to udp_sysctl_init(). Fixes: 1e8029515816 ("udp: Move the udp sysctl to namespace.") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01Merge branch '40GbE' of ↵David S. Miller2-2/+50
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-07-29 This series contains updates to iavf driver only. Przemyslaw prevents setting of TC max rate below minimum supported values and reports updated queue values when setting up TCs. --- v2: Dropped patch 3 (hw-tc-offload check) ==================== Signed-off-by: David S. Miller <[email protected]>
2022-08-01net/rds: Use PTR_ERR instead of IS_ERR for rdsdebug()Li Qiong1-1/+1
If 'local_odp_mr->r_trans_private' is a error code, it is better to print the error code than to print the value of IS_ERR(). Signed-off-by: Li Qiong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-08-01Merge tag 'linux-can-next-for-5.20-20220731' of ↵David S. Miller53-379/+541
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== pull-request: can-next 2022-07-31 this is a pull request of 36 patches for net-next/master. The 1st patch is by me and fixes a typo in the mcp251xfd driver. Vincent Mailhol contributes a series of 9 patches, which clean up the drivers to make use of KBUILD_MODNAME instead of hard coded names and remove DRV_VERSION. Followed by 3 patches by Vincent Mailhol that directly set the ethtool_ops in instead of calling a function in the slcan, c_can and flexcan driver. Vincent Mailhol contributes a KBUILD_MODNAME and pr_fmt cleanup patch for the slcan driver. Dario Binacchi contributes 6 patches to clean up the driver and remove the legacy driver infrastructure. The next 14 patches are by Vincent Mailhol and target the various drivers, they add ethtool support and reporting of timestamping capabilities. Another patch by Vincent Mailhol for the etas_es58x driver to remove useless calls to usb_fill_bulk_urb(). The last patch is by Christophe JAILLET and fixes a broken link to Documentation in the can327 driver. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-08-01Merge tag 'kvmarm-5.20' of ↵Paolo Bonzini41-950/+1592
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 updates for 5.20: - Unwinder implementations for both nVHE modes (classic and protected), complete with an overflow stack - Rework of the sysreg access from userspace, with a complete rewrite of the vgic-v3 view to allign with the rest of the infrastructure - Disagregation of the vcpu flags in separate sets to better track their use model. - A fix for the GICv2-on-v3 selftest - A small set of cosmetic fixes
2022-08-01Merge remote-tracking branch 'kvm/next' into kvm-next-5.20Paolo Bonzini238-7060/+12137
KVM/s390, KVM/x86 and common infrastructure changes for 5.20 x86: * Permit guests to ignore single-bit ECC errors * Fix races in gfn->pfn cache refresh; do not pin pages tracked by the cache * Intel IPI virtualization * Allow getting/setting pending triple fault with KVM_GET/SET_VCPU_EVENTS * PEBS virtualization * Simplify PMU emulation by just using PERF_TYPE_RAW events * More accurate event reinjection on SVM (avoid retrying instructions) * Allow getting/setting the state of the speaker port data bit * Refuse starting the kvm-intel module if VM-Entry/VM-Exit controls are inconsistent * "Notify" VM exit (detect microarchitectural hangs) for Intel * Cleanups for MCE MSR emulation s390: * add an interface to provide a hypervisor dump for secure guests * improve selftests to use TAP interface * enable interpretive execution of zPCI instructions (for PCI passthrough) * First part of deferred teardown * CPU Topology * PV attestation * Minor fixes Generic: * new selftests API using struct kvm_vcpu instead of a (vm, id) tuple x86: * Use try_cmpxchg64 instead of cmpxchg64 * Bugfixes * Ignore benign host accesses to PMU MSRs when PMU is disabled * Allow disabling KVM's "MONITOR/MWAIT are NOPs!" behavior * x86/MMU: Allow NX huge pages to be disabled on a per-vm basis * Port eager page splitting to shadow MMU as well * Enable CMCI capability by default and handle injected UCNA errors * Expose pid of vcpu threads in debugfs * x2AVIC support for AMD * cleanup PIO emulation * Fixes for LLDT/LTR emulation * Don't require refcounted "struct page" to create huge SPTEs x86 cleanups: * Use separate namespaces for guest PTEs and shadow PTEs bitmasks * PIO emulation * Reorganize rmap API, mostly around rmap destruction * Do not workaround very old KVM bugs for L0 that runs with nesting enabled * new selftests API for CPUID
2022-08-01xen: don't require virtio with grants for non-PV guestsJuergen Gross7-11/+38
Commit fa1f57421e0b ("xen/virtio: Enable restricted memory access using Xen grant mappings") introduced a new requirement for using virtio devices: the backend now needs to support the VIRTIO_F_ACCESS_PLATFORM feature. This is an undue requirement for non-PV guests, as those can be operated with existing backends without any problem, as long as those backends are running in dom0. Per default allow virtio devices without grant support for non-PV guests. On Arm require VIRTIO_F_ACCESS_PLATFORM for devices having been listed in the device tree to use grants. Add a new config item to always force use of grants for virtio. Fixes: fa1f57421e0b ("xen/virtio: Enable restricted memory access using Xen grant mappings") Reported-by: Viresh Kumar <[email protected]> Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Oleksandr Tyshchenko <[email protected]> Tested-by: Oleksandr Tyshchenko <[email protected]> # Arm64 guest using Xen Reviewed-by: Stefano Stabellini <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2022-08-01kernel: remove platform_has() infrastructureJuergen Gross6-60/+1
The only use case of the platform_has() infrastructure has been removed again, so remove the whole feature. Signed-off-by: Juergen Gross <[email protected]> Tested-by: Oleksandr Tyshchenko <[email protected]> # Arm64 guest using Xen Reviewed-by: Stefano Stabellini <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2022-08-01virtio: replace restricted mem access flag with callbackJuergen Gross9-13/+51
Instead of having a global flag to require restricted memory access for all virtio devices, introduce a callback which can select that requirement on a per-device basis. For convenience add a common function returning always true, which can be used for use cases like SEV. Per default use a callback always returning false. As the callback needs to be set in early init code already, add a virtio anchor which is builtin in case virtio is enabled. Signed-off-by: Juergen Gross <[email protected]> Tested-by: Oleksandr Tyshchenko <[email protected]> # Arm64 guest using Xen Reviewed-by: Stefano Stabellini <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2022-08-01xen: Fix spelling mistakeZhang Jiaming1-2/+2
Change 'maped' to 'mapped'. Change 'unmaped' to 'unmapped'. Signed-off-by: Zhang Jiaming <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2022-08-01xen/manage: Use orderly_reboot() to rebootRoss Lagerwall1-1/+1
Currently when the toolstack issues a reboot, it gets translated into a call to ctrl_alt_del(). But tying reboot to ctrl-alt-del means rebooting may fail if e.g. the user has masked the ctrl-alt-del.target under systemd. A previous attempt to fix this issue made a change that sets the kernel.ctrl-alt-del sysctl to 1 before ctrl_alt_del() is called. However, this doesn't give userspace the opportunity to block rebooting or even do any cleanup or syncing. Instead, call orderly_reboot() which will call the "reboot" command, giving userspace the opportunity to block it or perform the usual reboot process while being independent of the ctrl-alt-del behaviour. It also matches what happens in the shutdown case. Signed-off-by: Ross Lagerwall <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2022-07-31csky: abiv1: Fixup compile errorGuo Ren1-0/+6
LD vmlinux.o arch/csky/lib/string.o: In function `memmove': string.c:(.text+0x108): multiple definition of `memmove' lib/string.o:string.c:(.text+0x7e8): first defined here arch/csky/lib/string.o: In function `memset': string.c:(.text+0x148): multiple definition of `memset' lib/string.o:string.c:(.text+0x2ac): first defined here scripts/Makefile.vmlinux_o:68: recipe for target 'vmlinux.o' failed make[4]: *** [vmlinux.o] Error 1 Fixes: e4df2d5e852a ("csky: Add C based string functions") Signed-off-by: Guo Ren <[email protected]> Signed-off-by: Guo Ren <[email protected]> Cc: <[email protected]>
2022-07-31csky: cmpxchg: Coding convention for BUILD_BUG()Guo Ren1-6/+5
Use BUILD_BUG() instead of the custom bad_xchg. Signed-off-by: Guo Ren <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2022-07-31Linux 5.19Linus Torvalds1-1/+1
2022-07-31can: can327: fix a broken link to DocumentationChristophe JAILLET1-1/+1
Since commit 482a4360c56a ("docs: networking: convert netdevices.txt to ReST"), Documentation/networking/netdevices.txt has been replaced by Documentation/networking/netdevices.rst. Update the comment accordingly to avoid a 'make htmldocs' warning. Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/all/6a54aff884ea4f84b661527d75aabd6632140715.1659249135.git.christophe.jaillet@wanadoo.fr Fixes: 43da2f07622f ("can: can327: CAN/ldisc driver for ELM327 based OBD-II adapters") Signed-off-by: Marc Kleine-Budde <[email protected]>
2022-07-31Merge tag 'clk-fixes-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk fix from Stephen Boyd: "One-liner fix of a NULL pointer deref in the Allwinner clk driver" * tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: clk: sunxi-ng: Fix H6 RTC clock definition
2022-07-31Merge tag 'x86_urgent_for_v5.19' of ↵Linus Torvalds6-33/+22
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Update the 'mitigations=' kernel param documentation - Check the IBPB feature flag before enabling IBPB in firmware calls because cloud vendors' fantasy when it comes to creating guest configurations is unlimited - Unexport sev_es_ghcb_hv_call() before 5.19 releases now that HyperV doesn't need it anymore - Remove dead CONFIG_* items * tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV" x86/configs: Update configs in x86_debug.config
2022-07-31Merge tag 'locking_urgent_for_v5.19' of ↵Linus Torvalds1-10/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Borislav Petkov: - Avoid rwsem lockups in certain situations when handling the handoff bit * tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
2022-07-31Merge tag 'edac_urgent_for_v5.19' of ↵Linus Torvalds2-22/+33
git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fixes from Borislav Petkov: - Relax the condition under which the DIMM label in ghes_edac is set in order to accomodate an HPE BIOS which sets only the device but not the bank - Two forgotten fixes to synopsys_edac when handling error interrupts * tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/ghes: Set the DIMM label unconditionally EDAC/synopsys: Re-enable the error interrupts on v3 hw EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hw
2022-07-31erofs: update ctx->pos for every emitted direntHongnan Li1-9/+7
erofs_readdir update ctx->pos after filling a batch of dentries and it may cause dir/files duplication for NFS readdirplus which depends on ctx->pos to fill dir correctly. So update ctx->pos for every emitted dirent in erofs_fill_dentries to fix it. Also fix the update of ctx->pos when the initial file position has exceeded nameoff. Fixes: 3e917cc305c6 ("erofs: make filesystem exportable") Signed-off-by: Hongnan Li <[email protected]> Signed-off-by: Jeffle Xu <[email protected]> Reviewed-by: Gao Xiang <[email protected]> Reviewed-by: Chao Yu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Gao Xiang <[email protected]>
2022-07-31csky: Enable ARCH_INLINE_READ*/WRITE*/SPIN*Guo Ren1-0/+26
Enable ARCH_INLINE_READ*/WRITE*/SPIN* when !PREEMPTION, it is copied from arch/arm64. It could reduce procedure calls and improves performance. Signed-off-by: Guo Ren <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2022-07-31csky: Add qspinlock supportGuo Ren5-2/+44
Enable qspinlock by the requirements mentioned in a8ad07e5240c9 ("asm-generic: qspinlock: Indicate the use of mixed-size atomics"). C-SKY only has "ldex/stex" for all atomic operations. So csky give a strong forward guarantee for "ldex/stex." That means when ldex grabbed the cache line into $L1, it would block other cores from snooping the address with several cycles. The atomic_fetch_add & xchg16 has the same forward guarantee level in C-SKY. Qspinlock has better code size and performance in a fast path. Signed-off-by: Guo Ren <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2022-07-31staging: r8188eu: fix potential uninitialised variable use in rtw_pwrctrl.cPhillip Potter1-3/+3
Set ret to 0 (success) before entering first if statement, thereby assuring that even if the device is not associated and further checks pass, we do not then end up returning the uninitialized value of ret. This assignment is deliberately now directly before the if statement, in order to keep it clear what is happening as opposed to having it as an initialization at the start of the function like it was originally. Also add a comment to make it clear this first if block is currently a success path. As a side note, smatch does not trigger warnings for this change, for me at least. Within core/rtw_pwrctrl.c in the rtw_pwr_wakeup function, I previously dropped the initialization of 'ret' (int ret = 0;) in favour of its assignment which happens inside the first if block directly before its corresponding goto. This was the cause of this bug, and was introduced by: commit f3a76018dd55 ("staging: r8188eu: remove initializer from ret in rtw_pwr_wakeup"). Fixes: f3a76018dd55 ("staging: r8188eu: remove initializer from ret in rtw_pwr_wakeup") Reported-by: kernel test robot <[email protected]> Signed-off-by: Phillip Potter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-07-30Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-armLinus Torvalds2-9/+9
Pull ARM fixes from Russell King: "Last set of ARM fixes for 5.19: - fix for MAX_DMA_ADDRESS overflow - fix for find_*_bit performing an out of bounds memory access" * tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm: ARM: findbit: fix overflowing offset ARM: 9216/1: Fix MAX_DMA_ADDRESS overflow
2022-07-30csky: Add jump-label implementationGuo Ren4-0/+104
Add jump-label implementation for static branch Signed-off-by: Guo Ren <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2022-07-30locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by ↵Waiman Long1-10/+20
first waiter With commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent"), the writer that sets the handoff bit can be interrupted out without clearing the bit if the wait queue isn't empty. This disables reader and writer optimistic lock spinning and stealing. Now if a non-first writer in the queue is somehow woken up or a new waiter enters the slowpath, it can't acquire the lock. This is not the case before commit d257cc8cb8d5 as the writer that set the handoff bit will clear it when exiting out via the out_nolock path. This is less efficient as the busy rwsem stays in an unlock state for a longer time. In some cases, this new behavior may cause lockups as shown in [1] and [2]. This patch allows a non-first writer to ignore the handoff bit if it is not originally set or initiated by the first waiter. This patch is shown to be effective in fixing the lockup problem reported in [1]. [1] https://lore.kernel.org/lkml/[email protected]/ [2] https://lore.kernel.org/lkml/[email protected]/ Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent") Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: John Donnelly <[email protected]> Tested-by: Mel Gorman <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-07-29Merge tag 'mlx5-updates-2022-07-28' of ↵Jakub Kicinski30-511/+1265
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-07-28 Misc updates to mlx5 driver: 1) Gal corrects to use skb_tcp_all_headers on encapsulated skbs. 2) Roi Adds the support for offloading standalone police actions. 3) lama, did some refactoring to minimize code coupling with mlx5e_priv "god object" in some of the follows, and converts some of the objects to pointers to preserve on memory when these objects aren't needed. This is part one of two parts series. * tag 'mlx5-updates-2022-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: Move mlx5e_init_l2_addr to en_main net/mlx5e: Split en_fs ndo's and move to en_main net/mlx5e: Separate mlx5e_set_rx_mode_work and move caller to en_main net/mlx5e: Add mdev to flow_steering struct net/mlx5e: Report flow steering errors with mdev err report API net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer net/mlx5e: Allocate VLAN and TC for featured profiles only net/mlx5e: Make mlx5e_tc_table private net/mlx5e: Convert mlx5e_tc_table member of mlx5e_flow_steering to pointer net/mlx5e: TC, Support tc action api for police net/mlx5e: TC, Separate get/update/replace meter functions net/mlx5e: Add red and green counters for metering net/mlx5e: TC, Allocate post meter ft per rule net/mlx5: DR, Add support for flow metering ASO net/mlx5e: Fix wrong use of skb_tcp_all_headers() with encapsulation ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-30fs/dcache: Move wakeup out of i_seq_dir write held region.Sebastian Andrzej Siewior1-5/+5
__d_add() and __d_move() wake up waiters on dentry::d_wait from within the i_seq_dir write held region. This violates the PREEMPT_RT constraints as the wake up acquires wait_queue_head::lock which is a "sleeping" spinlock on RT. There is no requirement to do so. __d_lookup_unhash() has cleared DCACHE_PAR_LOOKUP and dentry::d_wait and returned the now unreachable wait queue head pointer to the caller, so the actual wake up can be postponed until the i_dir_seq write side critical section is left. The only requirement is that dentry::lock is held across the whole sequence including the wake up. The previous commit includes an analysis why this is considered safe. Move the wake up past end_dir_add() which leaves the i_dir_seq write side critical section and enables preemption. For non RT kernels there is no difference because preemption is still disabled due to dentry::lock being held, but it shortens the time between wake up and unlocking dentry::lock, which reduces the contention for the woken up waiter. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Al Viro <[email protected]>
2022-07-30fs/dcache: Move the wakeup from __d_lookup_done() to the caller.Sebastian Andrzej Siewior2-13/+31
__d_lookup_done() wakes waiters on dentry->d_wait. On PREEMPT_RT we are not allowed to do that with preemption disabled, since the wakeup acquired wait_queue_head::lock, which is a "sleeping" spinlock on RT. Calling it under dentry->d_lock is not a problem, since that is also a "sleeping" spinlock on the same configs. Unfortunately, two of its callers (__d_add() and __d_move()) are holding more than just ->d_lock and that needs to be dealt with. The key observation is that wakeup can be moved to any point before dropping ->d_lock. As a first step to solve this, move the wake up outside of the hlist_bl_lock() held section. This is safe because: Waiters get inserted into ->d_wait only after they'd taken ->d_lock and observed DCACHE_PAR_LOOKUP in flags. As long as they are woken up (and evicted from the queue) between the moment __d_lookup_done() has removed DCACHE_PAR_LOOKUP and dropping ->d_lock, we are safe, since the waitqueue ->d_wait points to won't get destroyed without having __d_lookup_done(dentry) called (under ->d_lock). ->d_wait is set only by d_alloc_parallel() and only in case when it returns a freshly allocated in-lookup dentry. Whenever that happens, we are guaranteed that __d_lookup_done() will be called for resulting dentry (under ->d_lock) before the wq in question gets destroyed. With two exceptions wq lives in call frame of the caller of d_alloc_parallel() and we have an explicit d_lookup_done() on the resulting in-lookup dentry before we leave that frame. One of those exceptions is nfs_call_unlink(), where wq is embedded into (dynamically allocated) struct nfs_unlinkdata. It is destroyed in nfs_async_unlink_release() after an explicit d_lookup_done() on the dentry wq went into. Remaining exception is d_add_ci(). There wq is what we'd found in ->d_wait of d_add_ci() argument. Callers of d_add_ci() are two instances of ->d_lookup() and they must have been given an in-lookup dentry. Which means that they'd been called by __lookup_slow() or lookup_open(), with wq in the call frame of one of those. Result of d_alloc_parallel() in d_add_ci() is fed to d_splice_alias(), which either returns non-NULL (and d_add_ci() does d_lookup_done()) or feeds dentry to __d_add() that will do __d_lookup_done() under ->d_lock. That concludes the analysis. Let __d_lookup_unhash(): 1) Lock the lookup hash and clear DCACHE_PAR_LOOKUP 2) Unhash the dentry 3) Retrieve and clear dentry::d_wait 4) Unlock the hash and return the retrieved waitqueue head pointer 5) Let the caller handle the wake up. 6) Rename __d_lookup_done() to __d_lookup_unhash_wake() to enforce build failures for OOT code that used __d_lookup_done() and is not aware of the new return value. This does not yet solve the PREEMPT_RT problem completely because preemption is still disabled due to i_dir_seq being held for write. This will be addressed in subsequent steps. An alternative solution would be to switch the waitqueue to a simple waitqueue, but aside of Linus not being a fan of them, moving the wake up closer to the place where dentry::lock is unlocked reduces lock contention time for the woken up waiter. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Al Viro <[email protected]>
2022-07-30fs/dcache: Disable preemption on i_dir_seq write side on PREEMPT_RTSebastian Andrzej Siewior1-1/+11
i_dir_seq is a sequence counter with a lock which is represented by the lowest bit. The writer atomically updates the counter which ensures that it can be modified by only one writer at a time. This requires preemption to be disabled across the write side critical section. On !PREEMPT_RT kernels this is implicit by the caller acquiring dentry::lock. On PREEMPT_RT kernels spin_lock() does not disable preemption which means that a preempting writer or reader would live lock. It's therefore required to disable preemption explicitly. An alternative solution would be to replace i_dir_seq with a seqlock_t for PREEMPT_RT, but that comes with its own set of problems due to arbitrary lock nesting. A pure sequence count with an associated spinlock is not possible because the locks held by the caller are not necessarily related. As the critical section is small, disabling preemption is a sensible solution. Reported-by: [email protected] Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Al Viro <[email protected]>
2022-07-30d_add_ci(): make sure we don't miss d_lookup_done()Al Viro1-0/+1
All callers of d_alloc_parallel() must make sure that resulting in-lookup dentry (if any) will encounter __d_lookup_done() before the final dput(). d_add_ci() might end up creating in-lookup dentries; they are fed to d_splice_alias(), which will normally make sure they meet __d_lookup_done(). However, it is possible to end up with d_splice_alias() failing with ERR_PTR(-ELOOP) without having done so. It takes a corrupted ntfs or case-insensitive xfs image, but neither should end up with memory corruption... Signed-off-by: Al Viro <[email protected]>
2022-07-29Merge tag 'mlx5-fixes-2022-07-28' of ↵Jakub Kicinski9-33/+55
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2022-07-28 This series provides bug fixes to mlx5 driver. * tag 'mlx5-fixes-2022-07-28' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Fix driver use of uninitialized timeout net/mlx5: DR, Fix SMFS steering info dump format net/mlx5: Adjust log_max_qp to be 18 at most net/mlx5e: Modify slow path rules to go to slow fdb net/mlx5e: Fix calculations related to max MPWQE size net/mlx5e: xsk: Account for XSK RQ UMRs when calculating ICOSQ size net/mlx5e: Fix the value of MLX5E_MAX_RQ_NUM_MTTS net/mlx5e: TC, Fix post_act to not match on in_port metadata net/mlx5e: Remove WARN_ON when trying to offload an unsupported TLS cipher/version ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-29Merge branch '100GbE' of ↵Jakub Kicinski12-202/+269
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== 100GbE Intel Wired LAN Driver Updates 2022-07-28 This series contains updates to ice driver only. Michal allows for VF true promiscuous mode to be set for multiple VFs and adds clearing of promiscuous filters when VF trust is removed. Maciej refactors ice_set_features() to track/check changed features instead of constantly checking against netdev features and adds support for NETIF_F_LOOPBACK. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: allow toggling loopback mode via ndo_set_features callback ice: compress branches in ice_set_features() ice: Fix promiscuous mode not turning off ice: Introduce enabling promiscuous mode on multiple VF's ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-29Merge branch 'sfc-vf-representors-for-ef100-rx-side'Jakub Kicinski22-49/+1085
Edward Cree says: ==================== sfc: VF representors for EF100 - RX side This series adds the receive path for EF100 VF representors, plus other minor features such as statistics. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-29sfc: implement ethtool get/set RX ring size for EF100 repsEdward Cree1-0/+27
It's not truly a ring, but the maximum length of the list of queued RX SKBs is analogous to an RX ring size, so use that API to configure it. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>