aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-08-19x86/kvm: Simplify FOP_SETCC()Josh Poimboeuf1-19/+4
SETCC_ALIGN and FOP_ALIGN are both 16. Remove the special casing for FOP_SETCC() and just make it a normal fastop. Signed-off-by: Josh Poimboeuf <[email protected]> Message-Id: <7c13d94d1a775156f7e36eed30509b274a229140.1660837839.git.jpoimboe@kernel.org> Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-19x86/ibt, objtool: Add IBT_NOSEAL()Josh Poimboeuf2-1/+13
Add a macro which prevents a function from getting sealed if there are no compile-time references to it. Signed-off-by: Josh Poimboeuf <[email protected]> Message-Id: <20220818213927.e44fmxkoq4yj6ybn@treble> Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-19KVM: Rename mmu_notifier_* to mmu_invalidate_*Chao Peng15-98/+103
The motivation of this renaming is to make these variables and related helper functions less mmu_notifier bound and can also be used for non mmu_notifier based page invalidation. mmu_invalidate_* was chosen to better describe the purpose of 'invalidating' a page that those variables are used for. - mmu_notifier_seq/range_start/range_end are renamed to mmu_invalidate_seq/range_start/range_end. - mmu_notifier_retry{_hva} helper functions are renamed to mmu_invalidate_retry{_hva}. - mmu_notifier_count is renamed to mmu_invalidate_in_progress to avoid confusion with mn_active_invalidate_count. - While here, also update kvm_inc/dec_notifier_count() to kvm_mmu_invalidate_begin/end() to match the change for mmu_notifier_count. No functional change intended. Signed-off-by: Chao Peng <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-19KVM: Rename KVM_PRIVATE_MEM_SLOTS to KVM_INTERNAL_MEM_SLOTSChao Peng2-4/+4
KVM_INTERNAL_MEM_SLOTS better reflects the fact those slots are KVM internally used (invisible to userspace) and avoids confusion to future private slots that can have different meaning. Signed-off-by: Chao Peng <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-19KVM: MIPS: remove unnecessary definition of KVM_PRIVATE_MEM_SLOTSPaolo Bonzini1-2/+0
KVM_PRIVATE_MEM_SLOTS defaults to zero, so it is not necessary to define it in MIPS's asm/kvm_host.h. Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-19KVM: Move coalesced MMIO initialization (back) into kvm_create_vm()Sean Christopherson1-5/+6
Invoke kvm_coalesced_mmio_init() from kvm_create_vm() now that allocating and initializing coalesced MMIO objects is separate from registering any associated devices. Moving coalesced MMIO cleans up the last oddity where KVM does VM creation/initialization after kvm_create_vm(), and more importantly after kvm_arch_post_init_vm() is called and the VM is added to the global vm_list, i.e. after the VM is fully created as far as KVM is concerned. Originally, kvm_coalesced_mmio_init() was called by kvm_create_vm(), but the original implementation was completely devoid of error handling. Commit 6ce5a090a9a0 ("KVM: coalesced_mmio: fix kvm_coalesced_mmio_init()'s error handling" fixed the various bugs, and in doing so rightly moved the call to after kvm_create_vm() because kvm_coalesced_mmio_init() also registered the coalesced MMIO device. Commit 2b3c246a682c ("KVM: Make coalesced mmio use a device per zone") cleaned up that mess by having each zone register a separate device, i.e. moved device registration to its logical home in kvm_vm_ioctl_register_coalesced_mmio(). As a result, kvm_coalesced_mmio_init() is now a "pure" initialization helper and can be safely called from kvm_create_vm(). Opportunstically drop the #ifdef, KVM provides stubs for kvm_coalesced_mmio_{init,free}() when CONFIG_KVM_MMIO=n (s390). Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-19KVM: Unconditionally get a ref to /dev/kvm module when creating a VMSean Christopherson1-10/+4
Unconditionally get a reference to the /dev/kvm module when creating a VM instead of using try_get_module(), which will fail if the module is in the process of being forcefully unloaded. The error handling when try_get_module() fails doesn't properly unwind all that has been done, e.g. doesn't call kvm_arch_pre_destroy_vm() and doesn't remove the VM from the global list. Not removing VMs from the global list tends to be fatal, e.g. leads to use-after-free explosions. The obvious alternative would be to add proper unwinding, but the justification for using try_get_module(), "rmmod --wait", is completely bogus as support for "rmmod --wait", i.e. delete_module() without O_NONBLOCK, was removed by commit 3f2b9c9cdf38 ("module: remove rmmod --wait option.") nearly a decade ago. It's still possible for try_get_module() to fail due to the module dying (more like being killed), as the module will be tagged MODULE_STATE_GOING by "rmmod --force", i.e. delete_module(..., O_TRUNC), but playing nice with forced unloading is an exercise in futility and gives a falsea sense of security. Using try_get_module() only prevents acquiring _new_ references, it doesn't magically put the references held by other VMs, and forced unloading doesn't wait, i.e. "rmmod --force" on KVM is all but guaranteed to cause spectacular fireworks; the window where KVM will fail try_get_module() is tiny compared to the window where KVM is building and running the VM with an elevated module refcount. Addressing KVM's inability to play nice with "rmmod --force" is firmly out-of-scope. Forcefully unloading any module taints kernel (for obvious reasons) _and_ requires the kernel to be built with CONFIG_MODULE_FORCE_UNLOAD=y, which is off by default and comes with the amusing disclaimer that it's "mainly for kernel developers and desperate users". In other words, KVM is free to scoff at bug reports due to using "rmmod --force" while VMs may be running. Fixes: 5f6de5cbebee ("KVM: Prevent module exit until all VMs are freed") Cc: [email protected] Cc: David Matlack <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-19KVM: Properly unwind VM creation if creating debugfs failsSean Christopherson1-8/+8
Properly unwind VM creation if kvm_create_vm_debugfs() fails. A recent change to invoke kvm_create_vm_debug() in kvm_create_vm() was led astray by buggy try_get_module() handling adding by commit 5f6de5cbebee ("KVM: Prevent module exit until all VMs are freed"). The debugfs error path effectively inherits the bad error path of try_module_get(), e.g. KVM leaves the to-be-free VM on vm_list even though KVM appears to do the right thing by calling module_put() and falling through. Opportunistically hoist kvm_create_vm_debugfs() above the call to kvm_arch_post_init_vm() so that the "post-init" arch hook is actually invoked after the VM is initialized (ignoring kvm_coalesced_mmio_init() for the moment). x86 is the only non-nop implementation of the post-init hook, and it doesn't allocate/initialize any objects that are reachable via debugfs code (spawns a kthread worker for the NX huge page mitigation). Leave the buggy try_get_module() alone for now, it will be fixed in a separate commit. Fixes: b74ed7a68ec1 ("KVM: Actually create debugfs in kvm_create_vm()") Reported-by: [email protected] Cc: Oliver Upton <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Reviewed-by: Oliver Upton <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-08-18Merge tag 'net-6.0-rc2' of ↵Linus Torvalds65-793/+2351
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from netfilter. Current release - regressions: - tcp: fix cleanup and leaks in tcp_read_skb() (the new way BPF socket maps get data out of the TCP stack) - tls: rx: react to strparser initialization errors - netfilter: nf_tables: fix scheduling-while-atomic splat - net: fix suspicious RCU usage in bpf_sk_reuseport_detach() Current release - new code bugs: - mlxsw: ptp: fix a couple of races, static checker warnings and error handling Previous releases - regressions: - netfilter: - nf_tables: fix possible module reference underflow in error path - make conntrack helpers deal with BIG TCP (skbs > 64kB) - nfnetlink: re-enable conntrack expectation events - net: fix potential refcount leak in ndisc_router_discovery() Previous releases - always broken: - sched: cls_route: disallow handle of 0 - neigh: fix possible local DoS due to net iface start/stop loop - rtnetlink: fix module refcount leak in rtnetlink_rcv_msg - sched: fix adding qlen to qcpu->backlog in gnet_stats_add_queue_cpu - virtio_net: fix endian-ness for RSS - dsa: mv88e6060: prevent crash on an unused port - fec: fix timer capture timing in `fec_ptp_enable_pps()` - ocelot: stats: fix races, integer wrapping and reading incorrect registers (the change of register definitions here accounts for bulk of the changed LoC in this PR)" * tag 'net-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (77 commits) net: moxa: MAC address reading, generating, validity checking tcp: handle pure FIN case correctly tcp: refactor tcp_read_skb() a bit tcp: fix tcp_cleanup_rbuf() for tcp_read_skb() tcp: fix sock skb accounting in tcp_read_skb() igb: Add lock to avoid data race dt-bindings: Fix incorrect "the the" corrections net: genl: fix error path memory leak in policy dumping stmmac: intel: Add a missing clk_disable_unprepare() call in intel_eth_pci_remove() net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_run net/mlx5e: Allocate flow steering storage during uplink initialization net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ocelot->stats net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offset net: mscc: ocelot: make struct ocelot_stat_layout array indexable net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_work net: mscc: ocelot: turn stats_lock into a spinlock net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counter net: mscc: ocelot: fix incorrect ndo_get_stats64 packet counters net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet counters net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support it ...
2022-08-18Merge tag 'linux-kselftest-next-6.0-rc2' of ↵Linus Torvalds1-2/+5
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fix from Shuah Khan: - fix landlock test build regression * tag 'linux-kselftest-next-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/landlock: fix broken include of linux/landlock.h
2022-08-18Merge tag 'trace-rtla-v6.0' of ↵Linus Torvalds4-33/+43
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull rtla tool fixes from Steven Rostedt: "Fixes for the Real-Time Linux Analysis tooling: - Fix tracer name in comments and prints - Fix setting up symlinks - Allow extra flags to be set in build - Consolidate and show all necessary libraries not found in build error" * tag 'trace-rtla-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: rtla: Consolidate and show all necessary libraries that failed for building tools/rtla: Build with EXTRA_{C,LD}FLAGS tools/rtla: Fix command symlinks rtla: Fix tracer name
2022-08-19Merge tag 'amd-drm-fixes-6.0-2022-08-17' of ↵Dave Airlie108-1184/+1758
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-6.0-2022-08-17: amdgpu: - Revert some DML stack changes - Rounding fixes in KFD allocations - atombios vram info table parsing fix - DCN 3.1.4 fixes - Clockgating fixes for various new IPs - SMU 13.0.4 fixes - DCN 3.1.4 FP fixes - TMDS fixes for YCbCr420 4k modes - DCN 3.2.x fixes - USB 4 fixes - SMU 13.0 fixes - SMU driver unload memory leak fixes - Display orientation fix - Regression fix for generic fbdev conversion - SDMA 6.x fixes - SR-IOV fixes - IH 6.x fixes - Use after free fix in bo list handling - Revert pipe1 support - XGMI hive reset fix amdkfd: - Fix potential crach in kfd_create_indirect_link_prop() Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-08-18riscv: traps: add missing prototypeConor Dooley2-1/+4
Sparse complains: arch/riscv/kernel/traps.c:213:6: warning: symbol 'shadow_stack' was not declared. Should it be static? The variable is used in entry.S, so declare shadow_stack there alongside SHADOW_OVERFLOW_STACK_SIZE. Fixes: 31da94c25aea ("riscv: add VMAP_STACK overflow detection") Signed-off-by: Conor Dooley <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2022-08-18riscv: signal: fix missing prototype warningConor Dooley2-0/+13
Fix the warning: arch/riscv/kernel/signal.c:316:27: warning: no previous prototype for function 'do_notify_resume' [-Wmissing-prototypes] asmlinkage __visible void do_notify_resume(struct pt_regs *regs, All other functions in the file are static & none of the existing headers stood out as an obvious location. Create signal.h to hold the declaration. Fixes: e2c0cdfba7f6 ("RISC-V: User-facing API") Signed-off-by: Conor Dooley <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2022-08-18perf: riscv legacy: fix kerneldoc comment warningConor Dooley1-1/+1
Fix the warning: drivers/perf/riscv_pmu_legacy.c:76: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst Fixes: 9b3e150e310e ("RISC-V: Add a simple platform driver for RISC-V legacy perf") Signed-off-by: Conor Dooley <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2022-08-18net: moxa: MAC address reading, generating, validity checkingSergei Antonov1-6/+7
This device does not remember its MAC address, so add a possibility to get it from the platform. If it fails, generate a random address. This will provide a MAC address early during boot without user space being involved. Also remove extra calls to is_valid_ether_addr(). Made after suggestions by Andrew Lunn: 1) Use eth_hw_addr_random() to assign a random MAC address during probe. 2) Remove is_valid_ether_addr() from moxart_mac_open() 3) Add a call to platform_get_ethdev_address() during probe 4) Remove is_valid_ether_addr() from moxart_set_mac_address(). The core does this v1 -> v2: Handle EPROBE_DEFER returned from platform_get_ethdev_address(). Move MAC reading code to the beginning of the probe function. Signed-off-by: Sergei Antonov <[email protected]> Suggested-by: Andrew Lunn <[email protected]> CC: Yang Yingliang <[email protected]> CC: Pavel Skripkin <[email protected]> CC: Guobin Huang <[email protected]> CC: Yang Wei <[email protected]> CC: Christophe JAILLET <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18Merge branch 'tcp-some-bug-fixes-for-tcp_read_skb'Jakub Kicinski2-28/+26
Cong Wang says: ==================== tcp: some bug fixes for tcp_read_skb() This patchset contains 3 bug fixes and 1 minor refactor patch for tcp_read_skb(). V1 only had the first patch, as Eric prefers to fix all of them together, I have to group them together. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18tcp: handle pure FIN case correctlyCong Wang2-3/+4
When skb->len==0, the recv_actor() returns 0 too, but we also use 0 for error conditions. This patch amends this by propagating the errors to tcp_read_skb() so that we can distinguish skb->len==0 case from error cases. Fixes: 04919bed948d ("tcp: Introduce tcp_read_skb()") Reported-by: Eric Dumazet <[email protected]> Cc: John Fastabend <[email protected]> Cc: Jakub Sitnicki <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18tcp: refactor tcp_read_skb() a bitCong Wang1-17/+9
As tcp_read_skb() only reads one skb at a time, the while loop is unnecessary, we can turn it into an if. This also simplifies the code logic. Cc: Eric Dumazet <[email protected]> Cc: John Fastabend <[email protected]> Cc: Jakub Sitnicki <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18tcp: fix tcp_cleanup_rbuf() for tcp_read_skb()Cong Wang1-10/+14
tcp_cleanup_rbuf() retrieves the skb from sk_receive_queue, it assumes the skb is not yet dequeued. This is no longer true for tcp_read_skb() case where we dequeue the skb first. Fix this by introducing a helper __tcp_cleanup_rbuf() which does not require any skb and calling it in tcp_read_skb(). Fixes: 04919bed948d ("tcp: Introduce tcp_read_skb()") Cc: Eric Dumazet <[email protected]> Cc: John Fastabend <[email protected]> Cc: Jakub Sitnicki <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18tcp: fix sock skb accounting in tcp_read_skb()Cong Wang1-0/+1
Before commit 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()"), skb was not dequeued from receive queue hence when we close TCP socket skb can be just flushed synchronously. After this commit, we have to uncharge skb immediately after being dequeued, otherwise it is still charged in the original sock. And we still need to retain skb->sk, as eBPF programs may extract sock information from skb->sk. Therefore, we have to call skb_set_owner_sk_safe() here. Fixes: 965b57b469a5 ("net: Introduce a new proto_ops ->read_skb()") Reported-and-tested-by: [email protected] Tested-by: Stanislav Fomichev <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: John Fastabend <[email protected]> Cc: Jakub Sitnicki <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18igb: Add lock to avoid data raceLin Ma2-1/+13
The commit c23d92b80e0b ("igb: Teardown SR-IOV before unregister_netdev()") places the unregister_netdev() call after the igb_disable_sriov() call to avoid functionality issue. However, it introduces several race conditions when detaching a device. For example, when .remove() is called, the below interleaving leads to use-after-free. (FREE from device detaching) | (USE from netdev core) igb_remove | igb_ndo_get_vf_config igb_disable_sriov | vf >= adapter->vfs_allocated_count? kfree(adapter->vf_data) | adapter->vfs_allocated_count = 0 | | memcpy(... adapter->vf_data[vf] Moreover, the igb_disable_sriov() also suffers from data race with the requests from VF driver. (FREE from device detaching) | (USE from requests) igb_remove | igb_msix_other igb_disable_sriov | igb_msg_task kfree(adapter->vf_data) | vf < adapter->vfs_allocated_count adapter->vfs_allocated_count = 0 | To this end, this commit first eliminates the data races from netdev core by using rtnl_lock (similar to commit 719479230893 ("dpaa2-eth: add MAC/PHY support through phylink")). And then adds a spinlock to eliminate races from driver requests. (similar to commit 1e53834ce541 ("ixgbe: Add locking to prevent panic when setting sriov_numvfs to zero") Fixes: c23d92b80e0b ("igb: Teardown SR-IOV before unregister_netdev()") Signed-off-by: Lin Ma <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18Merge branch '100GbE' of ↵Jakub Kicinski6-18/+85
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-08-17 (ice) This series contains updates to ice driver only. Grzegorz prevents modifications to VLAN 0 when setting VLAN promiscuous as it will already be set. He also ignores -EEXIST error when attempting to set promiscuous and ensures promiscuous mode is properly cleared from the hardware when being removed. Benjamin ignores additional -EEXIST errors when setting promiscuous mode since the existing mode is the desired mode. Sylwester fixes VFs to allow sending of tagged traffic when no VLAN filters exist. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: ice: Fix VF not able to send tagged traffic with no VLAN filters ice: Ignore error message when setting same promiscuous mode ice: Fix clearing of promisc mode with bridge over bond ice: Ignore EEXIST when setting promisc mode ice: Fix double VLAN error when entering promisc mode ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18dt-bindings: Fix incorrect "the the" correctionsGeert Uytterhoeven2-2/+2
Lots of double occurrences of "the" were replaced by single occurrences, but some of them should become "to the" instead. Fixes: 12e5bde18d7f6ca4 ("dt-bindings: Fix typo in comment") Signed-off-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/c5743c0a1a24b3a8893797b52fed88b99e56b04b.1660755148.git.geert+renesas@glider.be Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18net: genl: fix error path memory leak in policy dumpingJakub Kicinski2-3/+17
If construction of the array of policies fails when recording non-first policy we need to unwind. netlink_policy_dump_add_policy() itself also needs fixing as it currently gives up on error without recording the allocated pointer in the pstate pointer. Reported-by: [email protected] Fixes: 50a896cf2d6f ("genetlink: properly support per-op policy dumping") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18stmmac: intel: Add a missing clk_disable_unprepare() call in ↵Christophe JAILLET1-0/+1
intel_eth_pci_remove() Commit 09f012e64e4b ("stmmac: intel: Fix clock handling on error and remove paths") removed this clk_disable_unprepare() This was partly revert by commit ac322f86b56c ("net: stmmac: Fix clock handling on remove path") which removed this clk_disable_unprepare() because: " While unloading the dwmac-intel driver, clk_disable_unprepare() is being called twice in stmmac_dvr_remove() and intel_eth_pci_remove(). This causes kernel panic on the second call. " However later on, commit 5ec55823438e8 ("net: stmmac: add clocks management for gmac driver") has updated stmmac_dvr_remove() which do not call clk_disable_unprepare() anymore. So this call should now be called from intel_eth_pci_remove(). Fixes: 5ec55823438e8 ("net: stmmac: add clocks management for gmac driver") Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Link: https://lore.kernel.org/r/d7c8c1dadf40df3a7c9e643f76ffadd0ccc1ad1b.1660659689.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-18tee: add overflow check in register_shm_helper()Jens Wiklander1-0/+3
With special lengths supplied by user space, register_shm_helper() has an integer overflow when calculating the number of pages covered by a supplied user space memory region. This causes internal_get_user_pages_fast() a helper function of pin_user_pages_fast() to do a NULL pointer dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000010 Modules linked in: CPU: 1 PID: 173 Comm: optee_example_a Not tainted 5.19.0 #11 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 pc : internal_get_user_pages_fast+0x474/0xa80 Call trace: internal_get_user_pages_fast+0x474/0xa80 pin_user_pages_fast+0x24/0x4c register_shm_helper+0x194/0x330 tee_shm_register_user_buf+0x78/0x120 tee_ioctl+0xd0/0x11a0 __arm64_sys_ioctl+0xa8/0xec invoke_syscall+0x48/0x114 Fix this by adding an an explicit call to access_ok() in tee_shm_register_user_buf() to catch an invalid user space address early. Fixes: 033ddf12bcf5 ("tee: add register user memory") Cc: [email protected] Reported-by: Nimish Mishra <[email protected]> Reported-by: Anirban Chakraborty <[email protected]> Reported-by: Debdeep Mukhopadhyay <[email protected]> Suggested-by: Jerome Forissier <[email protected]> Signed-off-by: Jens Wiklander <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2022-08-18Merge tag 'irqchip-fixes-6.0-1' of ↵Thomas Gleixner7-32/+34
git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent Pull irqchip fixes from Marc Zyngier: - A bunch of small fixes for the recently merged LoongArch drivers - A leftover from the non-SMP IRQ affinity rework affecting the Hyper-V IOMMU code Link: https://lore.kernel.org/r/[email protected]
2022-08-18drm/vc4: hdmi: Rework power upMaxime Ripard1-8/+7
The current code tries to handle the case where CONFIG_PM isn't selected by first calling our runtime_resume implementation and then properly report the power state to the runtime_pm core. This allows to have a functionning device even if pm_runtime_get_* functions are nops. However, the device power state if CONFIG_PM is enabled is RPM_SUSPENDED, and thus our vc4_hdmi_write() and vc4_hdmi_read() calls in the runtime_pm hooks will now report a warning since the device might not be properly powered. Even more so, we need CONFIG_PM enabled since the previous RaspberryPi have a power domain that needs to be powered up for the HDMI controller to be usable. The previous patch has created a dependency on CONFIG_PM, now we can just assume it's there and only call pm_runtime_resume_and_get() to make sure our device is powered in bind. Link: https://lore.kernel.org/r/[email protected] Acked-by: Thomas Zimmermann <[email protected]> Tested-by: Stefan Wahren <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> (cherry picked from commit 53565c28e6af2cef6bbf438c34250135e3564459) Signed-off-by: Maxime Ripard <[email protected]>
2022-08-18drm/vc4: hdmi: Depends on CONFIG_PMMaxime Ripard2-1/+2
We already depend on runtime PM to get the power domains and clocks for most of the devices supported by the vc4 driver, so let's just select it to make sure it's there. Link: https://lore.kernel.org/r/[email protected] Acked-by: Thomas Zimmermann <[email protected]> Tested-by: Stefan Wahren <[email protected]> Signed-off-by: Maxime Ripard <[email protected]> (cherry picked from commit f1bc386b319e93e56453ae27e9e83817bb1f6f95) Signed-off-by: Maxime Ripard <[email protected]>
2022-08-18blk-mq: run queue no matter whether the request is the last requestYufen Yu1-1/+1
We do test on a virtio scsi device (/dev/sda) and the default mq scheduler is 'none'. We found a IO hung as following: blk_finish_plug blk_mq_plug_issue_direct scsi_mq_get_budget //get budget_token fail and sdev->restarts=1 scsi_end_request scsi_run_queue_async //sdev->restart=0 and run queue blk_mq_request_bypass_insert //add request to hctx->dispatch list //continue to dispath plug list blk_mq_dispatch_plug_list blk_mq_try_issue_list_directly //success issue all requests from plug list After .get_budget fail, scsi_mq_get_budget will increase 'restarts'. Normally, it will run hw queue when io complete and set 'restarts' as 0. But if we run queue before adding request to the dispatch list and blk_mq_dispatch_plug_list also success issue all requests, then on one will run queue, and the request will be stall in the dispatch list and cannot complete forever. It is wrong to use last request of plug list to decide if run queue is needed since all the remained requests in plug list may be from other hctxs. To fix the bug, pass run_queue as true always to blk_mq_request_bypass_insert(). Fix-suggested-by: Ming Lei <[email protected]> Signed-off-by: Yufen Yu <[email protected]> Reviewed-by: Ming Lei <[email protected]> Fixes: dc5fc361d891 ("block: attempt direct issue of plug list") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-18blk-mq: remove unused function blk_mq_queue_stopped()Yu Kuai2-21/+0
blk_mq_queue_stopped() doesn't have any caller, which was found by code coverage test, thus remove it. Signed-off-by: Yu Kuai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-08-18x86/bugs: Add "unknown" reporting for MMIO Stale DataPawan Gupta4-19/+56
Older Intel CPUs that are not in the affected processor list for MMIO Stale Data vulnerabilities currently report "Not affected" in sysfs, which may not be correct. Vulnerability status for these older CPUs is unknown. Add known-not-affected CPUs to the whitelist. Report "unknown" mitigation status for CPUs that are not in blacklist, whitelist and also don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware immunity to MMIO Stale Data vulnerabilities. Mitigation is not deployed when the status is unknown. [ bp: Massage, fixup. ] Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data") Suggested-by: Andrew Cooper <[email protected]> Suggested-by: Tony Luck <[email protected]> Signed-off-by: Pawan Gupta <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.1657846269.git.pawan.kumar.gupta@linux.intel.com
2022-08-18io_uring/net: use right helpers for async_dataPavel Begunkov1-2/+2
There is another spot where we check ->async_data directly instead of using req_has_async_data(), which is the way to do it, fix it up. Fixes: 43e0bbbd0b0e3 ("io_uring: add netmsg cache") Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/42f33b9a81dd6ae65dda92f0372b0ff82d548517.1660822636.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-08-18fs: __file_remove_privs(): restore call to inode_has_no_xattr()Stefan Roesch1-6/+8
This restores the call to inode_has_no_xattr() in the function __file_remove_privs(). In case the dentry_meeds_remove_privs() returned 0, the function inode_has_no_xattr() was not called. Signed-off-by: Stefan Roesch <[email protected]> Fixes: faf99b563558 ("fs: add __remove_file_privs() with flags parameter") Reviewed-by: Christian Brauner (Microsoft) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
2022-08-17net: ethernet: mtk_eth_soc: fix possible NULL pointer dereference in mtk_xdp_runLorenzo Bianconi1-1/+1
Fix possible NULL pointer dereference in mtk_xdp_run() if the ebpf program returns XDP_TX and xdp_convert_buff_to_frame routine fails returning NULL. Fixes: 5886d26fd25bb ("net: ethernet: mtk_eth_soc: add xmit XDP support") Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/627a07d759020356b64473e09f0855960e02db28.1660659112.git.lorenzo@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net/mlx5e: Allocate flow steering storage during uplink initializationLeon Romanovsky1-8/+17
IPsec code relies on valid priv->fs pointer that is the case in NIC flow, but not correct in uplink. Before commit that mentioned in the Fixes line, that pointer was valid in all flows as it was allocated together with priv struct. In addition, the cleanup representors routine called to that not-initialized priv->fs pointer and its internals which caused NULL deference. So, move FS allocation to be as early as possible. Fixes: af8bbf730068 ("net/mlx5e: Convert mlx5e_flow_steering member of mlx5e_priv to pointer") Signed-off-by: Leon Romanovsky <[email protected]> Link: https://lore.kernel.org/r/ae46fa5bed3c67f937bfdfc0370101278f5422f1.1660639564.git.leonro@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17Merge branch 'fixes-for-ocelot-driver-statistics'Jakub Kicinski7-378/+1581
Vladimir Oltean says: ==================== Fixes for Ocelot driver statistics This series contains bug fixes for the ocelot drivers (both switchdev and DSA). Some concern the counters exposed to ethtool -S, and others to the counters exposed to ifconfig. I'm aware that the changes are fairly large, but I wanted to prioritize on a proper approach to addressing the issues rather than a quick hack. Some of the noticed problems: - bad register offsets for some counters - unhandled concurrency leading to corrupted counters - unhandled 32-bit wraparound of ifconfig counters The issues on the ocelot switchdev driver were noticed through code inspection, I do not have the hardware to test. This patch set necessarily converts ocelot->stats_lock from a mutex to a spinlock. I know this affects Colin Foster's development with the SPI controlled VSC7512. I have other changes prepared for net-next that convert this back into a mutex (along with other changes in this area). ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: report ndo_get_stats64 from the wraparound-resistant ↵Vladimir Oltean1-27/+26
ocelot->stats Rather than reading the stats64 counters directly from the 32-bit hardware, it's better to rely on the output produced by the periodic ocelot_port_update_stats(). It would be even better to call ocelot_port_update_stats() right from ocelot_get_stats64() to make sure we report the current values rather than the ones from 2 seconds ago. But we need to export ocelot_port_update_stats() from the switch lib towards the switchdev driver for that, and future work will largely undo that. There are more ocelot-based drivers waiting to be introduced, an example of which is the SPI-controlled VSC7512. In that driver's case, it will be impossible to call ocelot_port_update_stats() from ndo_get_stats64 context, since the latter is atomic, and reading the stats over SPI is sleepable. So the compromise taken here, which will also hold going forward, is to report 64-bit counters to stats64, which are not 100% up to date. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: keep ocelot_stat_layout by reg address, not offsetVladimir Oltean6-289/+540
With so many counter addresses recently discovered as being wrong, it is desirable to at least have a central database of information, rather than two: one through the SYS_COUNT_* registers (used for ndo_get_stats64), and the other through the offset field of struct ocelot_stat_layout elements (used for ethtool -S). The strategy will be to keep the SYS_COUNT_* definitions as the single source of truth, but for that we need to expand our current definitions to cover all registers. Then we need to convert the ocelot region creation logic, and stats worker, to the read semantics imposed by going through SYS_COUNT_* absolute register addresses, rather than offsets of 32-bit words relative to SYS_COUNT_RX_OCTETS (which should have been SYS_CNT, by the way). Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: make struct ocelot_stat_layout array indexableVladimir Oltean5-306/+1243
The ocelot counters are 32-bit and require periodic reading, every 2 seconds, by ocelot_port_update_stats(), so that wraparounds are detected. Currently, the counters reported by ocelot_get_stats64() come from the 32-bit hardware counters directly, rather than from the 64-bit accumulated ocelot->stats, and this is a problem for their integrity. The strategy is to make ocelot_get_stats64() able to cherry-pick individual stats from ocelot->stats the way in which it currently reads them out from SYS_COUNT_* registers. But currently it can't, because ocelot->stats is an opaque u64 array that's used only to feed data into ethtool -S. To solve that problem, we need to make ocelot->stats indexable, and associate each element with an element of struct ocelot_stat_layout used by ethtool -S. This makes ocelot_stat_layout a fat (and possibly sparse) array, so we need to change the way in which we access it. We no longer need OCELOT_STAT_END as a sentinel, because we know the array's size (OCELOT_NUM_STATS). We just need to skip the array elements that were left unpopulated for the switch revision (ocelot, felix, seville). Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: fix race between ndo_get_stats64 and ocelot_check_stats_workVladimir Oltean1-0/+4
The 2 methods can run concurrently, and one will change the window of counters (SYS_STAT_CFG_STAT_VIEW) that the other sees. The fix is similar to what commit 7fbf6795d127 ("net: mscc: ocelot: fix mutex lock error during ethtool stats read") has done for ethtool -S. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: turn stats_lock into a spinlockVladimir Oltean3-9/+8
ocelot_get_stats64() currently runs unlocked and therefore may collide with ocelot_port_update_stats() which indirectly accesses the same counters. However, ocelot_get_stats64() runs in atomic context, and we cannot simply take the sleepable ocelot->stats_lock mutex. We need to convert it to an atomic spinlock first. Do that as a preparatory change. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: fix address of SYS_COUNT_TX_AGING counterVladimir Oltean1-1/+1
This register, used as part of stats->tx_dropped in ocelot_get_stats64(), has a wrong address. At the address currently given, there is actually the c_tx_green_prio_6 counter. Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: mscc: ocelot: fix incorrect ndo_get_stats64 packet countersVladimir Oltean5-30/+42
Reading stats using the SYS_COUNT_* register definitions is only used by ocelot_get_stats64() from the ocelot switchdev driver, however, currently the bucket definitions are incorrect. Separately, on both RX and TX, we have the following problems: - a 256-1023 bucket which actually tracks the 256-511 packets - the 1024-1526 bucket actually tracks the 512-1023 packets - the 1527-max bucket actually tracks the 1024-1526 packets => nobody tracks the packets from the real 1527-max bucket Additionally, the RX_PAUSE, RX_CONTROL, RX_LONGS and RX_CLASSIFIED_DROPS all track the wrong thing. However this doesn't seem to have any consequence, since ocelot_get_stats64() doesn't use these. Even though this problem only manifests itself for the switchdev driver, we cannot split the fix for ocelot and for DSA, since it requires fixing the bucket definitions from enum ocelot_reg, which makes us necessarily adapt the structures from felix and seville as well. Fixes: 84705fc16552 ("net: dsa: felix: introduce support for Seville VSC9953 switch") Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Fixes: a556c76adc05 ("net: mscc: Add initial Ocelot switch support") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: dsa: felix: fix ethtool 256-511 and 512-1023 TX packet countersVladimir Oltean1-1/+2
What the driver actually reports as 256-511 is in fact 512-1023, and the TX packets in the 256-511 bucket are not reported. Fix that. Fixes: 56051948773e ("net: dsa: ocelot: add driver for Felix switch family") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: dsa: don't warn in dsa_port_set_state_now() when driver doesn't support itVladimir Oltean1-2/+5
ds->ops->port_stp_state_set() is, like most DSA methods, optional, and if absent, the port is supposed to remain in the forwarding state (as standalone). Such is the case with the mv88e6060 driver, which does not offload the bridge layer. DSA warns that the STP state can't be changed to FORWARDING as part of dsa_port_enable_rt(), when in fact it should not. The error message is also not up to modern standards, so take the opportunity to make it more descriptive. Fixes: fd3645413197 ("net: dsa: change scope of STP state setter") Reported-by: Sergei Antonov <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Sergei Antonov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17net: dsa: sja1105: fix buffer overflow in sja1105_setup_devlink_regions()Rustam Subkhankulov1-1/+1
If an error occurs in dsa_devlink_region_create(), then 'priv->regions' array will be accessed by negative index '-1'. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Rustam Subkhankulov <[email protected]> Fixes: bf425b82059e ("net: dsa: sja1105: expose static config as devlink region") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-17cifs: remove useless parameter 'is_fsctl' from SMB2_ioctl()Enzo Matsumiya4-36/+24
SMB2_ioctl() is always called with is_fsctl = true, so doesn't make any sense to have it at all. Thus, always set SMB2_0_IOCTL_IS_FSCTL flag on the request. Also, as per MS-SMB2 3.3.5.15 "Receiving an SMB2 IOCTL Request", servers must fail the request if the request flags is zero anyway. Signed-off-by: Enzo Matsumiya <[email protected]> Reviewed-by: Tom Talpey <[email protected]> Signed-off-by: Steve French <[email protected]>
2022-08-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski9-261/+390
Florian Westphal says: ==================== netfilter: conntrack and nf_tables bug fixes The following patchset contains netfilter fixes for net. Broken since 5.19: A few ancient connection tracking helpers assume TCP packets cannot exceed 64kb in size, but this isn't the case anymore with 5.19 when BIG TCP got merged, from myself. Regressions since 5.19: 1. 'conntrack -E expect' won't display anything because nfnetlink failed to enable events for expectations, only for normal conntrack events. 2. partially revert change that added resched calls to a function that can be in atomic context. Both broken and fixed up by myself. Broken for several releases (up to original merge of nf_tables): Several fixes for nf_tables control plane, from Pablo. This fixes up resource leaks in error paths and adds more sanity checks for mutually exclusive attributes/flags. Kconfig: NF_CONNTRACK_PROCFS is very old and doesn't provide all info provided via ctnetlink, so it should not default to y. From Geert Uytterhoeven. Selftests: rework nft_flowtable.sh: it frequently indicated failure; the way it tried to detect an offload failure did not work reliably. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: testing: selftests: nft_flowtable.sh: rework test to detect offload failure testing: selftests: nft_flowtable.sh: use random netns names netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y netfilter: nf_tables: check NFT_SET_CONCAT flag if field_count is specified netfilter: nf_tables: disallow NFT_SET_ELEM_CATCHALL and NFT_SET_ELEM_INTERVAL_END netfilter: nf_tables: NFTA_SET_ELEM_KEY_END requires concat and interval flags netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag netfilter: nf_tables: really skip inactive sets when allocating name netfilter: nfnetlink: re-enable conntrack expectation events netfilter: nf_tables: fix scheduling-while-atomic splat netfilter: nf_ct_irc: cap packet search space to 4k netfilter: nf_ct_ftp: prefer skb_linearize netfilter: nf_ct_h323: cap packet size at 64k netfilter: nf_ct_sane: remove pseudo skb linearization netfilter: nf_tables: possible module reference underflow in error path netfilter: nf_tables: disallow NFTA_SET_ELEM_KEY_END with NFT_SET_ELEM_INTERVAL_END flag netfilter: nf_tables: use READ_ONCE and WRITE_ONCE for shared generation id access ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>