aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2023-03-31Merge tag 'kvm-s390-master-6.3-1' of ↵Paolo Bonzini35-45/+185
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD A small fix that repairs the external loop detection code for PV guests.
2023-03-31net: phylink: add phylink_expects_phy() methodMichael Sit Wei Hong1-0/+1
Provide phylink_expects_phy() to allow MAC drivers to check if it is expecting a PHY to attach to. Since fixed-linked setups do not need to attach to a PHY. Provides a boolean value as to if the MAC should expect a PHY. Returns true if a PHY is expected. Reviewed-by: Russell King (Oracle) <[email protected]> Signed-off-by: Michael Sit Wei Hong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-31iommu/vt-d: Fix an IOMMU perfmon warning when CPU hotplugKan Liang1-1/+0
A warning can be triggered when hotplug CPU 0. $ echo 0 > /sys/devices/system/cpu/cpu0/online ------------[ cut here ]------------ Voluntary context switch within RCU read-side critical section! WARNING: CPU: 0 PID: 19 at kernel/rcu/tree_plugin.h:318 rcu_note_context_switch+0x4f4/0x580 RIP: 0010:rcu_note_context_switch+0x4f4/0x580 Call Trace: <TASK> ? perf_event_update_userpage+0x104/0x150 __schedule+0x8d/0x960 ? perf_event_set_state.part.82+0x11/0x50 schedule+0x44/0xb0 schedule_timeout+0x226/0x310 ? __perf_event_disable+0x64/0x1a0 ? _raw_spin_unlock+0x14/0x30 wait_for_completion+0x94/0x130 __wait_rcu_gp+0x108/0x130 synchronize_rcu+0x67/0x70 ? invoke_rcu_core+0xb0/0xb0 ? __bpf_trace_rcu_stall_warning+0x10/0x10 perf_pmu_migrate_context+0x121/0x370 iommu_pmu_cpu_offline+0x6a/0xa0 ? iommu_pmu_del+0x1e0/0x1e0 cpuhp_invoke_callback+0x129/0x510 cpuhp_thread_fun+0x94/0x150 smpboot_thread_fn+0x183/0x220 ? sort_range+0x20/0x20 kthread+0xe6/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> ---[ end trace 0000000000000000 ]--- The synchronize_rcu() will be invoked in the perf_pmu_migrate_context(), when migrating a PMU to a new CPU. However, the current for_each_iommu() is within RCU read-side critical section. Two methods were considered to fix the issue. - Use the dmar_global_lock to replace the RCU read lock when going through the drhd list. But it triggers a lockdep warning. - Use the cpuhp_setup_state_multi() to set up a dedicated state for each IOMMU PMU. The lock can be avoided. The latter method is implemented in this patch. Since each IOMMU PMU has a dedicated state, add cpuhp_node and cpu in struct iommu_pmu to track the state. The state can be dynamically allocated now. Remove the CPUHP_AP_PERF_X86_IOMMU_PERF_ONLINE. Fixes: 46284c6ceb5e ("iommu/vt-d: Support cpumask for IOMMU perfmon") Reported-by: Ammy Yi <[email protected]> Signed-off-by: Kan Liang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Lu Baolu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2023-03-30Merge tag 'wireless-next-2023-03-30' of ↵Jakub Kicinski3-5/+135
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Johannes Berg says: ==================== Major stack changes: * TC offload support for drivers below mac80211 * reduced neighbor report (RNR) handling for AP mode * mac80211 mesh fast-xmit and fast-rx support * support for another mesh A-MSDU format (seems nobody got the spec right) Major driver changes: Kalle moved the drivers that were just plain C files in drivers/net/wireless/ to legacy/ and virtual/ dirs. hwsim * multi-BSSID support * some FTM support ath11k * MU-MIMO parameters support * ack signal support for management packets rtl8xxxu * support for RTL8710BU aka RTL8188GU chips rtw89 * support for various newer firmware APIs ath10k * enabled threaded NAPI on WCN3990 iwlwifi * lots of work for multi-link/EHT (wifi7) * hardware timestamping support for some devices/firwmares * TX beacon protection on newer hardware * tag 'wireless-next-2023-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (181 commits) wifi: clean up erroneously introduced file wifi: iwlwifi: mvm: correctly use link in iwl_mvm_sta_del() wifi: iwlwifi: separate AP link management queues wifi: iwlwifi: mvm: free probe_resp_data later wifi: iwlwifi: bump FW API to 75 for AX devices wifi: iwlwifi: mvm: move max_agg_bufsize into host TLC lq_sta wifi: iwlwifi: mvm: send full STA during HW restart wifi: iwlwifi: mvm: rework active links counting wifi: iwlwifi: mvm: update mac config when assigning chanctx wifi: iwlwifi: mvm: use the correct link queue wifi: iwlwifi: mvm: clean up mac_id vs. link_id in MLD sta wifi: iwlwifi: mvm: fix station link data leak wifi: iwlwifi: mvm: initialize max_rc_amsdu_len per-link wifi: iwlwifi: mvm: use appropriate link for rate selection wifi: iwlwifi: mvm: use the new lockdep-checking macros wifi: iwlwifi: mvm: remove chanctx WARN_ON wifi: iwlwifi: mvm: avoid sending MAC context for idle wifi: iwlwifi: mvm: remove only link-specific AP keys wifi: iwlwifi: mvm: skip inactive links wifi: iwlwifi: mvm: adjust iwl_mvm_scan_respect_p2p_go_iter() for MLO ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-30net/sched: act_tunnel_key: add support for "don't fragment"Davide Caratti1-0/+1
extend "act_tunnel_key" to allow specifying TUNNEL_DONT_FRAGMENT. Suggested-by: Ilya Maximets <[email protected]> Reviewed-by: Pedro Tammela <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski16-29/+83
Conflicts: drivers/net/ethernet/mediatek/mtk_ppe.c 3fbe4d8c0e53 ("net: ethernet: mtk_eth_soc: ppe: add support for flow accounting") 924531326e2d ("net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow") Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-30Merge tag 'net-6.3-rc5' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from CAN and WPAN. Still quite a few bugs from this release. This pull is a bit smaller because major subtrees went into the previous one. Or maybe people took spring break off? Current release - regressions: - phy: micrel: correct KSZ9131RNX EEE capabilities and advertisement Current release - new code bugs: - eth: wangxun: fix vector length of interrupt cause - vsock/loopback: consistently protect the packet queue with sk_buff_head.lock - virtio/vsock: fix header length on skb merging - wpan: ca8210: fix unsigned mac_len comparison with zero Previous releases - regressions: - eth: stmmac: don't reject VLANs when IFF_PROMISC is set - eth: smsc911x: avoid PHY being resumed when interface is not up - eth: mtk_eth_soc: fix tx throughput regression with direct 1G links - eth: bnx2x: use the right build_skb() helper after core rework - wwan: iosm: fix 7560 modem crash on use on unsupported channel Previous releases - always broken: - eth: sfc: don't overwrite offload features at NIC reset - eth: r8169: fix RTL8168H and RTL8107E rx crc error - can: j1939: prevent deadlock by moving j1939_sk_errqueue() - virt: vmxnet3: use GRO callback when UPT is enabled - virt: xen: don't do grant copy across page boundary - phy: dp83869: fix default value for tx-/rx-internal-delay - dsa: ksz8: fix multiple issues with ksz8_fdb_dump - eth: mvpp2: fix classification/RSS of VLAN and fragmented packets - eth: mtk_eth_soc: fix flow block refcounting logic Misc: - constify fwnode pointers in SFP handling" * tag 'net-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (55 commits) net: ethernet: mtk_eth_soc: add missing ppe cache flush when deleting a flow net: ethernet: mtk_eth_soc: fix L2 offloading with DSA untag offload net: ethernet: mtk_eth_soc: fix flow block refcounting logic net: mvneta: fix potential double-frees in mvneta_txq_sw_deinit() net: dsa: sync unicast and multicast addresses for VLAN filters too net: dsa: mv88e6xxx: Enable IGMP snooping on user ports only xen/netback: use same error messages for same errors test/vsock: new skbuff appending test virtio/vsock: WARN_ONCE() for invalid state of socket virtio/vsock: fix header length on skb merging bnxt_en: Add missing 200G link speed reporting bnxt_en: Fix typo in PCI id to device description string mapping bnxt_en: Fix reporting of test result in ethtool selftest i40e: fix registers dump after run ethtool adapter self test bnx2x: use the right build_skb() helper net: ipa: compute DMA pool size properly net: wwan: iosm: fixes 7560 modem crash net: ethernet: mtk_eth_soc: fix tx throughput regression with direct 1G links ice: fix invalid check for empty list in ice_sched_assoc_vsi_to_agg() ice: add profile conflict check for AVF FDIR ...
2023-03-30Merge tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drmLinus Torvalds1-7/+0
Pull drm fixes from Daniel Vetter: "Two regression fixes in here, otherwise just the usual stuff: - i915 fixes for color mgmt, psr, lmem flush, hibernate oops, and more - amdgpu: dp mst and hibernate regression fix - etnaviv: revert fdinfo support (incl drm/sched revert), leak fix - misc ivpu fixes, nouveau backlight, drm buddy allocator 32bit fixes" * tag 'drm-fixes-2023-03-30' of git://anongit.freedesktop.org/drm/drm: (27 commits) Revert "drm/scheduler: track GPU active time per entity" Revert "drm/etnaviv: export client GPU usage statistics via fdinfo" drm/etnaviv: fix reference leak when mmaping imported buffer drm/amdgpu: allow more APUs to do mode2 reset when go to S4 drm/amd/display: Take FEC Overhead into Timeslot Calculation drm/amd/display: Add DSC Support for Synaptics Cascaded MST Hub drm: test: Fix 32-bit issue in drm_buddy_test drm: buddy_allocator: Fix buddy allocator init on 32-bit systems drm/nouveau/kms: Fix backlight registration drm/i915/perf: Drop wakeref on GuC RC error drm/i915/dpt: Treat the DPT BO as a framebuffer drm/i915/gem: Flush lmem contents after construction drm/i915/tc: Fix the ICL PHY ownership check in TC-cold state drm/i915: Disable DC states for all commits drm/i915: Workaround ICL CSC_MODE sticky arming drm/i915: Add a .color_post_update() hook drm/i915: Move CSC load back into .color_commit_arm() when PSR is enabled on skl/glk drm/i915: Split icl_color_commit_noarm() from skl_color_commit_noarm() drm/i915/pmu: Use functions common with sysfs to read actual freq accel/ivpu: Fix IPC buffer header status field value ...
2023-03-30netfilter: Correct documentation errors in nf_tables.hMatthieu De Beule1-4/+4
NFTA_RANGE_OP incorrectly says nft_cmp_ops instead of nft_range_ops. NFTA_LOG_GROUP and NFTA_LOG_QTHRESHOLD claim NLA_U32 instead of NLA_U16 NFTA_EXTHDR_SREG isn't documented as a register Signed-off-by: Matthieu De Beule <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
2023-03-30netfilter: nfnetlink_queue: enable classid socket info retrievalEric Sage1-0/+1
This enables associating a socket with a v1 net_cls cgroup. Useful for applying a per-cgroup policy when processing packets in userspace. Signed-off-by: Eric Sage <[email protected]> Signed-off-by: Florian Westphal <[email protected]>
2023-03-30Merge branch 'etnaviv/fixes' of https://git.pengutronix.de/git/lst/linux ↵Daniel Vetter1-7/+0
into drm-fixes - revert gpu time fdinfo support - reference leak fix on imported buffers Signed-off-by: Daniel Vetter <[email protected]> From: Lucas Stach <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-03-30Revert "drm/scheduler: track GPU active time per entity"Lucas Stach1-7/+0
This reverts commit df622729ddbf as it introduces a use-after-free, which isn't easy to fix without going back to the design drawing board. Reported-by: Danilo Krummrich <[email protected]> Signed-off-by: Lucas Stach <[email protected]>
2023-03-30net: add softnet_data.in_net_rx_actionEric Dumazet1-0/+1
We want to make two optimizations in napi_schedule_rps() and ____napi_schedule() which require to know if these helpers are called from net_rx_action(), instead of being called from other contexts. sd.in_net_rx_action is only read/written by the owning cpu. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jason Xing <[email protected]> Tested-by: Jason Xing <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-03-30wifi: nl80211: support advertising S1G capabilitiesKieran Frewen1-0/+7
Include S1G capabilities in netlink band info messages. Signed-off-by: Kieran Frewen <[email protected]> Co-developed-by: Gilad Itzkovitch <[email protected]> Signed-off-by: Gilad Itzkovitch <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2023-03-29Merge tag 'f2fs-fix-6.3-rc5' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fix from Jaegeuk Kim: "This fixes a tracepoint field size in f2fs in preparation for stricter rules for tracing fields" * tag 'f2fs-fix-6.3-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: Fix f2fs_truncate_partial_nodes ftrace event
2023-03-29macvlan: Add netlink attribute for broadcast cutoffHerbert Xu1-0/+1
Make the broadcast cutoff configurable through netlink. Note that macvlan is weird because there is no central device for us to configure (the lowerdev could be anything). So all the options are duplicated over what could be thousands of child devices. IFLA_MACVLAN_BC_QUEUE_LEN took the approach of taking the maximum of all child device settings. This is unnecessary as we could simply store the option in the port device and take the last child device that gets updated as the value to use. Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-29ipv6: Remove in6addr_any alternatives.Kuniyuki Iwashima3-13/+6
Some code defines the IPv6 wildcard address as a local variable and use it with memcmp() or ipv6_addr_equal(). Let's use in6addr_any and ipv6_addr_any() instead. Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-29vsock: support sockmapBobby Eshleman2-0/+18
This patch adds sockmap support for vsock sockets. It is intended to be usable by all transports, but only the virtio and loopback transports are implemented. SOCK_STREAM, SOCK_DGRAM, and SOCK_SEQPACKET are all supported. Signed-off-by: Bobby Eshleman <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-28Merge tag 'mlx5-updates-2023-03-20' of ↵Jakub Kicinski2-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-03-20 mlx5 dynamic msix This patch series adds support for dynamic msix vectors allocation in mlx5. Eli Cohen Says: ================ The following series of patches modifies mlx5_core to work with the dynamic MSIX API. Currently, mlx5_core allocates all the interrupt vectors it needs and distributes them amongst the consumers. With the introduction of dynamic MSIX support, which allows for allocation of interrupts more than once, we now allocate vectors as we need them. This allows other drivers running on top of mlx5_core to allocate interrupt vectors for their own use. An example for this is mlx5_vdpa, which uses these vectors to propagate interrupts directly from the hardware to the vCPU [1]. As a preparation for using this series, a use after free issue is fixed in lib/cpu_rmap.c and the allocator for rmap entries has been modified. A complementary API for irq_cpu_rmap_add() has also been introduced. [1] https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/patch/?id=0f2bf1fcae96a83b8c5581854713c9fc3407556e ================ * tag 'mlx5-updates-2023-03-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Provide external API for allocating vectors net/mlx5: Use one completion vector if eth is disabled net/mlx5: Refactor calculation of required completion vectors net/mlx5: Move devlink registration before mlx5_load net/mlx5: Use dynamic msix vectors allocation net/mlx5: Refactor completion irq request/release code net/mlx5: Improve naming of pci function vectors net/mlx5: Use newer affinity descriptor net/mlx5: Modify struct mlx5_irq to use struct msi_map net/mlx5: Fix wrong comment net/mlx5e: Coding style fix, add empty line lib: cpu_rmap: Add irq_cpu_rmap_remove to complement irq_cpu_rmap_add lib: cpu_rmap: Use allocator for rmap entries lib: cpu_rmap: Avoid use after free on rmap->obj array entries ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-28net: dst: Switch to rcuref_t reference countingThomas Gleixner2-10/+11
Under high contention dst_entry::__refcnt becomes a significant bottleneck. atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes into high retry rates on contention. Switch the reference count to rcuref_t which results in a significant performance gain. Rename the reference count member to __rcuref to reflect the change. The gain depends on the micro-architecture and the number of concurrent operations and has been measured in the range of +25% to +130% with a localhost memtier/memcached benchmark which amplifies the problem massively. Running the memtier/memcached benchmark over a real (1Gb) network connection the conversion on top of the false sharing fix for struct dst_entry::__refcnt results in a total gain in the 2%-5% range over the upstream baseline. Reported-by: Wangyang Guo <[email protected]> Reported-by: Arjan Van De Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-28net: dst: Prevent false sharing vs. dst_entry:: __refcntWangyang Guo4-8/+15
dst_entry::__refcnt is highly contended in scenarios where many connections happen from and to the same IP. The reference count is an atomic_t, so the reference count operations have to take the cache-line exclusive. Aside of the unavoidable reference count contention there is another significant problem which is caused by that: False sharing. perf top identified two affected read accesses. dst_entry::lwtstate and rtable::rt_genid. dst_entry:__refcnt is located at offset 64 of dst_entry, which puts it into a seperate cacheline vs. the read mostly members located at the beginning of the struct. That prevents false sharing vs. the struct members in the first 64 bytes of the structure, but there is also dst_entry::lwtstate which is located after the reference count and in the same cache line. This member is read after a reference count has been acquired. struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a size of 112 bytes, which means that the struct members of rtable which follow the dst member share the same cache line as dst_entry::__refcnt. Especially rtable::rt_genid is also read by the contexts which have a reference count acquired already. When dst_entry:__refcnt is incremented or decremented via an atomic operation these read accesses stall. This was found when analysing the memtier benchmark in 1:100 mode, which amplifies the problem extremly. Move the rt[6i]_uncached[_list] members out of struct rtable and struct rt6_info into struct dst_entry to provide padding and move the lwtstate member after that so it ends up in the same cache line. The resulting improvement depends on the micro-architecture and the number of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached benchmark. [ tglx: Rearrange struct ] Signed-off-by: Wangyang Guo <[email protected]> Signed-off-by: Arjan van de Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-28Merge branch 'locking/rcuref' of ↵Jakub Kicinski5-11/+464
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pulling rcurefs from Peter for tglx's work. Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-28Merge tag 'urgent-rcu.2023.03.28a' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU fix from Paul McKenney: "This brings the rcu_torture_read event trace into line with the new trace tools by replacing this event trace's __field() with the corresponding __array(). Without this, the new trace tools will fail when presented wtih an rcu_torture_read event trace, which is a regression from the viewpoint of trace tools users" Link: https://lore.kernel.org/all/[email protected]/ * tag 'urgent-rcu.2023.03.28a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: rcu: Fix rcu_torture_read ftrace event
2023-03-28atomics: Provide rcuref - scalable reference countingThomas Gleixner2-0/+161
atomic_t based reference counting, including refcount_t, uses atomic_inc_not_zero() for acquiring a reference. atomic_inc_not_zero() is implemented with a atomic_try_cmpxchg() loop. High contention of the reference count leads to retry loops and scales badly. There is nothing to improve on this implementation as the semantics have to be preserved. Provide rcuref as a scalable alternative solution which is suitable for RCU managed objects. Similar to refcount_t it comes with overflow and underflow detection and mitigation. rcuref treats the underlying atomic_t as an unsigned integer and partitions this space into zones: 0x00000000 - 0x7FFFFFFF valid zone (1 .. (INT_MAX + 1) references) 0x80000000 - 0xBFFFFFFF saturation zone 0xC0000000 - 0xFFFFFFFE dead zone 0xFFFFFFFF no reference rcuref_get() unconditionally increments the reference count with atomic_add_negative_relaxed(). rcuref_put() unconditionally decrements the reference count with atomic_add_negative_release(). This unconditional increment avoids the inc_not_zero() problem, but requires a more complex implementation on the put() side when the count drops from 0 to -1. When this transition is detected then it is attempted to mark the reference count dead, by setting it to the midpoint of the dead zone with a single atomic_cmpxchg_release() operation. This operation can fail due to a concurrent rcuref_get() elevating the reference count from -1 to 0 again. If the unconditional increment in rcuref_get() hits a reference count which is marked dead (or saturated) it will detect it after the fact and bring back the reference count to the midpoint of the respective zone. The zones provide enough tolerance which makes it practically impossible to escape from a zone. The racy implementation of rcuref_put() requires to protect rcuref_put() against a grace period ending in order to prevent a subtle use after free. As RCU is the only mechanism which allows to protect against that, it is not possible to fully replace the atomic_inc_not_zero() based implementation of refcount_t with this scheme. The final drop is slightly more expensive than the atomic_dec_return() counterpart, but that's not the case which this is optimized for. The optimization is on the high frequeunt get()/put() pairs and their scalability. The performance of an uncontended rcuref_get()/put() pair where the put() is not dropping the last reference is still on par with the plain atomic operations, while at the same time providing overflow and underflow detection and mitigation. The performance of rcuref compared to plain atomic_inc_not_zero() and atomic_dec_return() based reference counting under contention: - Micro benchmark: All CPUs running a increment/decrement loop on an elevated reference count, which means the 0 to -1 transition never happens. The performance gain depends on microarchitecture and the number of CPUs and has been observed in the range of 1.3X to 4.7X - Conversion of dst_entry::__refcnt to rcuref and testing with the localhost memtier/memcached benchmark. That benchmark shows the reference count contention prominently. The performance gain depends on microarchitecture and the number of CPUs and has been observed in the range of 1.1X to 2.6X over the previous fix for the false sharing issue vs. struct dst_entry::__refcnt. When memtier is run over a real 1Gb network connection, there is a small gain on top of the false sharing fix. The two changes combined result in a 2%-5% total gain for that networked test. Reported-by: Wangyang Guo <[email protected]> Reported-by: Arjan Van De Ven <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-03-28atomics: Provide atomic_add_negative() variantsThomas Gleixner3-11/+303
atomic_add_negative() does not provide the relaxed/acquire/release variants. Provide them in preparation for a new scalable reference count algorithm. Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Mark Rutland <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-03-27ethtool: Add support for configuring tx_push_buf_lenShay Agroskin2-4/+12
This attribute, which is part of ethtool's ring param configuration allows the user to specify the maximum number of the packet's payload that can be written directly to the device. Example usage: # ethtool -G [interface] tx-push-buf-len [number of bytes] Co-developed-by: Jakub Kicinski <[email protected]> Signed-off-by: Shay Agroskin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-27netlink: Add a macro to set policy message with format stringShay Agroskin1-0/+22
Similar to NL_SET_ERR_MSG_FMT, add a macro which sets netlink policy error message with a format string. Signed-off-by: Shay Agroskin <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-27net: introduce a config option to tweak MAX_SKB_FRAGSEric Dumazet1-11/+5
Currently, MAX_SKB_FRAGS value is 17. For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg() attempts order-3 allocations, stuffing 32768 bytes per frag. But with zero copy, we use order-0 pages. For BIG TCP to show its full potential, we add a config option to be able to fit up to 45 segments per skb. This is also needed for BIG TCP rx zerocopy, as zerocopy currently does not support skbs with frag list. We have used MAX_SKB_FRAGS=45 value for years at Google before we deployed 4K MTU, with no adverse effect, other than a recent issue in mlx4, fixed in commit 26782aad00cc ("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS") Back then, goal was to be able to receive full size (64KB) GRO packets without the frag_list overhead. Note that /proc/sys/net/core/max_skb_frags can also be used to limit the number of fragments TCP can use in tx packets. By default we keep the old/legacy value of 17 until we get more coverage for the updated values. Sizes of struct skb_shared_info on 64bit arches MAX_SKB_FRAGS | sizeof(struct skb_shared_info): ============================================== 17 320 21 320+64 = 384 25 320+128 = 448 29 320+192 = 512 33 320+256 = 576 37 320+320 = 640 41 320+384 = 704 45 320+448 = 768 This inflation might cause problems for drivers assuming they could pack both the incoming packet (for MTU=1500) and skb_shared_info in half a page, using build_skb(). v3: fix build error when CONFIG_NET=n v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long" Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Jason Xing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-27KVM: x86/ioapic: Resample the pending state of an IRQ when unmaskingDmytro Maluka1-0/+10
KVM irqfd based emulation of level-triggered interrupts doesn't work quite correctly in some cases, particularly in the case of interrupts that are handled in a Linux guest as oneshot interrupts (IRQF_ONESHOT). Such an interrupt is acked to the device in its threaded irq handler, i.e. later than it is acked to the interrupt controller (EOI at the end of hardirq), not earlier. Linux keeps such interrupt masked until its threaded handler finishes, to prevent the EOI from re-asserting an unacknowledged interrupt. However, with KVM + vfio (or whatever is listening on the resamplefd) we always notify resamplefd at the EOI, so vfio prematurely unmasks the host physical IRQ, thus a new physical interrupt is fired in the host. This extra interrupt in the host is not a problem per se. The problem is that it is unconditionally queued for injection into the guest, so the guest sees an extra bogus interrupt. [*] There are observed at least 2 user-visible issues caused by those extra erroneous interrupts for a oneshot irq in the guest: 1. System suspend aborted due to a pending wakeup interrupt from ChromeOS EC (drivers/platform/chrome/cros_ec.c). 2. Annoying "invalid report id data" errors from ELAN0000 touchpad (drivers/input/mouse/elan_i2c_core.c), flooding the guest dmesg every time the touchpad is touched. The core issue here is that by the time when the guest unmasks the IRQ, the physical IRQ line is no longer asserted (since the guest has acked the interrupt to the device in the meantime), yet we unconditionally inject the interrupt queued into the guest by the previous resampling. So to fix the issue, we need a way to detect that the IRQ is no longer pending, and cancel the queued interrupt in this case. With IOAPIC we are not able to probe the physical IRQ line state directly (at least not if the underlying physical interrupt controller is an IOAPIC too), so in this patch we use irqfd resampler for that. Namely, instead of injecting the queued interrupt, we just notify the resampler that this interrupt is done. If the IRQ line is actually already deasserted, we are done. If it is still asserted, a new interrupt will be shortly triggered through irqfd and injected into the guest. In the case if there is no irqfd resampler registered for this IRQ, we cannot fix the issue, so we keep the existing behavior: immediately unconditionally inject the queued interrupt. This patch fixes the issue for x86 IOAPIC only. In the long run, we can fix it for other irqchips and other architectures too, possibly taking advantage of reading the physical state of the IRQ line, which is possible with some other irqchips (e.g. with arm64 GIC, maybe even with the legacy x86 PIC). [*] In this description we assume that the interrupt is a physical host interrupt forwarded to the guest e.g. by vfio. Potentially the same issue may occur also with a purely virtual interrupt from an emulated device, e.g. if the guest handles this interrupt, again, as a oneshot interrupt. Signed-off-by: Dmytro Maluka <[email protected]> Link: https://lore.kernel.org/kvm/[email protected]/ Link: https://lore.kernel.org/lkml/[email protected]/ Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-03-27KVM: irqfd: Make resampler_list an RCU listDmytro Maluka2-1/+2
It is useful to be able to do read-only traversal of the list of all the registered irqfd resamplers without locking the resampler_lock mutex. In particular, we are going to traverse it to search for a resampler registered for the given irq of an irqchip, and that will be done with an irqchip spinlock (ioapic->lock) held, so it is undesirable to lock a mutex in this context. So turn this list into an RCU list. For protecting the read side, reuse kvm->irq_srcu which is already used for protecting a number of irq related things (kvm->irq_routing, irqfd->resampler->list, kvm->irq_ack_notifier_list, kvm->arch.mask_notifier_list). Signed-off-by: Dmytro Maluka <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2023-03-27net: phy: constify fwnode_get_phy_node() fwnode argumentRussell King (Oracle)1-1/+1
fwnode_get_phy_node() does not motify the fwnode structure, so make the argument const, Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-27net: sfp: make sfp_bus_find_fwnode() take a const fwnodeRussell King (Oracle)1-2/+3
sfp_bus_find_fwnode() does not write to the fwnode, so let's make it const. Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-03-26Merge tag 'core_urgent_for_v6.3_rc4' of ↵Linus Torvalds2-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core fixes from Borislav Petkov: - Do the delayed RCU wakeup for kthreads in the proper order so that former doesn't get ignored - A noinstr warning fix * tag 'core_urgent_for_v6.3_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: entry/rcu: Check TIF_RESCHED _after_ delayed RCU wake-up entry: Fix noinstr warning in __enter_from_user_mode()
2023-03-25bpf: Use bpf_mem_cache_alloc/free for bpf_local_storageMartin KaFai Lau1-0/+1
This patch uses bpf_mem_cache_alloc/free for allocating and freeing bpf_local_storage for task and cgroup storage. The changes are similar to the previous patch. A few things that worth to mention for bpf_local_storage: The local_storage is freed when the last selem is deleted. Before deleting a selem from local_storage, it needs to retrieve the local_storage->smap because the bpf_selem_unlink_storage_nolock() may have set it to NULL. Note that local_storage->smap may have already been NULL when the selem created this local_storage has been removed. In this case, call_rcu will be used to free the local_storage. Also, the bpf_ma (true or false) value is needed before calling bpf_local_storage_free(). The bpf_ma can either be obtained from the local_storage->smap (if available) or any of its selem's smap. A new helper check_storage_bpf_ma() is added to obtain bpf_ma for a deleting bpf_local_storage. When bpf_local_storage_alloc getting a reused memory, all fields are either in the correct values or will be initialized. 'cache[]' must already be all NULLs. 'list' must be empty. Others will be initialized. Cc: Namhyung Kim <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-25bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elemMartin KaFai Lau1-1/+5
This patch uses bpf_mem_alloc for the task and cgroup local storage that the bpf prog can easily get a hold of the storage owner's PTR_TO_BTF_ID. eg. bpf_get_current_task_btf() can be used in some of the kmalloc code path which will cause deadlock/recursion. bpf_mem_cache_alloc is deadlock free and will solve a legit use case in [1]. For sk storage, its batch creation benchmark shows a few percent regression when the sk create/destroy batch size is larger than 32. The sk creation/destruction happens much more often and depends on external traffic. Considering it is hypothetical to be able to cause deadlock with sk storage, it can cross the bridge to use bpf_mem_alloc till a legit (ie. useful) use case comes up. For inode storage, bpf_local_storage_destroy() is called before waiting for a rcu gp and its memory cannot be reused immediately. inode stays with kmalloc/kfree after the rcu [or tasks_trace] gp. A 'bool bpf_ma' argument is added to bpf_local_storage_map_alloc(). Only task and cgroup storage have 'bpf_ma == true' which means to use bpf_mem_cache_alloc/free(). This patch only changes selem to use bpf_mem_alloc for task and cgroup. The next patch will change the local_storage to use bpf_mem_alloc also for task and cgroup. Here is some more details on the changes: * memory allocation: After bpf_mem_cache_alloc(), the SDATA(selem)->data is zero-ed because bpf_mem_cache_alloc() could return a reused selem. It is to keep the existing bpf_map_kzalloc() behavior. Only SDATA(selem)->data is zero-ed. SDATA(selem)->data is the visible part to the bpf prog. No need to use zero_map_value() to do the zeroing because bpf_selem_free(..., reuse_now = true) ensures no bpf prog is using the selem before returning the selem through bpf_mem_cache_free(). For the internal fields of selem, they will be initialized when linking to the new smap and the new local_storage. When 'bpf_ma == false', nothing changes in this patch. It will stay with the bpf_map_kzalloc(). * memory free: The bpf_selem_free() and bpf_selem_free_rcu() are modified to handle the bpf_ma == true case. For the common selem free path where its owner is also being destroyed, the mem is freed in bpf_local_storage_destroy(), the owner (task and cgroup) has gone through a rcu gp. The memory can be reused immediately, so bpf_local_storage_destroy() will call bpf_selem_free(..., reuse_now = true) which will do bpf_mem_cache_free() for immediate reuse consideration. An exception is the delete elem code path. The delete elem code path is called from the helper bpf_*_storage_delete() and the syscall bpf_map_delete_elem(). This path is an unusual case for local storage because the common use case is to have the local storage staying with its owner life time so that the bpf prog and the user space does not have to monitor the owner's destruction. For the delete elem path, the selem cannot be reused immediately because there could be bpf prog using it. It will call bpf_selem_free(..., reuse_now = false) and it will wait for a rcu tasks trace gp before freeing the elem. The rcu callback is changed to do bpf_mem_cache_raw_free() instead of kfree(). When 'bpf_ma == false', it should be the same as before. __bpf_selem_free() is added to do the kfree_rcu and call_tasks_trace_rcu(). A few words on the 'reuse_now == true'. When 'reuse_now == true', it is still racing with bpf_local_storage_map_free which is under rcu protection, so it still needs to wait for a rcu gp instead of kfree(). Otherwise, the selem may be reused by slab for a totally different struct while the bpf_local_storage_map_free() is still using it (as a rcu reader). For the inode case, there may be other rcu readers also. In short, when bpf_ma == false and reuse_now == true => vanilla rcu. [1]: https://lore.kernel.org/bpf/[email protected]/ Cc: Namhyung Kim <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-25bpf: Add a few bpf mem allocator functionsMartin KaFai Lau1-0/+2
This patch adds a few bpf mem allocator functions which will be used in the bpf_local_storage in a later patch. bpf_mem_cache_alloc_flags(..., gfp_t flags) is added. When the flags == GFP_KERNEL, it will fallback to __alloc(..., GFP_KERNEL). bpf_local_storage knows its running context is sleepable (GFP_KERNEL) and provides a better guarantee on memory allocation. bpf_local_storage has some uncommon cases that its selem cannot be reused immediately. It handles its own rcu_head and goes through a rcu_trace gp and then free it. bpf_mem_cache_raw_free() is added for direct free purpose without leaking the LLIST_NODE_SZ internal knowledge. During free time, the 'struct bpf_mem_alloc *ma' is no longer available. However, the caller should know if it is percpu memory or not and it can call different raw_free functions. bpf_local_storage does not support percpu value, so only the non-percpu 'bpf_mem_cache_raw_free()' is added in this patch. Signed-off-by: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2023-03-25Merge tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds3-6/+54
Pull xfs percpu counter fixes from Darrick Wong: "We discovered a filesystem summary counter corruption problem that was traced to cpu hot-remove racing with the call to percpu_counter_sum that sets the free block count in the superblock when writing it to disk. The root cause is that percpu_counter_sum doesn't cull from dying cpus and hence misses those counter values if the cpu shutdown hooks have not yet run to merge the values. I'm hoping this is a fairly painless fix to the problem, since the dying cpu mask should generally be empty. It's been in for-next for a week without any complaints from the bots. - Fix a race in the percpu counters summation code where the summation failed to add in the values for any CPUs that were dying but not yet dead. This fixes some minor discrepancies and incorrect assertions when running generic/650" * tag 'xfs-6.3-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: pcpcntr: remove percpu_counter_sum_all() fork: remove use of percpu_counter_sum_all pcpcntrs: fix dying cpu summation race cpumask: introduce for_each_cpu_or
2023-03-24Merge tag 'mm-hotfixes-stable-2023-03-24-17-09' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "21 hotfixes, 8 of which are cc:stable. 11 are for MM, the remainder are for other subsystems" * tag 'mm-hotfixes-stable-2023-03-24-17-09' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (21 commits) mm: mmap: remove newline at the end of the trace mailmap: add entries for Richard Leitner kcsan: avoid passing -g for test kfence: avoid passing -g for test mm: kfence: fix using kfence_metadata without initialization in show_object() lib: dhry: fix unstable smp_processor_id(_) usage mailmap: add entry for Enric Balletbo i Serra mailmap: map Sai Prakash Ranjan's old address to his current one mailmap: map Rajendra Nayak's old address to his current one Revert "kasan: drop skip_kasan_poison variable in free_pages_prepare" mailmap: add entry for Tobias Klauser kasan, powerpc: don't rename memintrinsics if compiler adds prefixes mm/ksm: fix race with VMA iteration and mm_struct teardown kselftest: vm: fix unused variable warning mm: fix error handling for map_deny_write_exec mm: deduplicate error handling for map_deny_write_exec checksyscalls: ignore fstat to silence build warning on LoongArch nilfs2: fix kernel-infoleak in nilfs_ioctl_wrap_copy() test_maple_tree: add more testing for mas_empty_area() maple_tree: fix mas_skip_node() end slot detection ...
2023-03-24net/mlx5: Provide external API for allocating vectorsEli Cohen1-0/+6
Provide external API to be used by other drivers relying on mlx5_core, for allocating MSIX vectors. An example for such a driver would be mlx5_vdpa. Signed-off-by: Eli Cohen <[email protected]> Reviewed-by: Shay Drory <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Reviewed-by: Jacob Keller <[email protected]>
2023-03-24lib: cpu_rmap: Add irq_cpu_rmap_remove to complement irq_cpu_rmap_addEli Cohen1-0/+1
Add a function to complement irq_cpu_rmap_add(). It removes the irq from the reverse mapping by setting the notifier to NULL. The function calls irq_set_affinity_notifier() with NULL at the notify argument which then cancel any pending notifier work and decrement reference on the notifier. When ref count reaches zero, the glue pointer is kfree and the rmap entry is set to NULL serving both to avoid second attempt to release it and also making the rmap entry available for subsequent mapping. It should be noted the drivers usually creates the reverse mapping at initialization time and remove it at unload time so we do not expect failures in allocating rmap due to kref holding the glue entry. Cc: Thomas Gleixner <[email protected]> Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Reviewed-by: Jacob Keller <[email protected]>
2023-03-24lib: cpu_rmap: Use allocator for rmap entriesEli Cohen1-2/+1
Use a proper allocator for rmap entries using a naive for loop. The allocator relies on whether an entry is NULL to be considered free. Remove the used field of rmap which is not needed. Also, avoid crashing the kernel if an entry is not available. Cc: Thomas Gleixner <[email protected]> Signed-off-by: Eli Cohen <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]> Reviewed-by: Jacob Keller <[email protected]>
2023-03-24Merge tag 'block-6.3-2023-03-24' of git://git.kernel.dk/linuxLinus Torvalds2-7/+9
Pull block fixes from Jens Axboe: - NVMe pull request via Christoph: - Send Identify with CNS 06h only to I/O controllers (Martin George) - Fix nvme_tcp_term_pdu to match spec (Caleb Sander) - Pass in issue_flags for uring_cmd, so the end_io handlers don't need to assume what the right context is (me) - Fix for ublk, marking it as LIVE before adding it to avoid races on the initial IO (Ming) * tag 'block-6.3-2023-03-24' of git://git.kernel.dk/linux: nvme-tcp: fix nvme_tcp_term_pdu to match spec nvme: send Identify with CNS 06h only to I/O controllers block/io_uring: pass in issue_flags for uring_cmd task_work handling block: ublk_drv: mark device as LIVE before adding disk
2023-03-24Merge tag 'thermal-6.3-rc4' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull thermal control fixes from Rafael Wysocki: "These address two recent regressions related to thermal control. Specifics: - Restore the thermal core behavior regarding zero-temperature trip points to avoid a driver regression (Ido Schimmel) - Fix a recent regression in the ACPI processor driver preventing it from changing the number of CPU cooling device states exposed via sysfs after the given CPU cooling device has been registered (Rafael Wysocki)" * tag 'thermal-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: thermal: core: Restore behavior regarding invalid trip points ACPI: processor: thermal: Update CPU cooling devices on cpufreq policy changes thermal: core: Introduce thermal_cooling_device_update() thermal: core: Introduce thermal_cooling_device_present() ACPI: processor: Reorder acpi_processor_driver_init()
2023-03-24Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski6-5/+42
Conflicts: drivers/net/ethernet/mellanox/mlx5/core/en_tc.c 6e9d51b1a5cb ("net/mlx5e: Initialize link speed to zero") 1bffcea42926 ("net/mlx5e: Add devlink hairpin queues parameters") https://lore.kernel.org/all/[email protected]/ https://lore.kernel.org/all/[email protected]/ Adjacent changes: drivers/net/phy/phy.c 323fe43cf9ae ("net: phy: Improved PHY error reporting in state machine") 4203d84032e2 ("net: phy: Ensure state transitions are processed from phy_stop()") Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-24Merge tag 'efi-fixes-for-v6.3-1' of ↵Linus Torvalds2-2/+8
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - Set the NX compat flag for arm64 and zboot, to ensure compatibility with EFI firmware that complies with tightening requirements imposed across the ecosystem. - Improve identification of Ampere Altra systems based on SMBIOS data. - Fix some issues related to the EFI framebuffer that were introduced as a result from some refactoring related to zboot and the merge with sysfb. - Makefile tweak to avoid rebuilding vmlinuz unnecessarily. - Fix efi_random_alloc() return value on out of memory condition. * tag 'efi-fixes-for-v6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: efi/libstub: randomalloc: Return EFI_OUT_OF_RESOURCES on failure efi/libstub: Use relocated version of kernel's struct screen_info efi/libstub: zboot: Add compressed image to make targets efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L efi: sysfb_efi: Fix DMI quirks not working for simpledrm efi/libstub: smbios: Drop unused 'recsize' parameter arm64: efi: Use SMBIOS processor version to key off Ampere quirk efi/libstub: smbios: Use length member instead of record struct size efi: earlycon: Reprobe after parsing config tables arm64: efi: Set NX compat flag in PE/COFF header efi/libstub: arm64: Remap relocated image with strict permissions efi/libstub: zboot: Mark zboot EFI application as NX compatible
2023-03-24Merge tag 'net-6.3-rc4' of ↵Linus Torvalds4-5/+29
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, wifi and bluetooth. Current release - regressions: - wifi: mt76: mt7915: add back 160MHz channel width support for MT7915 - libbpf: revert poisoning of strlcpy, it broke uClibc-ng Current release - new code bugs: - bpf: improve the coverage of the "allow reads from uninit stack" feature to fix verification complexity problems - eth: am65-cpts: reset PPS genf adj settings on enable Previous releases - regressions: - wifi: mac80211: serialize ieee80211_handle_wake_tx_queue() - wifi: mt76: do not run mt76_unregister_device() on unregistered hw, fix null-deref - Bluetooth: btqcomsmd: fix command timeout after setting BD address - eth: igb: revert rtnl_lock() that causes a deadlock - dsa: mscc: ocelot: fix device specific statistics Previous releases - always broken: - xsk: add missing overflow check in xdp_umem_reg() - wifi: mac80211: - fix QoS on mesh interfaces - fix mesh path discovery based on unicast packets - Bluetooth: - ISO: fix timestamped HCI ISO data packet parsing - remove "Power-on" check from Mesh feature - usbnet: more fixes to drivers trusting packet length - wifi: iwlwifi: mvm: fix mvmtxq->stopped handling - Bluetooth: btintel: iterate only bluetooth device ACPI entries - eth: iavf: fix inverted Rx hash condition leading to disabled hash - eth: igc: fix the validation logic for taprio's gate list - dsa: tag_brcm: legacy: fix daisy-chained switches Misc: - bpf: adjust insufficient default bpf_jit_limit to account for growth of BPF use over the last 5 years - xdp: bpf_xdp_metadata() use EOPNOTSUPP as unique errno indicating no driver support" * tag 'net-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits) Bluetooth: HCI: Fix global-out-of-bounds Bluetooth: mgmt: Fix MGMT add advmon with RSSI command Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work Bluetooth: L2CAP: Fix responding with wrong PDU type Bluetooth: btqcomsmd: Fix command timeout after setting BD address Bluetooth: btinel: Check ACPI handle for NULL before accessing net: mdio: thunder: Add missing fwnode_handle_put() net: dsa: mt7530: move setting ssc_delta to PHY_INTERFACE_MODE_TRGMII case net: dsa: mt7530: move lowering TRGMII driving to mt7530_setup() net: dsa: mt7530: move enabling disabling core clock to mt7530_pll_setup() net: asix: fix modprobe "sysfs: cannot create duplicate filename" gve: Cache link_speed value from device tools: ynl: Fix genlmsg header encoding formats net: enetc: fix aggregate RMON counters not showing the ranges Bluetooth: Remove "Power-on" check from Mesh feature Bluetooth: Fix race condition in hci_cmd_sync_clear Bluetooth: btintel: Iterate only bluetooth device ACPI entries Bluetooth: ISO: fix timestamped HCI ISO data packet parsing Bluetooth: btusb: Remove detection of ISO packets over bulk Bluetooth: hci_core: Detect if an ACL packet is in fact an ISO packet ...
2023-03-24wifi: nl80211: make nl80211_send_chandef non-staticJaewan Kim1-0/+9
Expose nl80211_send_chandef functionality for mac80211_hwsim or vendor netlink can use it where needed. Signed-off-by: Jaewan Kim <[email protected]> Reviewed-by: Michal Kubiak <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2023-03-24cfg80211: support RNR for EMA APAloka Dixit2-0/+32
As per IEEE Std 802.11ax-2021, 11.1.3.8.3 Discovery of a nontransmitted BSSID profile, an EMA AP that transmits a Beacon frame carrying a partial list of nontransmitted BSSID profiles should include in the frame a Reduced Neighbor Report element carrying information for at least the nontransmitted BSSIDs that are not present in the Multiple BSSID element carried in that frame. Add new nested attribute NL80211_ATTR_EMA_RNR_ELEMS to support the above. Number of RNR elements must be more than or equal to the number of MBSSID elements. This attribute can be used only when EMA is enabled. Userspace is responsible for splitting the RNR into multiple elements such that each element excludes the non-transmitting profiles already included in the MBSSID element (%NL80211_ATTR_MBSSID_ELEMS) at the same index. Each EMA beacon will be generated by adding MBSSID and RNR elements at the same index. If the userspace provides more RNR elements than the number of MBSSID elements then these will be added in every EMA beacon. Signed-off-by: Aloka Dixit <[email protected]> Link: https://lore.kernel.org/r/[email protected] [Johannes: validate elements] Signed-off-by: Johannes Berg <[email protected]>
2023-03-23Merge branch 'main' of ↵Jakub Kicinski1-2/+1
ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter updates for net-next This pull request contains changes for the *net-next* tree. 1. Change IPv6 stack to keep conntrack references until ipsec policy checks are done, like ipv4, from Madhu Koriginja. This update was missed when IPv6 NAT support was added 10 years ago. 2. get rid of old 'compat' structure layout in nf_nat_redirect core and move the conversion to the only user that needs the old layout for abi reasons. From Jeremy Sowden. 3. Compact some common code paths in nft_redir, also from Jeremy. 4. Time to remove the 'default y' knob so iptables 32bit compat interface isn't compiled in by default anymore, from myself. 5. Move ip(6)tables builtin icmp matches to the udptcp one. This has the advantage that icmp/icmpv6 match doesn't load the iptables/ip6tables modules anymore when iptables-nft is used. Also from myself. * 'main' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: keep conntrack reference until IPsecv6 policy checks are done xtables: move icmp/icmpv6 logic to xt_tcpudp netfilter: xtables: disable 32bit compat interface by default netfilter: nft_masq: deduplicate eval call-backs netfilter: nft_redir: use `struct nf_nat_range2` throughout and deduplicate eval call-backs ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-03-23docs: networking: document NAPIJakub Kicinski1-5/+8
Add basic documentation about NAPI. We can stop linking to the ancient doc on the LF wiki. Link: https://lore.kernel.org/all/[email protected]/ Reviewed-by: Bagas Sanjaya <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Acked-by: Pavel Pisa <[email protected]> # for ctucanfd-driver.rst Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Stephen Hemminger <[email protected]> Reviewed-by: Randy Dunlap <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>