aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2022-07-11netfilter: nf_conntrack: add missing __rcu annotationsFlorian Westphal1-1/+1
Access to the hook pointers use correct helpers but the pointers lack the needed __rcu annotation. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-07-11netfilter: nf_flow_table: count pending offload workqueue tasksVlad Buslov3-0/+41
To improve hardware offload debuggability count pending 'add', 'del' and 'stats' flow_table offload workqueue tasks. Counters are incremented before scheduling new task and decremented when workqueue handler finishes executing. These counters allow user to diagnose congestion on hardware offload workqueues that can happen when either CPU is starved and workqueue jobs are executed at lower rate than new ones are added or when hardware/driver can't keep up with the rate. Implement the described counters as percpu counters inside new struct netns_ft which is stored inside struct net. Expose them via new procfs file '/proc/net/stats/nf_flowtable' that is similar to existing 'nf_conntrack' file. Signed-off-by: Vlad Buslov <[email protected]> Signed-off-by: Oz Shlomo <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-07-11net: Find dst with sk's xfrm policy not ctl_sksewookseo1-0/+2
If we set XFRM security policy by calling setsockopt with option IPV6_XFRM_POLICY, the policy will be stored in 'sock_policy' in 'sock' struct. However tcp_v6_send_response doesn't look up dst_entry with the actual socket but looks up with tcp control socket. This may cause a problem that a RST packet is sent without ESP encryption & peer's TCP socket can't receive it. This patch will make the function look up dest_entry with actual socket, if the socket has XFRM policy(sock_policy), so that the TCP response packet via this function can be encrypted, & aligned on the encrypted TCP socket. Tested: We encountered this problem when a TCP socket which is encrypted in ESP transport mode encryption, receives challenge ACK at SYN_SENT state. After receiving challenge ACK, TCP needs to send RST to establish the socket at next SYN try. But the RST was not encrypted & peer TCP socket still remains on ESTABLISHED state. So we verified this with test step as below. [Test step] 1. Making a TCP state mismatch between client(IDLE) & server(ESTABLISHED). 2. Client tries a new connection on the same TCP ports(src & dst). 3. Server will return challenge ACK instead of SYN,ACK. 4. Client will send RST to server to clear the SOCKET. 5. Client will retransmit SYN to server on the same TCP ports. [Expected result] The TCP connection should be established. Cc: Maciej Żenczykowski <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Steffen Klassert <[email protected]> Cc: Sehee Lee <[email protected]> Signed-off-by: Sewook Seo <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfDavid S. Miller1-5/+9
Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) refcount_inc_not_zero() is not semantically equivalent to atomic_int_not_zero(), from Florian Westphal. My understanding was that refcount_*() API provides a wrapper to easier debugging of reference count leaks, however, there are semantic differences between these two APIs, where refcount_inc_not_zero() needs a barrier. Reason for this subtle difference to me is unknown. 2) packet logging is not correct for ARP and IP packets, from the ARP family and netdev/egress respectively. Use skb_network_offset() to reach the headers accordingly. 3) set element extension length have been growing over time, replace a BUG_ON by EINVAL which might be triggerable from userspace. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-07-09Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski2-2/+2
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-07-09 We've added 94 non-merge commits during the last 19 day(s) which contain a total of 125 files changed, 5141 insertions(+), 6701 deletions(-). The main changes are: 1) Add new way for performing BTF type queries to BPF, from Daniel Müller. 2) Add inlining of calls to bpf_loop() helper when its function callback is statically known, from Eduard Zingerman. 3) Implement BPF TCP CC framework usability improvements, from Jörn-Thorben Hinz. 4) Add LSM flavor for attaching per-cgroup BPF programs to existing LSM hooks, from Stanislav Fomichev. 5) Remove all deprecated libbpf APIs in prep for 1.0 release, from Andrii Nakryiko. 6) Add benchmarks around local_storage to BPF selftests, from Dave Marchevsky. 7) AF_XDP sample removal (given move to libxdp) and various improvements around AF_XDP selftests, from Magnus Karlsson & Maciej Fijalkowski. 8) Add bpftool improvements for memcg probing and bash completion, from Quentin Monnet. 9) Add arm64 JIT support for BPF-2-BPF coupled with tail calls, from Jakub Sitnicki. 10) Sockmap optimizations around throughput of UDP transmissions which have been improved by 61%, from Cong Wang. 11) Rework perf's BPF prologue code to remove deprecated functions, from Jiri Olsa. 12) Fix sockmap teardown path to avoid sleepable sk_psock_stop, from John Fastabend. 13) Fix libbpf's cleanup around legacy kprobe/uprobe on error case, from Chuang Wang. 14) Fix libbpf's bpf_helpers.h to work with gcc for the case of its sec/pragma macro, from James Hilliard. 15) Fix libbpf's pt_regs macros for riscv to use a0 for RC register, from Yixun Lan. 16) Fix bpftool to show the name of type BPF_OBJ_LINK, from Yafang Shao. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (94 commits) selftests/bpf: Fix xdp_synproxy build failure if CONFIG_NF_CONNTRACK=m/n bpf: Correctly propagate errors up from bpf_core_composites_match libbpf: Disable SEC pragma macro on GCC bpf: Check attach_func_proto more carefully in check_return_code selftests/bpf: Add test involving restrict type qualifier bpftool: Add support for KIND_RESTRICT to gen min_core_btf command MAINTAINERS: Add entry for AF_XDP selftests files selftests, xsk: Rename AF_XDP testing app bpf, docs: Remove deprecated xsk libbpf APIs description selftests/bpf: Add benchmark for local_storage RCU Tasks Trace usage libbpf, riscv: Use a0 for RC register libbpf: Remove unnecessary usdt_rel_ip assignments selftests/bpf: Fix few more compiler warnings selftests/bpf: Fix bogus uninitialized variable warning bpftool: Remove zlib feature test from Makefile libbpf: Cleanup the legacy uprobe_event on failed add/attach_event() libbpf: Fix wrong variable used in perf_event_uprobe_open_legacy() libbpf: Cleanup the legacy kprobe_event on failed add/attach_event() selftests/bpf: Add type match test against kernel's task_struct selftests/bpf: Add nested type to type based tests ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-09netfilter: nf_tables: replace BUG_ON by element length checkPablo Neira Ayuso1-5/+9
BUG_ON can be triggered from userspace with an element with a large userdata area. Replace it by length check and return EINVAL instead. Over time extensions have been growing in size. Pick a sufficiently old Fixes: tag to propagate this fix. Fixes: 7d7402642eaf ("netfilter: nf_tables: variable sized set element keys / data") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-07-09mptcp: move MPTCPOPT_HMAC_LEN to net/mptcp.hGeliang Tang1-1/+2
Move macro MPTCPOPT_HMAC_LEN definition from net/mptcp/protocol.h to include/net/mptcp.h. Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-099p: Add client parameter to p9_req_put()Kent Overstreet1-1/+1
This is to aid in adding mempools, in the next patch. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kent Overstreet <[email protected]> Cc: Eric Van Hensbergen <[email protected]> Cc: Latchesar Ionkov <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2022-07-099p: Drop kref usageKent Overstreet1-3/+3
An upcoming patch is going to require passing the client through p9_req_put() -> p9_req_free(), but that's awkward with the kref indirection - so this patch switches to using refcount_t directly. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Kent Overstreet <[email protected]> Cc: Eric Van Hensbergen <[email protected]> Cc: Latchesar Ionkov <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2022-07-08tls: create an internal headerJakub Kicinski1-276/+1
include/net/tls.h is getting a little long, and is probably hard for driver authors to navigate. Split out the internals into a header which will live under net/tls/. While at it move some static inlines with a single user into the source files, add a few tls_ prefixes and fix spelling of 'proccess'. Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-08tls: rx: always allocate max possible aad size for decryptJakub Kicinski1-0/+1
AAD size is either 5 or 13. Really no point complicating the code for the 8B of difference. This will also let us turn the chunked up buffer into a sane struct. Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-08strparser: pad sk_skb_cb to avoid straddling cachelinesJakub Kicinski1-4/+8
sk_skb_cb lives within skb->cb[]. skb->cb[] straddles 2 cache lines, each containing 24B of data. The first cache line does not contain much interesting information for users of strparser, so pad things a little. Previously strp_msg->full_len would live in the first cache line and strp_msg->offset in the second. We need to reorder the 8 byte temp_reg with struct tls_msg to prevent a 4B hole which would push the struct over 48B. Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-08net: Fix data-races around sysctl_mem.Kuniyuki Iwashima1-1/+1
While reading .sysctl_mem, it can be changed concurrently. So, we need to add READ_ONCE() to avoid data-races. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+1
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-06tls: rx: add sockopt for enabling optimistic decrypt with TLS 1.3Jakub Kicinski1-0/+3
Since optimisitic decrypt may add extra load in case of retries require socket owner to explicitly opt-in. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-06net/sched: act_police: allow 'continue' action offloadVlad Buslov1-0/+1
Offloading police with action TC_ACT_UNSPEC was erroneously disabled even though it was supported by mlx5 matchall offload implementation, which didn't verify the action type but instead assumed that any single police action attached to matchall classifier is a 'continue' action. Lack of action type check made it non-obvious what mlx5 matchall implementation actually supports and caused implementers and reviewers of referenced commits to disallow it as a part of improved validation code. Fixes: b8cd5831c61c ("net: flow_offload: add tc police action parameters") Fixes: b50e462bc22d ("net/sched: act_police: Add extack messages for offload failure") Signed-off-by: Vlad Buslov <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Tested-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-05net: sched: provide shim definitions for taprio_offload_{get,free}Vladimir Oltean1-0/+17
All callers of taprio_offload_get() and taprio_offload_free() prior to the blamed commit are conditionally compiled based on CONFIG_NET_SCH_TAPRIO. felix_vsc9959.c is different; it provides vsc9959_qos_port_tas_set() even when taprio is compiled out. Provide shim definitions for the functions exported by taprio so that felix_vsc9959.c is able to compile. vsc9959_qos_port_tas_set() in that case is dead code anyway, and ocelot_port->taprio remains NULL, which is fine for the rest of the logic. Fixes: 1c9017e44af2 ("net: dsa: felix: keep reference on entire tc-taprio config") Reported-by: Colin Foster <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Tested-by: Colin Foster <[email protected]> Acked-by: Vinicius Costa Gomes <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-07-02net: dsa: tag_ksz: add tag handling for Microchip LAN937xPrasanna Vengateshan1-0/+2
The Microchip LAN937X switches have a tagging protocol which is very similar to KSZ tagging. So that the implementation is added to tag_ksz.c and reused common APIs Signed-off-by: Prasanna Vengateshan <[email protected]> Signed-off-by: Arun Ramadoss <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-029p fid refcount: add a 9p_fid_ref tracepointDominique Martinet1-0/+13
This adds a tracepoint event for 9p fid lifecycle tracing: when a fid is created, its reference count increased/decreased, and freed. The new 9p_fid_ref tracepoint should help anyone wishing to debug any fid problem such as missing clunk (destroy) or use-after-free. Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Dominique Martinet <[email protected]>
2022-07-029p fid refcount: add p9_fid_get/put wrappersDominique Martinet1-0/+28
I was recently reminded that it is not clear that p9_client_clunk() was actually just decrementing refcount and clunking only when that reaches zero: make it clear through a set of helpers. This will also allow instrumenting refcounting better for debugging next patch Link: https://lkml.kernel.org/r/[email protected] Reviewed-by: Tyler Hicks <[email protected]> Reviewed-by: Christian Schoenebeck <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2022-07-01net: remove SK_RECLAIM_THRESHOLD and SK_RECLAIM_CHUNKPaolo Abeni1-5/+0
There are no more users for the mentioned macros, just drop them. Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-07-01wifi: nl80211: retrieve EHT related elements in AP modeAloka Dixit1-0/+4
Add support to retrieve EHT capabilities and EHT operation elements passed by the userspace in the beacon template and store the pointers in struct cfg80211_ap_settings to be used by the drivers. Co-developed-by: Vikram Kandukuri <[email protected]> Signed-off-by: Vikram Kandukuri <[email protected]> Co-developed-by: Veerendranath Jakkam <[email protected]> Signed-off-by: Veerendranath Jakkam <[email protected]> Signed-off-by: Aloka Dixit <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-07-01wifi: cfg80211: Increase akm_suites array size in cfg80211_crypto_settingsVeerendranath Jakkam1-1/+10
Increase akm_suites array size in struct cfg80211_crypto_settings to 10 and advertise the capability to userspace. This allows userspace to send more than two AKMs to driver in netlink commands such as NL80211_CMD_CONNECT. This capability is needed for implementing WPA3-Personal transition mode correctly with any driver that handles roaming internally. Currently, the possible AKMs for multi-AKM connect can include PSK, PSK-SHA-256, SAE, FT-PSK and FT-SAE. Since the count is already 5, increasing the akm_suites array size to 10 should be reasonable for future usecases. Signed-off-by: Veerendranath Jakkam <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-07-01wifi: mac80211: switch airtime fairness back to deficit round-robin schedulingFelix Fietkau1-3/+14
This reverts commits 6a789ba679d652587532cec2a0e0274fda172f3b and 2433647bc8d983a543e7d31b41ca2de1c7e2c198. The virtual time scheduler code has a number of issues: - queues slowed down by hardware/firmware powersave handling were not properly handled. - on ath10k in push-pull mode, tx queues that the driver tries to pull from were starved, causing excessive latency - delay between tx enqueue and reported airtime use were causing excessively bursty tx behavior The bursty behavior may also be present on the round-robin scheduler, but there it is much easier to fix without introducing additional regressions Signed-off-by: Felix Fietkau <[email protected]> Acked-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-07-01wifi: mac80211: fix a kernel-doc complaintJohannes Berg1-1/+1
Somehow kernel-doc complains here about strong markup, but we really don't need the [] so just remove that. Signed-off-by: Johannes Berg <[email protected]>
2022-07-01wifi: cfg80211: remove redundant documentationJohannes Berg1-10/+0
These struct members no longer exist, remove them from documentation. Signed-off-by: Johannes Berg <[email protected]>
2022-07-01wifi: mac80211: add a missing comma at kernel-doc markupMauro Carvalho Chehab1-1/+1
The lack of the colon makes it not parse the function parameter: include/net/mac80211.h:6250: warning: Function parameter or member 'vif' not described in 'ieee80211_channel_switch_disconnect' Fix it. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Link: https://lore.kernel.org/r/11c1bdb861d89c93058fcfe312749b482851cbdb.1656409369.git.mchehab@kernel.org Signed-off-by: Johannes Berg <[email protected]>
2022-07-01wifi: cfg80211: fix kernel-doc warnings all over the fileMauro Carvalho Chehab1-7/+21
There are currently 17 kernel-doc warnings on this file: include/net/cfg80211.h:391: warning: Function parameter or member 'bw' not described in 'ieee80211_eht_mcs_nss_supp' include/net/cfg80211.h:437: warning: Function parameter or member 'eht_cap' not described in 'ieee80211_sband_iftype_data' include/net/cfg80211.h:507: warning: Function parameter or member 's1g' not described in 'ieee80211_sta_s1g_cap' include/net/cfg80211.h:1390: warning: Function parameter or member 'counter_offset_beacon' not described in 'cfg80211_color_change_settings' include/net/cfg80211.h:1390: warning: Function parameter or member 'counter_offset_presp' not described in 'cfg80211_color_change_settings' include/net/cfg80211.h:1430: warning: Enum value 'STATION_PARAM_APPLY_STA_TXPOWER' not described in enum 'station_parameters_apply_mask' include/net/cfg80211.h:2195: warning: Function parameter or member 'dot11MeshConnectedToAuthServer' not described in 'mesh_config' include/net/cfg80211.h:2341: warning: Function parameter or member 'short_ssid' not described in 'cfg80211_scan_6ghz_params' include/net/cfg80211.h:3328: warning: Function parameter or member 'kck_len' not described in 'cfg80211_gtk_rekey_data' include/net/cfg80211.h:3698: warning: Function parameter or member 'ftm' not described in 'cfg80211_pmsr_result' include/net/cfg80211.h:3828: warning: Function parameter or member 'global_mcast_stypes' not described in 'mgmt_frame_regs' include/net/cfg80211.h:4977: warning: Function parameter or member 'ftm' not described in 'cfg80211_pmsr_capabilities' include/net/cfg80211.h:5742: warning: Function parameter or member 'u' not described in 'wireless_dev' include/net/cfg80211.h:5742: warning: Function parameter or member 'links' not described in 'wireless_dev' include/net/cfg80211.h:5742: warning: Function parameter or member 'valid_links' not described in 'wireless_dev' include/net/cfg80211.h:6076: warning: Function parameter or member 'is_amsdu' not described in 'ieee80211_data_to_8023_exthdr' include/net/cfg80211.h:6949: warning: Function parameter or member 'sig_dbm' not described in 'cfg80211_notify_new_peer_candidate' Address them, in order to build a better documentation from this header. Signed-off-by: Mauro Carvalho Chehab <[email protected]> Link: https://lore.kernel.org/r/f6f522cdc716a01744bb0eae2186f4592976222b.1656409369.git.mchehab@kernel.org Signed-off-by: Johannes Berg <[email protected]>
2022-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-6/+10
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c 9c5de246c1db ("net: sparx5: mdb add/del handle non-sparx5 devices") fbb89d02e33a ("net: sparx5: Allow mdb entries to both CPU and ports") Signed-off-by: Jakub Kicinski <[email protected]>
2022-06-30net, neigh: introduce interval_probe_time_ms for periodic probeYuwei Wang1-0/+1
commit ed6cd6a17896 ("net, neigh: Set lower cap for neigh_managed_work rearming") fixed a case when DELAY_PROBE_TIME is configured to 0, the processing of the system work queue hog CPU to 100%, and further more we should introduce a new option used by periodic probe Signed-off-by: Yuwei Wang <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2022-06-29net: switchdev: add reminder near struct switchdev_notifier_fdb_infoVladimir Oltean1-0/+3
br_switchdev_fdb_notify() creates an on-stack FDB info variable, and initializes it member by member. As such, newly added fields which are not initialized by br_switchdev_fdb_notify() will contain junk bytes from the stack. Other uses of struct switchdev_notifier_fdb_info have a struct initializer which should put zeroes in the uninitialized fields. Add a reminder above the structure for future developers. Recently discussed during review. Link: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/#24877698 Link: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/#24912269 Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-06-29net: dsa: add get_pause_stats supportOleksij Rempel1-0/+2
Add support for pause stats Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-06-29wifi: mac80211: add gfp_t parameter to ieeee80211_obss_color_collision_notifyLorenzo Bianconi2-3/+5
Introduce the capability to specify gfp_t parameter to ieeee80211_obss_color_collision_notify routine since it runs in interrupt context in ieee80211_rx_check_bss_color_collision(). Fixes: 6d945a33f2b0a ("mac80211: introduce BSS color collision detection") Co-developed-by: Ryder Lee <[email protected]> Signed-off-by: Ryder Lee <[email protected]> Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/02c990fb3fbd929c8548a656477d20d6c0427a13.1655419135.git.lorenzo@kernel.org Signed-off-by: Johannes Berg <[email protected]>
2022-06-27netfilter: nf_tables: avoid skb access on nf_stolenFlorian Westphal1-6/+10
When verdict is NF_STOLEN, the skb might have been freed. When tracing is enabled, this can result in a use-after-free: 1. access to skb->nf_trace 2. access to skb->mark 3. computation of trace id 4. dump of packet payload To avoid 1, keep a cached copy of skb->nf_trace in the trace state struct. Refresh this copy whenever verdict is != STOLEN. Avoid 2 by skipping skb->mark access if verdict is STOLEN. 3 is avoided by precomputing the trace id. Only dump the packet when verdict is not "STOLEN". Reported-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-06-27net: dsa: add Renesas RZ/N1 switch tag driverClément Léger1-0/+2
The switch that is present on the Renesas RZ/N1 SoC uses a specific VLAN value followed by 6 bytes which contains forwarding configuration. Signed-off-by: Clément Léger <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-06-27net: dsa: add support for ethtool get_rmon_stats()Clément Léger1-0/+3
Add support to allow dsa drivers to specify the .get_rmon_stats() operation. Signed-off-by: Clément Léger <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-06-24net: helper function skb_len_addRichard Gobert1-3/+1
Move the len fields manipulation in the skbs to a helper function. There is a comment specifically requesting this and there are several other areas in the code displaying the same pattern which can be refactored. This improves code readability. Signed-off-by: Richard Gobert <[email protected]> Link: https://lore.kernel.org/r/20220622160853.GA6478@debian Signed-off-by: Jakub Kicinski <[email protected]>
2022-06-24Bonding: add per-port priority for failover re-selectionHangbin Liu2-0/+2
Add per port priority support for bonding active slave re-selection during failover. A higher number means higher priority in selection. The primary slave still has the highest priority. This option also follows the primary_reselect rules. This option could only be configured via netlink. Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Jonathan Toppins <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-06-24bonding: add slave_dev field for bond_opt_valueHangbin Liu1-2/+8
Currently, bond_opt_value are mostly used for bonding option settings. If we want to set a value for slave, we need to re-alloc a string to store both slave name and vlaue, like bond_option_queue_id_set() does, which is complex and dumb. As Jon suggested, let's add a union field slave_dev for bond_opt_value, which will be benefit for future slave option setting. In function __bond_opt_init(), we will always check the extra field and set it if it's not NULL. Suggested-by: Jonathan Toppins <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Acked-by: Jonathan Toppins <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-06-24xfrm: change the type of xfrm_register_km and xfrm_unregister_kmZhengchao Shao1-2/+2
Functions xfrm_register_km and xfrm_unregister_km do always return 0, change the type of functions to void. Signed-off-by: Zhengchao Shao <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-06-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+5
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-06-23sock: redo the psock vs ULP protection checkJakub Kicinski1-0/+5
Commit 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") has moved the inet_csk_has_ulp(sk) check from sk_psock_init() to the new tcp_bpf_update_proto() function. I'm guessing that this was done to allow creating psocks for non-inet sockets. Unfortunately the destruction path for psock includes the ULP unwind, so we need to fail the sk_psock_init() itself. Otherwise if ULP is already present we'll notice that later, and call tcp_update_ulp() with the sk_proto of the ULP itself, which will most likely result in the ULP looping its callbacks. Fixes: 8a59f9d1e3d4 ("sock: Introduce sk->sk_prot->psock_update_sk_prot()") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: John Fastabend <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Tested-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-06-22af_unix: Remove unix_table_locks.Kuniyuki Iwashima1-1/+0
unix_table_locks are to protect the global hash table, unix_socket_table. The previous commit removed it, so let's clean up the unnecessary locks. Here is a test result on EC2 c5.9xlarge where 10 processes run concurrently in different netns and bind 100,000 sockets for each. without this series : 1m 38s with this series : 11s It is ~10x faster because the global hash table is split into 10 netns in this case. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-06-22af_unix: Put a socket into a per-netns hash table.Kuniyuki Iwashima1-1/+0
This commit replaces the global hash table with a per-netns one and removes the global one. We now link a socket in each netns's hash table so we can save some netns comparisons when iterating through a hash bucket. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-06-22af_unix: Define a per-netns hash table.Kuniyuki Iwashima2-0/+8
This commit adds a per netns hash table for AF_UNIX, which size is fixed as UNIX_HASH_SIZE for now. The first implementation defines a per-netns hash table as a single array of lock and list: struct unix_hashbucket { spinlock_t lock; struct hlist_head head; }; struct netns_unix { struct unix_hashbucket *hash; ... }; But, Eric pointed out memory cost that the structure has holes because of sizeof(spinlock_t), which is 4 (or more if LOCKDEP is enabled). [0] It could be expensive on a host with thousands of netns and few AF_UNIX sockets. For this reason, a per-netns hash table uses two dense arrays. struct unix_table { spinlock_t *locks; struct hlist_head *buckets; }; struct netns_unix { struct unix_table table; ... }; Note the length of the list has a significant impact rather than lock contention, so having shared locks can be an option. But, per-netns locks and lists still perform better than the global locks and per-netns lists. [1] Also, this patch adds a change so that struct netns_unix disappears from struct net if CONFIG_UNIX is disabled. [0]: https://lore.kernel.org/netdev/CANn89iLVxO5aqx16azNU7p7Z-nz5NrnM5QTqOzueVxEnkVTxyg@mail.gmail.com/ [1]: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-06-22af_unix: Include the whole hash table size in UNIX_HASH_SIZE.Kuniyuki Iwashima1-3/+4
Currently, the size of AF_UNIX hash table is UNIX_HASH_SIZE * 2, the first half for bind()ed sockets and the second half for unbound ones. UNIX_HASH_SIZE * 2 is used to define the table and iterate over it. In some places, we use ARRAY_SIZE(unix_socket_table) instead of UNIX_HASH_SIZE * 2. However, we cannot use it anymore because we will allocate the hash table dynamically. Then, we would have to add UNIX_HASH_SIZE * 2 in many places, which would be troublesome. This patch adapts the UNIX_HASH_SIZE definition to include bound and unbound sockets and defines a new UNIX_HASH_MOD macro to ease calculations. Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-06-21raw: complete rcu conversionEric Dumazet1-2/+2
raw_diag_dump() can use rcu_read_lock() instead of read_lock() Now the hashinfo lock is only used from process context, in write mode only, we can convert it to a spinlock, and we do not need to block BH anymore. Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-06-20net: Introduce a new proto_ops ->read_skb()Cong Wang2-4/+2
Currently both splice() and sockmap use ->read_sock() to read skb from receive queue, but for sockmap we only read one entire skb at a time, so ->read_sock() is too conservative to use. Introduce a new proto_ops ->read_skb() which supports this sematic, with this we can finally pass the ownership of skb to recv actors. For non-TCP protocols, all ->read_sock() can be simply converted to ->read_skb(). Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-06-20tcp: Introduce tcp_read_skb()Cong Wang1-0/+2
This patch inroduces tcp_read_skb() based on tcp_read_sock(), a preparation for the next patch which actually introduces a new sock ops. TCP is special here, because it has tcp_read_sock() which is mainly used by splice(). tcp_read_sock() supports partial read and arbitrary offset, neither of them is needed for sockmap. Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-06-20cfg80211: Indicate MLO connection info in connect and roam callbacksVeerendranath Jakkam1-23/+61
The MLO links used for connection with an MLD AP are decided by the driver in case of SME offloaded to driver. Add support for the drivers to indicate the information of links used for MLO connection in connect and roam callbacks, update the connected links information in wdev from connect/roam result sent by driver. Also, send the connected links information to userspace. Add a netlink flag attribute to indicate that userspace supports handling of MLO connection. Drivers must not do MLO connection when this flag is not set. This is to maintain backwards compatibility with older supplicant versions which doesn't have support for MLO connection. Signed-off-by: Veerendranath Jakkam <[email protected]> Signed-off-by: Johannes Berg <[email protected]>