aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2022-12-12mptcp: remove MPTCP 'ifdef' in TCP SYN cookiesMatthieu Baerts1-2/+10
To ease the maintenance, it is often recommended to avoid having #ifdef preprocessor conditions. Here the section related to CONFIG_MPTCP was quite short but the next commit needs to add more code around. It is then cleaner to move specific MPTCP code to functions located in net/mptcp directory. Now that mptcp_subflow_request_sock_ops structure can be static, it can also be marked as "read only after init". Suggested-by: Paolo Abeni <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Cc: [email protected] Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-12Merge tag 'for-netdev' of ↵Jakub Kicinski4-3/+21
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2022-12-11 We've added 74 non-merge commits during the last 11 day(s) which contain a total of 88 files changed, 3362 insertions(+), 789 deletions(-). The main changes are: 1) Decouple prune and jump points handling in the verifier, from Andrii. 2) Do not rely on ALLOW_ERROR_INJECTION for fmod_ret, from Benjamin. Merged from hid tree. 3) Do not zero-extend kfunc return values. Necessary fix for 32-bit archs, from Björn. 4) Don't use rcu_users to refcount in task kfuncs, from David. 5) Three reg_state->id fixes in the verifier, from Eduard. 6) Optimize bpf_mem_alloc by reusing elements from free_by_rcu, from Hou. 7) Refactor dynptr handling in the verifier, from Kumar. 8) Remove the "/sys" mount and umount dance in {open,close}_netns in bpf selftests, from Martin. 9) Enable sleepable support for cgrp local storage, from Yonghong. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (74 commits) selftests/bpf: test case for relaxed prunning of active_lock.id selftests/bpf: Add pruning test case for bpf_spin_lock bpf: use check_ids() for active_lock comparison selftests/bpf: verify states_equal() maintains idmap across all frames bpf: states_equal() must build idmap for all function frames selftests/bpf: test cases for regsafe() bug skipping check_id() bpf: regsafe() must not skip check_ids() docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGE selftests/bpf: Add test for dynptr reinit in user_ringbuf callback bpf: Use memmove for bpf_dynptr_{read,write} bpf: Move PTR_TO_STACK alignment check to process_dynptr_func bpf: Rework check_func_arg_reg_off bpf: Rework process_dynptr_func bpf: Propagate errors from process_* checks in check_func_arg bpf: Refactor ARG_PTR_TO_DYNPTR checks into process_dynptr_func bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true bpf: Reuse freed element in free_by_rcu during allocation selftests/bpf: Bring test_offload.py back to life bpf: Fix comment error in fixup_kfunc_call function bpf: Do not zero-extend kfunc return values ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-12Merge tag 'linux-can-next-for-6.2-20221212' of ↵David S. Miller1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next Marc Kleine-Budde says: ==================== linux-can-next-for-6.2-20221212 this is a pull request of 39 patches for net-next/master. The first 2 patches are by me fix a warning and coding style in the kvaser_usb driver. Vivek Yadav's patch sorts the includes of the m_can driver. Biju Das contributes 5 patches for the rcar_canfd driver improve the support for different IP core variants. Jean Delvare's patch for the ctucanfd drops the dependency on COMPILE_TEST. Vincent Mailhol's patch sorts the includes of the etas_es58x driver. Haibo Chen's contributes 2 patches that add i.MX93 support to the flexcan driver. Lad Prabhakar's patch updates the dt-bindings documentation of the rcar_canfd driver. Minghao Chi's patch converts the c_can platform driver to devm_platform_get_and_ioremap_resource(). In the next 7 patches Vincent Mailhol adds devlink support to the etas_es58x driver to report firmware, bootloader and hardware version. Xu Panda's patch converts a strncpy() -> strscpy() in the ucan driver. Ye Bin's patch removes a useless parameter from the AF_CAN protocol. The next 2 patches by Vincent Mailhol and remove unneeded or unused pointers to struct usb_interface in device's priv struct in the ucan and gs_usb driver. Vivek Yadav's patch cleans up the usage of the RAM initialization in the m_can driver. A patch by me add support for SO_MARK to the AF_CAN protocol. Geert Uytterhoeven's patch fixes the number of CAN channels in the rcan_canfd bindings documentation. In the last 11 patches Markus Schneider-Pargmann optimizes the register access in the t_can driver and cleans up the tcan glue driver. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-12-12net: devlink: add DEVLINK_INFO_VERSION_GENERIC_FW_BOOTLOADERVincent Mailhol1-0/+2
As discussed in [1], abbreviating the bootloader to "bl" might not be well understood. Instead, a bootloader technically being a firmware, name it "fw.bootloader". Add a new macro to devlink.h to formalize this new info attribute name and update the documentation. [1] https://lore.kernel.org/netdev/[email protected]/ Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Vincent Mailhol <[email protected]> Link: https://lore.kernel.org/all/[email protected] Signed-off-by: Marc Kleine-Budde <[email protected]>
2022-12-12net: move the nat function to nf_nat_ovs for ovs and tcXin Long1-0/+4
There are two nat functions are nearly the same in both OVS and TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat(). This patch creates nf_nat_ovs.c under netfilter and moves them there then exports nf_ct_nat() so that it can be shared by both OVS and TC, and keeps the nat (type) check and nat flag update in OVS and TC's own place, as these parts are different between OVS and TC. Note that in OVS nat function it was using skb->protocol to get the proto as it already skips vlans in key_extract(), while it doesn't in TC, and TC has to call skb_protocol() to get proto. So in nf_ct_nat_execute(), we keep using skb_protocol() which works for both OVS and TC contrack. Signed-off-by: Xin Long <[email protected]> Acked-by: Aaron Conole <[email protected]> Acked-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-12-10ipvs: run_estimation should control the kthread tasksJulian Anastasov1-2/+4
Change the run_estimation flag to start/stop the kthread tasks. Signed-off-by: Julian Anastasov <[email protected]> Cc: yunhong-cgl jiang <[email protected]> Cc: "dust.li" <[email protected]> Reviewed-by: Jiri Wiesner <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-12-10ipvs: add est_cpulist and est_nice sysctl varsJulian Anastasov1-0/+58
Allow the kthreads for stats to be configured for specific cpulist (isolation) and niceness (scheduling priority). Signed-off-by: Julian Anastasov <[email protected]> Cc: yunhong-cgl jiang <[email protected]> Cc: "dust.li" <[email protected]> Reviewed-by: Jiri Wiesner <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-12-10ipvs: use kthreads for stats estimationJulian Anastasov1-5/+83
Estimating all entries in single list in timer context by single CPU causes large latency with multiple IPVS rules as reported in [1], [2], [3]. Spread the estimator structures in multiple chains and use kthread(s) for the estimation. The chains are processed in multiple (50) timer ticks to ensure the 2-second interval between estimations with some accuracy. Every chain is processed under RCU lock. Every kthread works over its own data structure and all such contexts are attached to array. The contexts can be preserved while the kthread tasks are stopped or restarted. When estimators are removed, unused kthread contexts are released and the slots in array are left empty. First kthread determines parameters to use, eg. maximum number of estimators to process per kthread based on chain's length (chain_max), allowing sub-100us cond_resched rate and estimation taking up to 1/8 of the CPU capacity to avoid any problems if chain_max is not correctly calculated. chain_max is calculated taking into account factors such as CPU speed and memory/cache speed where the cache_factor (4) is selected from real tests with current generation of CPU/NUMA configurations to correct the difference in CPU usage between cached (during calc phase) and non-cached (working) state of the estimated per-cpu data. First kthread also plays the role of distributor of added estimators to all kthreads, keeping low the time to add estimators. The optimization is based on the fact that newly added estimator should be estimated after 2 seconds, so we have the time to offload the adding to chain from controlling process to kthread 0. The allocated kthread context may grow from 1 to 50 allocated structures for timer ticks which saves memory for setups with small number of estimators. We also add delayed work est_reload_work that will make sure the kthread tasks are properly started/stopped. ip_vs_start_estimator() is changed to report errors which allows to safely store the estimators in allocated structures. Many thanks to Jiri Wiesner for his valuable comments and for spending a lot of time reviewing and testing the changes on different platforms with 48-256 CPUs and 1-8 NUMA nodes under different cpufreq governors. [1] Report from Yunhong Jiang: https://lore.kernel.org/netdev/[email protected]/ [2] https://marc.info/?l=linux-virtual-server&m=159679809118027&w=2 [3] Report from Dust: https://archive.linuxvirtualserver.org/html/lvs-devel/2020-12/msg00000.html Signed-off-by: Julian Anastasov <[email protected]> Cc: yunhong-cgl jiang <[email protected]> Cc: "dust.li" <[email protected]> Reviewed-by: Jiri Wiesner <[email protected]> Tested-by: Jiri Wiesner <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-12-10ipvs: use u64_stats_t for the per-cpu countersJulian Anastasov1-5/+5
Use the provided u64_stats_t type to avoid load/store tearing. Fixes: 316580b69d0a ("u64_stats: provide u64_stats_t type") Signed-off-by: Julian Anastasov <[email protected]> Cc: yunhong-cgl jiang <[email protected]> Cc: "dust.li" <[email protected]> Reviewed-by: Jiri Wiesner <[email protected]> Tested-by: Jiri Wiesner <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-12-10ipvs: use common functions for stats allocationJulian Anastasov1-0/+5
Move alloc_percpu/free_percpu logic in new functions Signed-off-by: Julian Anastasov <[email protected]> Cc: yunhong-cgl jiang <[email protected]> Cc: "dust.li" <[email protected]> Reviewed-by: Jiri Wiesner <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-12-10ipvs: add rcu protection to statsJulian Anastasov1-1/+7
In preparation to using RCU locking for the list with estimators, make sure the struct ip_vs_stats are released after RCU grace period by using RCU callbacks. This affects ipvs->tot_stats where we can not use RCU callbacks for ipvs, so we use allocated struct ip_vs_stats_rcu. For services and dests we force RCU callbacks for all cases. Signed-off-by: Julian Anastasov <[email protected]> Cc: yunhong-cgl jiang <[email protected]> Cc: "dust.li" <[email protected]> Reviewed-by: Jiri Wiesner <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-12-09Merge tag 'ipsec-next-2022-12-09' of ↵Jakub Kicinski1-23/+101
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next 2022-12-09 1) Add xfrm packet offload core API. From Leon Romanovsky. 2) Add xfrm packet offload support for mlx5. From Leon Romanovsky and Raed Salem. 3) Fix a typto in a error message. From Colin Ian King. * tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: (38 commits) xfrm: Fix spelling mistake "oflload" -> "offload" net/mlx5e: Open mlx5 driver to accept IPsec packet offload net/mlx5e: Handle ESN update events net/mlx5e: Handle hardware IPsec limits events net/mlx5e: Update IPsec soft and hard limits net/mlx5e: Store all XFRM SAs in Xarray net/mlx5e: Provide intermediate pointer to access IPsec struct net/mlx5e: Skip IPsec encryption for TX path without matching policy net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows net/mlx5e: Improve IPsec flow steering autogroup net/mlx5e: Configure IPsec packet offload flow steering net/mlx5e: Use same coding pattern for Rx and Tx flows net/mlx5e: Add XFRM policy offload logic net/mlx5e: Create IPsec policy offload tables net/mlx5e: Generalize creation of default IPsec miss group and rule net/mlx5e: Group IPsec miss handles into separate struct net/mlx5e: Make clear what IPsec rx_err does net/mlx5e: Flatten the IPsec RX add rule path net/mlx5e: Refactor FTE setup code to be more clear net/mlx5e: Move IPsec flow table creation to separate function ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-09Merge tag 'v6.1-rc8' into rdma.git for-nextJason Gunthorpe7-26/+42
For dependencies in following patches Signed-off-by: Jason Gunthorpe <[email protected]>
2022-12-09net/sched: add retpoline wrapper for tcPedro Tammela1-0/+251
On kernels using retpoline as a spectrev2 mitigation, optimize actions and filters that are compiled as built-ins into a direct call. On subsequent patches we expose the classifiers and actions functions and wire up the wrapper into tc. Signed-off-by: Pedro Tammela <[email protected]> Reviewed-by: Jamal Hadi Salim <[email protected]> Reviewed-by: Victor Nogueira <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-12-09net/sched: move struct action_ops definition out of ifdefPedro Tammela1-5/+5
The type definition should be visible even in configurations not using CONFIG_NET_CLS_ACT. Signed-off-by: Pedro Tammela <[email protected]> Reviewed-by: Jamal Hadi Salim <[email protected]> Reviewed-by: Victor Nogueira <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-12-08net_tstamp: add SOF_TIMESTAMPING_OPT_ID_TCPWillem de Bruijn1-3/+3
Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from write_seq sockets instead of snd_una. This should have been the behavior from the start. Because processes may now exist that rely on the established behavior, do not change behavior of the existing option, but add the right behavior with a new flag. It is encouraged to always set SOF_TIMESTAMPING_OPT_ID_TCP on stream sockets along with the existing SOF_TIMESTAMPING_OPT_ID. Intuitively the contract is that the counter is zero after the setsockopt, so that the next write N results in a notification for the last byte N - 1. On idle sockets snd_una == write_seq and this holds for both. But on sockets with data in transmission, snd_una records the unacked offset in the stream. This depends on the ACK response from the peer. A process cannot learn this in a race free manner (ioctl SIOCOUTQ is one racy approach). write_seq records the offset at the last byte written by the process. This is a better starting point. It matches the intuitive contract in all circumstances, unaffected by external behavior. The new timestamp flag necessitates increasing sk_tsflags to 32 bits. Move the field in struct sock to avoid growing the socket (for some common CONFIG variants). The UAPI interface so_timestamping.flags is already int, so 32 bits wide. Reported-by: Sotirios Delimanolis <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-08Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-5/+19
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07devlink: Expose port function commands to control migratableShay Drory1-0/+21
Expose port function commands to enable / disable migratable capability, this is used to set the port function as migratable. Live migration is the process of transferring a live virtual machine from one physical host to another without disrupting its normal operation. In order for a VM to be able to perform LM, all the VM components must be able to perform migration. e.g.: to be migratable. In order for VF to be migratable, VF must be bound to VFIO driver with migration support. When migratable capability is enabled for a function of the port, the device is making the necessary preparations for the function to be migratable, which might include disabling features which cannot be migrated. Example of LM with migratable function configuration: Set migratable of the VF's port function. $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 migratable disable $ devlink port function set pci/0000:06:00.0/2 migratable enable $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 migratable enable Bind VF to VFIO driver with migration support: $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind $ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override $ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind Attach VF to the VM. Start the VM. Perform LM. Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Acked-by: Shannon Nelson <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07devlink: Expose port function commands to control RoCEShay Drory1-0/+18
Expose port function commands to enable / disable RoCE, this is used to control the port RoCE device capabilities. When RoCE is disabled for a function of the port, function cannot create any RoCE specific resources (e.g GID table). It also saves system memory utilization. For example disabling RoCE enable a VF/SF saves 1 Mbytes of system memory per function. Example of a PCI VF port which supports function configuration: Set RoCE of the VF's port function. $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 roce enable $ devlink port function set pci/0000:06:00.0/2 roce disable $ devlink port show pci/0000:06:00.0/2 pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0 vfnum 1 function: hw_addr 00:00:00:00:00:00 roce disable Signed-off-by: Shay Drory <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-07Merge tag 'ieee802154-for-net-next-2022-12-05' of ↵Jakub Kicinski2-0/+61
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== ieee802154-next 2022-12-05 Miquel continued his work towards full scanning support. For this, we now allow the creation of dedicated coordinator interfaces to allow a PAN coordinator to serve in the network and set the needed address filters with the hardware. On top of this we have the first part to allow scanning for available 15.4 networks. A new netlink scan group, within the existing nl802154 API, was added. In addition Miquel fixed two issues that have been introduced in the former patches to free an skb correctly and clarifying an expression in the stack. From David Girault we got tracing support when registering new PANs. * tag 'ieee802154-for-net-next-2022-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next: mac802154: Trace the registration of new PANs ieee802154: Advertize coordinators discovery mac802154: Allow the creation of coordinator interfaces mac802154: Clarify an expression mac802154: Move an skb free within the rx path ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-06net: xsk: Don't include <linux/rculist.h>Christophe JAILLET1-1/+1
There is no need to include <linux/rculist.h> here. Prefer the less invasive <linux/types.h> which is needed for 'hlist_head'. Signed-off-by: Christophe JAILLET <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/r/88d6a1d88764cca328610854f890a9ca1f4b029e.1670086246.git.christophe.jaillet@wanadoo.fr Signed-off-by: Alexei Starovoitov <[email protected]>
2022-12-05xfrm: interface: Add unstable helpers for setting/getting XFRM metadata from ↵Eyal Birger2-0/+18
TC-BPF This change adds xfrm metadata helpers using the unstable kfunc call interface for the TC-BPF hooks. This allows steering traffic towards different IPsec connections based on logic implemented in bpf programs. This object is built based on the availability of BTF debug info. When setting the xfrm metadata, percpu metadata dsts are used in order to avoid allocating a metadata dst per packet. In order to guarantee safe module unload, the percpu dsts are allocated on first use and never freed. The percpu pointer is stored in net/core/filter.c so that it can be reused on module reload. The metadata percpu dsts take ownership of the original skb dsts so that they may be used as part of the xfrm transmission logic - e.g. for MTU calculations. Signed-off-by: Eyal Birger <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2022-12-06net/9p: distinguish zero-copy requestsChristian Schoenebeck1-0/+2
Add boolean `zc` member to struct p9_fcall to distinguish zero-copy messages (not using the linear `sdata` buffer for message payload) from regular messages (which do copy message payload to `sdata` before being further processed). This new member is appended to end of structure to avoid inserting huge padding in generated layout. Link: https://lkml.kernel.org/r/8f2a5c12a446c3b544da64e0b1550e1fb2d6f972.1669144861.git.linux_oss@crudebyte.com Signed-off-by: Christian Schoenebeck <[email protected]> Tested-by: Stefano Stabellini <[email protected]> Signed-off-by: Dominique Martinet <[email protected]>
2022-12-05Merge tag 'rxrpc-next-20221201-b' of ↵David S. Miller1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc: Increasing SACK size and moving away from softirq, parts 2 & 3 Here are the second and third parts of patches in the process of moving rxrpc from doing a lot of its stuff in softirq context to doing it in an I/O thread in process context and thereby making it easier to support a larger SACK table. The full description is in the description for the first part[1] which is already in net-next. The second part includes some cleanups, adds some testing and overhauls some tracing: (1) Remove declaration of rxrpc_kernel_call_is_complete() as the definition is no longer present. (2) Remove the knet() and kproto() macros in favour of using tracepoints. (3) Remove handling of duplicate packets from recvmsg. The input side isn't now going to insert overlapping/duplicate packets into the recvmsg queue. (4) Don't use the rxrpc_conn_parameters struct in the rxrpc_connection or rxrpc_bundle structs - rather put the members in directly. (5) Extract the abort code from a received abort packet right up front rather than doing it in multiple places later. (6) Use enums and symbol lists rather than __builtin_return_address() to indicate where a tracepoint was triggered for local, peer, conn, call and skbuff tracing. (7) Add a refcount tracepoint for the rxrpc_bundle struct. (8) Implement an in-kernel server for the AFS rxperf testing program to talk to (enabled by a Kconfig option). This is tagged as rxrpc-next-20221201-a. The third part introduces the I/O thread and switches various bits over to running there: (1) Fix call timers and call and connection workqueues to not hold refs on the rxrpc_call and rxrpc_connection structs to thereby avoid messy cleanup when the last ref is put in softirq mode. (2) Split input.c so that the call packet processing bits are separate from the received packet distribution bits. Call packet processing gets bumped over to the call event handler. (3) Create a per-local endpoint I/O thread. Barring some tiny bits that still get done in softirq context, all packet reception, processing and transmission is done in this thread. That will allow a load of locking to be removed. (4) Perform packet processing and error processing from the I/O thread. (5) Provide a mechanism to process call event notifications in the I/O thread rather than queuing a work item for that call. (6) Move data and ACK transmission into the I/O thread. ACKs can then be transmitted at the point they're generated rather than getting delegated from softirq context to some process context somewhere. (7) Move call and local processor event handling into the I/O thread. (8) Move cwnd degradation to after packets have been transmitted so that they don't shorten the window too quickly. A bunch of simplifications can then be done: (1) The input_lock is no longer necessary as exclusion is achieved by running the code in the I/O thread only. (2) Don't need to use sk->sk_receive_queue.lock to guard socket state changes as the socket mutex should suffice. (3) Don't take spinlocks in RCU callback functions as they get run in softirq context and thus need _bh annotations. (4) RCU is then no longer needed for the peer's error_targets list. (5) Simplify the skbuff handling in the receive path by dropping the ref in the basic I/O thread loop and getting an extra ref as and when we need to queue the packet for recvmsg or another context. (6) Get the peer address earlier in the input process and pass it to the users so that we only do it once. This is tagged as rxrpc-next-20221201-b. Changes: ======== ver #2) - Added a patch to change four assertions into warnings in rxrpc_read() and fixed a checker warning from a __user annotation that should have been removed.. - Change a min() to min_t() in rxperf as PAGE_SIZE doesn't seem to match type size_t on i386. - Three error handling issues in rxrpc_new_incoming_call(): - If not DATA or not seq #1, should drop the packet, not abort. - Fix a goto that went to the wrong place, dropping a non-held lock. - Fix an rcu_read_lock that should've been an unlock. Tested-by: Marc Dionne <[email protected]> Tested-by: [email protected] Link: https://lore.kernel.org/r/166794587113.2389296.16484814996876530222.stgit@warthog.procyon.org.uk/ [1] Link: https://lore.kernel.org/r/166982725699.621383.2358362793992993374.stgit@warthog.procyon.org.uk/ # v1 ==================== Signed-off-by: David S. Miller <[email protected]>
2022-12-05xfrm: add support to HW update soft and hard limitsLeon Romanovsky1-0/+17
Both in RX and TX, the traffic that performs IPsec packet offload transformation is accounted by HW. It is needed to properly handle hard limits that require to drop the packet. It means that XFRM core needs to update internal counters with the one that accounted by the HW, so new callbacks are introduced in this patch. In case of soft or hard limit is occurred, the driver should call to xfrm_state_check_expire() that will perform key rekeying exactly as done by XFRM core. Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-12-05xfrm: add RX datapath protection for IPsec packet offload modeLeon Romanovsky1-23/+32
Traffic received by device with enabled IPsec packet offload should be forwarded to the stack only after decryption, packet headers and trailers removed. Such packets are expected to be seen as normal (non-XFRM) ones, while not-supported packets should be dropped by the HW. Reviewed-by: Raed Salem <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-12-05xfrm: add an interface to offload policyLeon Romanovsky1-0/+45
Extend netlink interface to add and delete XFRM policy from the device. This functionality is a first step to implement packet IPsec offload solution. Signed-off-by: Raed Salem <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-12-05xfrm: add new packet offload flagLeon Romanovsky1-0/+7
In the next patches, the xfrm core code will be extended to support new type of offload - packet offload. In that mode, both policy and state should be specially configured in order to perform whole offloaded data path. Full offload takes care of encryption, decryption, encapsulation and other operations with headers. As this mode is new for XFRM policy flow, we can "start fresh" with flag bits and release first and second bit for future use. Reviewed-by: Raed Salem <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-12-02Merge tag 'wireless-next-2022-12-02' of ↵Jakub Kicinski3-7/+16
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.2 Third set of patches for v6.2. mt76 has a new driver for mt7996 Wi-Fi 7 devices and iwlwifi also got initial Wi-Fi 7 support. Otherwise smaller features and fixes. Major changes: ath10k - store WLAN firmware version in SMEM image table mt76 - mt7996: new driver for MediaTek Wi-Fi 7 (802.11be) devices - mt7986, mt7915: enable Wireless Ethernet Dispatch (WED) offload support - mt7915: add ack signal support - mt7915: enable coredump support - mt7921: remain_on_channel support - mt7921: channel context support iwlwifi - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities - 320 MHz channels support * tag 'wireless-next-2022-12-02' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (144 commits) wifi: ath10k: fix QCOM_SMEM dependency wifi: mt76: mt7921e: add pci .shutdown() support wifi: mt76: mt7915: mmio: fix naming convention wifi: mt76: mt7996: add support to configure spatial reuse parameter set wifi: mt76: mt7996: enable ack signal support wifi: mt76: mt7996: enable use_cts_prot support wifi: mt76: mt7915: rely on band_idx of mt76_phy wifi: mt76: mt7915: enable per bandwidth power limit support wifi: mt76: mt7915: introduce mt7915_get_power_bound() mt76: mt7915: Fix PCI device refcount leak in mt7915_pci_init_hif2() wifi: mt76: do not send firmware FW_FEATURE_NON_DL region wifi: mt76: mt7921: Add missing __packed annotation of struct mt7921_clc wifi: mt76: fix coverity overrun-call in mt76_get_txpower() wifi: mt76: mt7996: add driver for MediaTek Wi-Fi 7 (802.11be) devices wifi: mt76: mt76x0: remove dead code in mt76x0_phy_get_target_power wifi: mt76: mt7915: fix band_idx usage wifi: mt76: mt7915: enable .sta_set_txpwr support wifi: mt76: mt7915: add basedband Txpower info into debugfs wifi: mt76: mt7915: add support to configure spatial reuse parameter set wifi: mt76: mt7915: add missing MODULE_PARM_DESC ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-02Bluetooth: Remove codec id field in vendor codec definitionChethan T N1-1/+0
As per the specfication vendor codec id is defined. BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 4, Part E page 2127 Fixes: 9ae664028a9e ("Bluetooth: Add support for Read Local Supported Codecs V2") Signed-off-by: Chethan T N <[email protected]> Signed-off-by: Kiran K <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2022-12-02Bluetooth: btusb: Fix CSR clones again by re-adding ERR_DATA_REPORTING quirkIsmael Ferreras Morezuelas1-0/+11
A patch series by a Qualcomm engineer essentially removed my quirk/workaround because they thought it was unnecessary. It wasn't, and it broke everything again: https://patchwork.kernel.org/project/netdevbpf/list/?series=661703&archive=both&state=* He argues that the quirk is not necessary because the code should check if the dongle says if it's supported or not. The problem is that for these Chinese CSR clones they say that it would work: = New Index: 00:00:00:00:00:00 (Primary,USB,hci0) = Open Index: 00:00:00:00:00:00 < HCI Command: Read Local Version Information (0x04|0x0001) plen 0 > HCI Event: Command Complete (0x0e) plen 12 > [hci0] 11.276039 Read Local Version Information (0x04|0x0001) ncmd 1 Status: Success (0x00) HCI version: Bluetooth 5.0 (0x09) - Revision 2064 (0x0810) LMP version: Bluetooth 5.0 (0x09) - Subversion 8978 (0x2312) Manufacturer: Cambridge Silicon Radio (10) ... < HCI Command: Read Local Supported Features (0x04|0x0003) plen 0 > HCI Event: Command Complete (0x0e) plen 68 > [hci0] 11.668030 Read Local Supported Commands (0x04|0x0002) ncmd 1 Status: Success (0x00) Commands: 163 entries ... Read Default Erroneous Data Reporting (Octet 18 - Bit 2) Write Default Erroneous Data Reporting (Octet 18 - Bit 3) ... ... < HCI Command: Read Default Erroneous Data Reporting (0x03|0x005a) plen 0 = Close Index: 00:1A:7D:DA:71:XX So bring it back wholesale. Fixes: 63b1a7dd38bf ("Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING") Fixes: e168f6900877 ("Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR") Fixes: 766ae2422b43 ("Bluetooth: hci_sync: Check LMP feature bit instead of quirk") Cc: [email protected] Cc: Zijun Hu <[email protected]> Cc: Luiz Augusto von Dentz <[email protected]> Cc: Hans de Goede <[email protected]> Tested-by: Ismael Ferreras Morezuelas <[email protected]> Signed-off-by: Ismael Ferreras Morezuelas <[email protected]> Reviewed-by: Hans de Goede <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2022-12-01sctp: delete free member from struct sctp_sched_opsXin Long1-2/+0
After commit 9ed7bfc79542 ("sctp: fix memory leak in sctp_stream_outq_migrate()"), sctp_sched_set_sched() is the only place calling sched->free(), and it can actually be replaced by sched->free_sid() on each stream, and yet there's already a loop to traverse all streams in sctp_sched_set_sched(). This patch adds a function sctp_sched_free_sched() where it calls sched->free_sid() for each stream to replace sched->free() calls in sctp_sched_set_sched() and then deletes the unused free member from struct sctp_sched_ops. Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Link: https://lore.kernel.org/r/e10aac150aca2686cb0bd0570299ec716da5a5c0.1669849471.git.lucien.xin@gmail.com Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-01net/tcp: Disable TCP-MD5 static key on tcp_md5sig_info destructionDmitry Safonov1-3/+7
To do that, separate two scenarios: - where it's the first MD5 key on the system, which means that enabling of the static key may need to sleep; - copying of an existing key from a listening socket to the request socket upon receiving a signed TCP segment, where static key was already enabled (when the key was added to the listening socket). Now the life-time of the static branch for TCP-MD5 is until: - last tcp_md5sig_info is destroyed - last socket in time-wait state with MD5 key is closed. Which means that after all sockets with TCP-MD5 keys are gone, the system gets back the performance of disabled md5-key static branch. While at here, provide static_key_fast_inc() helper that does ref counter increment in atomic fashion (without grabbing cpus_read_lock() on CONFIG_JUMP_LABEL=y). This is needed to add a new user for a static_key when the caller controls the lifetime of another user. Signed-off-by: Dmitry Safonov <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-01wifi: mac80211: add support for restricting netdev features per vifFelix Fietkau2-6/+15
This can be used to selectively disable feature flags for checksum offload, scatter/gather or GSO by changing vif->netdev_features. Removing features from vif->netdev_features does not affect the netdev features themselves, but instead fixes up skbs in the tx path so that the offloads are not needed in the driver. Aside from making it easier to deal with vif type based hardware limitations, this also makes it possible to optimize performance on hardware without native GSO support by declaring GSO support in hw->netdev_features and removing it from vif->netdev_features. This allows mac80211 to handle GSO segmentation after the sta lookup, but before itxq enqueue, thus reducing the number of unnecessary sta lookups, as well as some other per-packet processing. Signed-off-by: Felix Fietkau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2022-12-01rxrpc: Remove decl for rxrpc_kernel_call_is_complete()David Howells1-1/+0
rxrpc_kernel_call_is_complete() has been removed, so remove its declaration too. Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected]
2022-12-01rxrpc: Implement an in-kernel rxperf server for testing purposesDavid Howells1-0/+1
Implement an in-kernel rxperf server to allow kernel-based rxrpc services to be tested directly, unlike with AFS where they're accessed by the fileserver when the latter decides it wants to. This is implemented as a module that, if loaded, opens UDP port 7009 (afs3-rmtsys) and listens on it for incoming calls. Calls can be generated using the rxperf command shipped with OpenAFS, for example. Changes ======= ver #2) - Use min_t() instead of min(). Signed-off-by: David Howells <[email protected]> cc: Marc Dionne <[email protected]> cc: [email protected] cc: Jakub Kicinski <[email protected]>
2022-12-01wifi: cfg80211: Correct example of ieee80211_iface_limitPhilipp Hortmann1-1/+1
Correct wrong closing bracket. Signed-off-by: Philipp Hortmann <[email protected]> Link: https://lore.kernel.org/r/20221114200135.GA100176@matrix-ESPRIMO-P710 Signed-off-by: Johannes Berg <[email protected]>
2022-12-01inet: ping: use hlist_nulls rcu iterator during lookupFlorian Westphal1-3/+0
ping_lookup() does not acquire the table spinlock, so iteration should use hlist_nulls_for_each_entry_rcu(). Spotted during code review. Fixes: dbca1596bbb0 ("ping: convert to RCU lookups, get rid of rwlock") Cc: Eric Dumazet <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-11-30net: devlink: let the core report the driver name instead of the driversVincent Mailhol1-2/+0
The driver name is available in device_driver::name. Right now, drivers still have to report this piece of information themselves in their devlink_ops::info_get callback function. In order to factorize code, make devlink_nl_info_fill() add the driver name attribute. Now that the core sets the driver name attribute, drivers are not supposed to call devlink_info_driver_name_put() anymore. Remove devlink_info_driver_name_put() and clean-up all the drivers using this function in their callback. Signed-off-by: Vincent Mailhol <[email protected]> Tested-by: Ido Schimmel <[email protected]> # mlxsw Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-30devlink: support directly reading from region memoryJacob Keller1-0/+16
To read from a region, user space must currently request a new snapshot of the region and then read from that snapshot. This can sometimes be overkill if user space only reads a tiny portion. They first create the snapshot, then request a read, then destroy the snapshot. For regions which have a single underlying "contents", it makes sense to allow supporting direct reading of the region data. Extend the DEVLINK_CMD_REGION_READ to allow direct reading from a region if requested via the new DEVLINK_ATTR_REGION_DIRECT. If this attribute is set, then perform a direct read instead of using a snapshot. Direct read is mutually exclusive with DEVLINK_ATTR_REGION_SNAPSHOT_ID, and care is taken to ensure that we reject commands which provide incorrect attributes. Regions must enable support for direct read by implementing the .read() callback function. If a region does not support such direct reads, a suitable extended error message is reported. Signed-off-by: Jacob Keller <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-12-01bpf, sockmap: Fix missing BPF_F_INGRESS flag when using apply_bytesPengcheng Yang1-2/+2
When redirecting, we use sk_msg_to_ingress() to get the BPF_F_INGRESS flag from the msg->flags. If apply_bytes is used and it is larger than the current data being processed, sk_psock_msg_verdict() will not be called when sendmsg() is called again. At this time, the msg->flags is 0, and we lost the BPF_F_INGRESS flag. So we need to save the BPF_F_INGRESS flag in sk_psock and use it when redirection. Fixes: 8934ce2fd081 ("bpf: sockmap redirect ingress support") Signed-off-by: Pengcheng Yang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-11-30netfilter: conntrack: merge ipv4+ipv6 confirm functionsFlorian Westphal1-2/+1
No need to have distinct functions. After merge, ipv6 can avoid protooff computation if the connection neither needs sequence adjustment nor helper invocation -- this is the normal case. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2022-11-29Merge branch 'master' of ↵Jakub Kicinski1-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next 2022-11-26 1) Remove redundant variable in esp6. From Colin Ian King. 2) Update x->lastused for every packet. It was used only for outgoing mobile IPv6 packets, but showed to be usefull to check if the a SA is still in use in general. From Antony Antony. 3) Remove unused variable in xfrm_byidx_resize. From Leon Romanovsky. 4) Finalize extack support for xfrm. From Sabrina Dubroca. * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: xfrm: add extack to xfrm_set_spdinfo xfrm: add extack to xfrm_alloc_userspi xfrm: add extack to xfrm_do_migrate xfrm: add extack to xfrm_new_ae and xfrm_replay_verify_len xfrm: add extack to xfrm_del_sa xfrm: add extack to xfrm_add_sa_expire xfrm: a few coding style clean ups xfrm: Remove not-used total variable xfrm: update x->lastused for every packet esp6: remove redundant variable err ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-2/+5
tools/lib/bpf/ringbuf.c 927cbb478adf ("libbpf: Handle size overflow for ringbuf mmap") b486d19a0ab0 ("libbpf: checkpatch: Fixed code alignments in ringbuf.c") https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-29sctp: fix memory leak in sctp_stream_outq_migrate()Zhengchao Shao1-0/+2
When sctp_stream_outq_migrate() is called to release stream out resources, the memory pointed to by prio_head in stream out is not released. The memory leak information is as follows: unreferenced object 0xffff88801fe79f80 (size 64): comm "sctp_repo", pid 7957, jiffies 4294951704 (age 36.480s) hex dump (first 32 bytes): 80 9f e7 1f 80 88 ff ff 80 9f e7 1f 80 88 ff ff ................ 90 9f e7 1f 80 88 ff ff 90 9f e7 1f 80 88 ff ff ................ backtrace: [<ffffffff81b215c6>] kmalloc_trace+0x26/0x60 [<ffffffff88ae517c>] sctp_sched_prio_set+0x4cc/0x770 [<ffffffff88ad64f2>] sctp_stream_init_ext+0xd2/0x1b0 [<ffffffff88aa2604>] sctp_sendmsg_to_asoc+0x1614/0x1a30 [<ffffffff88ab7ff1>] sctp_sendmsg+0xda1/0x1ef0 [<ffffffff87f765ed>] inet_sendmsg+0x9d/0xe0 [<ffffffff8754b5b3>] sock_sendmsg+0xd3/0x120 [<ffffffff8755446a>] __sys_sendto+0x23a/0x340 [<ffffffff87554651>] __x64_sys_sendto+0xe1/0x1b0 [<ffffffff89978b49>] do_syscall_64+0x39/0xb0 [<ffffffff89a0008b>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Link: https://syzkaller.appspot.com/bug?exrid=29c402e56c4760763cc0 Fixes: 637784ade221 ("sctp: introduce priority based stream scheduler") Reported-by: [email protected] Signed-off-by: Zhengchao Shao <[email protected]> Reviewed-by: Xin Long <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-29ieee802154: Advertize coordinators discoveryMiquel Raynal2-0/+61
Let's introduce the basics for advertizing discovered PANs and coordinators, which is: - A new "scan" netlink message group. - A couple of netlink command/attribute. - The main netlink helper to send a netlink message with all the necessary information to forward the main information to the user. Two netlink attributes are proactively added to support future UWB complex channels, but are not actually used yet. Co-developed-by: David Girault <[email protected]> Signed-off-by: David Girault <[email protected]> Signed-off-by: Miquel Raynal <[email protected]> Acked-by: Alexander Aring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stefan Schmidt <[email protected]>
2022-11-25xfrm: add extack to xfrm_alloc_userspiSabrina Dubroca1-2/+3
Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-11-25xfrm: add extack to xfrm_do_migrateSabrina Dubroca1-1/+2
Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2022-11-22net: dsa: move tag_8021q headers to their proper placeVladimir Oltean1-0/+1
tag_8021q definitions are all over the place. Some are exported to linux/dsa/8021q.h (visible by DSA core, taggers, switch drivers and everyone else), and some are in dsa_priv.h. Move the structures that don't need external visibility into tag_8021q.c, and the ones which don't need the world or switch drivers to see them into tag_8021q.h. We also have the tag_8021q.h inclusion from switch.c, which is basically the entire reason why tag_8021q.c was built into DSA in commit 8b6e638b4be2 ("net: dsa: build tag_8021q.c as part of DSA core"). I still don't know how to better deal with that, so leave it alone. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-22net: dsa: unexport dsa_dev_to_net_device()Vladimir Oltean1-2/+0
dsa.o and dsa2.o are linked into the same dsa_core.o, there is no reason to export this symbol when its only caller is local. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>