Age | Commit message | Author | Files | Lines
2024-06-24 | net: Move per-CPU flush-lists to bpf_net_context on PREEMPT_RT. | Sebastian Andrzej Siewior | 4 | -32/+52
The per-CPU flush lists are accessed from within the NAPI callback (xdp_do_flush() for instance). They are subject to the same problem as struct bpf_redirect_info. Add the per-CPU lists cpu_map_flush_list, dev_map_flush_list and xskmap_map_flush_list to struct bpf_net_context. Add wrappers for the access. The lists are initialized on first usage (similar to bpf_net_ctx_get_ri()). Cc: "Björn Töpel" <bjorn@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jonathan Lemon <jonathan.lemon@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Cc: Magnus Karlsson <magnus.karlsson@intel.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-16-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
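A minimal sketch of the lazily-initialised accessor pattern this commit describes, assuming an illustrative flag name (the real accessors and flag names may differ):

    /* Sketch: lazy init of one per-context flush list, mirroring
     * bpf_net_ctx_get_ri(); bpf_net_ctx_get() returns current->bpf_net_context.
     * BPF_RI_F_CPU_MAP_INIT is an assumed flag name for illustration.
     */
    static inline struct list_head *bpf_net_ctx_get_cpu_map_flush_list(void)
    {
    	struct bpf_net_context *ctx = bpf_net_ctx_get();

    	if (!(ctx->ri.kern_flags & BPF_RI_F_CPU_MAP_INIT)) {
    		INIT_LIST_HEAD(&ctx->cpu_map_flush_list);
    		ctx->ri.kern_flags |= BPF_RI_F_CPU_MAP_INIT;
    	}
    	return &ctx->cpu_map_flush_list;
    }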
2024-06-24 | net: Reference bpf_redirect_info via task_struct on PREEMPT_RT. | Sebastian Andrzej Siewior | 9 | -45/+114
The XDP redirect process is two-staged: - bpf_prog_run_xdp() is invoked to run an eBPF program which inspects the packet and makes decisions. While doing that, the per-CPU variable bpf_redirect_info is used. - Afterwards xdp_do_redirect() is invoked and accesses bpf_redirect_info and it may also access other per-CPU variables like xskmap_flush_list. At the very end of the NAPI callback, xdp_do_flush() is invoked which does not access bpf_redirect_info but will touch the individual per-CPU lists. The per-CPU variables are only used in the NAPI callback, hence disabling bottom halves is the only protection mechanism. Users from preemptible context (like cpu_map_kthread_run()) explicitly disable bottom halves for protection reasons. Without locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. PREEMPT_RT has forced-threaded interrupts enabled and every NAPI callback runs in a thread. If each thread has its own data structure then locking can be avoided. Create a struct bpf_net_context which contains struct bpf_redirect_info. Define the variable on stack, use bpf_net_ctx_set() to save a pointer to it; bpf_net_ctx_clear() removes it again. bpf_net_ctx_set() may nest. For instance a function can be used from within NET_RX_SOFTIRQ/net_rx_action, which uses bpf_net_ctx_set(), and NET_TX_SOFTIRQ, which does not. Therefore only the first invocation updates the pointer. Use bpf_net_ctx_get_ri() as a wrapper to retrieve the current struct bpf_redirect_info. The returned data structure is zero initialized to ensure nothing is leaked from stack. This is done on first usage of the struct. bpf_net_ctx_set() sets bpf_redirect_info::kern_flags to 0 to note that initialisation is required. The first invocation of bpf_net_ctx_get_ri() will memset() the data structure and update bpf_redirect_info::kern_flags. bpf_redirect_info::nh is excluded from the memset because it is only used once BPF_F_NEIGH is set, which also sets the nh member. The kern_flags member is moved past nh to exclude it from the memset. The pointer to bpf_net_context is saved in the task's task_struct. Always using the bpf_net_context approach has the advantage that there are almost zero differences between PREEMPT_RT and non-PREEMPT_RT builds. Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-15-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
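A condensed sketch of that pattern, simplified from the description above (not the exact kernel helpers):

    /* Sketch only: simplified from the commit description. */
    struct bpf_net_context {
    	struct bpf_redirect_info ri;
    	struct list_head cpu_map_flush_list;
    	struct list_head dev_map_flush_list;
    	struct list_head xskmap_map_flush_list;
    };

    static inline struct bpf_net_context *bpf_net_ctx_set(struct bpf_net_context *ctx)
    {
    	struct task_struct *tsk = current;

    	if (tsk->bpf_net_context)	/* nested call: keep the outer context */
    		return NULL;
    	ctx->ri.kern_flags = 0;		/* ri is memset() lazily on first get */
    	tsk->bpf_net_context = ctx;
    	return ctx;
    }

    static inline void bpf_net_ctx_clear(struct bpf_net_context *ctx)
    {
    	if (ctx)			/* only the outermost setter clears */
    		current->bpf_net_context = NULL;
    }

    /* Typical use around a NAPI poll / softirq entry point: */
    static void example_entry(void)
    {
    	struct bpf_net_context __ctx, *ctx = bpf_net_ctx_set(&__ctx);

    	/* ... run XDP program, xdp_do_redirect(), xdp_do_flush() ... */

    	bpf_net_ctx_clear(ctx);
    }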
2024-06-24 | net: Use nested-BH locking for bpf_scratchpad. | Sebastian Andrzej Siewior | 1 | -2/+9
bpf_scratchpad is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-14-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
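The locking pattern shared by this and the following nested-BH commits, sketched with an illustrative payload (the struct layout is a placeholder):

    #include <linux/local_lock.h>
    #include <linux/percpu.h>

    struct bpf_scratchpad {
    	u64 buf[64];		/* illustrative payload */
    	local_lock_t lock;	/* nests under an already-held local_bh_disable() */
    };

    static DEFINE_PER_CPU(struct bpf_scratchpad, bpf_sp) = {
    	.lock = INIT_LOCAL_LOCK(lock),
    };

    static void scratchpad_user(void)
    {
    	struct bpf_scratchpad *sp;

    	local_lock_nested_bh(&bpf_sp.lock);	/* BH must already be disabled */
    	sp = this_cpu_ptr(&bpf_sp);
    	/* ... use sp->buf ... */
    	local_unlock_nested_bh(&bpf_sp.lock);
    }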
2024-06-24 | seg6: Use nested-BH locking for seg6_bpf_srh_states. | Sebastian Andrzej Siewior | 3 | -8/+18
The access to seg6_bpf_srh_states is protected by disabling preemption. Based on the code, the entry point is input_action_end_bpf() and every other function that accesses seg6_bpf_srh_states (the BPF helper functions bpf_lwt_seg6_*()) should be called from within input_action_end_bpf(). input_action_end_bpf() accesses seg6_bpf_srh_states first at the top of the function and only then disables preemption. This looks wrong because if preemption needs to be disabled as part of the locking mechanism then the variable shouldn't be accessed beforehand. Looking at how it is used via test_lwt_seg6local.sh, input_action_end_bpf() is always invoked from softirq context. If this is always the case then the preempt_disable() statement is superfluous. If it is not always invoked from softirq then disabling only preemption is not sufficient. Replace the preempt_disable() statement with nested-BH locking. This is not an equivalent replacement as it assumes that the invocation of input_action_end_bpf() always occurs in softirq context and thus the preempt_disable() is superfluous. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. Add lockdep_assert_held() to ensure the lock is held while the per-CPU variable is referenced in the helper functions. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: David Ahern <dsahern@kernel.org> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-13-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 | lwt: Don't disable migration prior to invoking BPF. | Sebastian Andrzej Siewior | 1 | -4/+2
There is no need to explicitly disable migration if bottom halves are also disabled. Disabling BH implies disabling migration. Remove migrate_disable() and rely solely on disabling BH to remain on the same CPU. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-12-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 | dev: Use nested-BH locking for softnet_data.process_queue. | Sebastian Andrzej Siewior | 2 | -1/+12
softnet_data::process_queue is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. softnet_data::input_queue_head can be updated lockless. This is fine because this value is only updated CPU-locally by the local backlog_napi thread. Add a local_lock_t to softnet_data and use local_lock_nested_bh() for locking of process_queue. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-11-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 | dev: Remove PREEMPT_RT ifdefs from backlog_lock.*(). | Sebastian Andrzej Siewior | 1 | -4/+4
The backlog_napi locking (previously RPS) relies on explicit locking if either RPS or backlog NAPI is enabled. If both are disabled then locking is achieved by disabling interrupts, except on PREEMPT_RT. PREEMPT_RT was excluded because the needed synchronisation was already provided by local_bh_disable(). Since the introduction of backlog NAPI, and making it mandatory for PREEMPT_RT, the ifdef within backlog_lock.*() is obsolete and can be removed. Remove the ifdefs in backlog_lock.*(). Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-10-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 | net: softnet_data: Make xmit per task. | Sebastian Andrzej Siewior | 5 | -12/+80
Softirq is preemptible on PREEMPT_RT. Without a per-CPU lock in local_bh_disable() there is no guarantee that only one device is transmitting at a time. With preemption and multiple senders it is possible that the per-CPU `recursion' counter gets incremented by different threads and exceeds XMIT_RECURSION_LIMIT, leading to a false positive recursion alert. The `more' member is subject to similar problems if set by one thread for one driver and wrongly used by another driver within another thread. Instead of adding a lock to protect the per-CPU variable it is simpler to make xmit per-task. Sending and receiving skbs always happens in thread context anyway. Having a lock to protect the per-CPU counter would needlessly block/serialize two sending threads. It would also require a recursive lock to ensure that the owner can increment the counter further. Make softnet_data.xmit a task_struct member on PREEMPT_RT. Add the needed wrappers. Cc: Ben Segall <bsegall@google.com> Cc: Daniel Bristot de Oliveira <bristot@redhat.com> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-9-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
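Roughly, the wrappers select the per-task counter on PREEMPT_RT and keep the per-CPU one otherwise; a sketch with approximate member names:

    #define XMIT_RECURSION_LIMIT	8

    #ifdef CONFIG_PREEMPT_RT
    /* Per-task counter: safe against preemption of the softirq thread. */
    static inline bool dev_xmit_recursion(void)
    {
    	return unlikely(current->net_xmit.recursion > XMIT_RECURSION_LIMIT);
    }

    static inline void dev_xmit_recursion_inc(void)
    {
    	current->net_xmit.recursion++;
    }

    static inline void dev_xmit_recursion_dec(void)
    {
    	current->net_xmit.recursion--;
    }
    #else
    /* Per-CPU counter: BH-disabled sections cannot interleave on one CPU. */
    static inline bool dev_xmit_recursion(void)
    {
    	return unlikely(__this_cpu_read(softnet_data.xmit.recursion) >
    			XMIT_RECURSION_LIMIT);
    }
    /* inc/dec use __this_cpu_inc()/__this_cpu_dec() on the same field */
    #endif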
2024-06-24 | netfilter: br_netfilter: Use nested-BH locking for brnf_frag_data_storage. | Sebastian Andrzej Siewior | 1 | -4/+16
brnf_frag_data_storage is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: Florian Westphal <fw@strlen.de> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Nikolay Aleksandrov <razor@blackwall.org> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Roopa Prabhu <roopa@nvidia.com> Cc: bridge@lists.linux.dev Cc: coreteam@netfilter.org Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-8-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 | net/ipv4: Use nested-BH locking for ipv4_tcp_sk. | Sebastian Andrzej Siewior | 2 | -4/+16
ipv4_tcp_sk is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a sock member (original ipv4_tcp_sk) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: David Ahern <dsahern@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-7-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 | net/tcp_sigpool: Use nested-BH locking for sigpool_scratch. | Sebastian Andrzej Siewior | 1 | -4/+13
sigpool_scratch is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Make a struct with a pad member (original sigpool_scratch) and a local_lock_t and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Cc: David Ahern <dsahern@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-6-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 | net: Use nested-BH locking for napi_alloc_cache. | Sebastian Andrzej Siewior | 1 | -5/+24
napi_alloc_cache is a per-CPU variable and relies on disabled BH for its locking. Without per-CPU locking in local_bh_disable() on PREEMPT_RT this data structure requires explicit locking. Add a local_lock_t to the data structure and use local_lock_nested_bh() for locking. This change adds only lockdep coverage and does not alter the functional behaviour for !PREEMPT_RT. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-5-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 | net: Use __napi_alloc_frag_align() instead of open coding it. | Sebastian Andrzej Siewior | 1 | -6/+2
The else condition within __netdev_alloc_frag_align() is an open coded __napi_alloc_frag_align(). Use __napi_alloc_frag_align() instead of open coding it. Move the fragsz assignment before the page_frag_alloc_align() invocation because __napi_alloc_frag_align() also contains this statement. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-4-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 | locking/local_lock: Add local nested BH locking infrastructure. | Sebastian Andrzej Siewior | 4 | -0/+52
Add local_lock_nested_bh() locking. It is based on local_lock_t and the naming follows the preempt_disable_nested() example. For !PREEMPT_RT + !LOCKDEP it is a per-CPU annotation for locking assumptions based on local_bh_disable(). The macro is optimized away during compilation. For !PREEMPT_RT + LOCKDEP the local_lock_nested_bh() is reduced to the usual lock-acquire plus lockdep_assert_in_softirq() - ensuring that BH is disabled. For PREEMPT_RT local_lock_nested_bh() acquires the specified per-CPU lock. It does not disable CPU migration because it relies on local_bh_disable() disabling CPU migration. With LOCKDEP it performs the usual lockdep checks as with !PREEMPT_RT. Due to include hell the softirq check has been moved to spinlock.c. The intention is to use this locking in places where locking of a per-CPU variable relies on BH being disabled. Instead of treating disabled bottom halves as a big per-CPU lock, PREEMPT_RT can use this to reduce the locking scope to what actually needs protecting. A side effect is that it also documents the protection scope of the per-CPU variables. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-3-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-24 | locking/local_lock: Introduce guard definition for local_lock. | Sebastian Andrzej Siewior | 1 | -0/+11
Introduce lock guard definition for local_lock_t. There are no users yet. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Link: https://patch.msgid.link/20240620132727.660738-2-bigeasy@linutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
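Once users appear, the guard enables scope-based acquisition; a sketch with an illustrative per-CPU struct:

    #include <linux/cleanup.h>
    #include <linux/local_lock.h>

    struct my_pcpu_data {			/* illustrative per-CPU struct */
    	int counter;
    	local_lock_t lock;
    };

    static DEFINE_PER_CPU(struct my_pcpu_data, my_data) = {
    	.lock = INIT_LOCAL_LOCK(lock),
    };

    static void guarded_user(void)
    {
    	guard(local_lock)(&my_data.lock);	/* acquired here */
    	this_cpu_inc(my_data.counter);
    }						/* released automatically at scope exit */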
2024-06-24 | MAINTAINERS: adjust file entry in FREESCALE QORIQ DPAA FMAN DRIVER | Lukas Bulwahn | 1 | -1/+1
Commit 243996d172a6 ("dt-bindings: net: Convert fsl-fman to yaml") splits the previous dt text file into four yaml files. It adjusts a corresponding file entry in MAINTAINERS from txt to yaml, but this adjustment misses that the file was split and renamed. Hence, ./scripts/get_maintainer.pl --self-test=patterns complains about a broken reference. Adjust the file entry to match the four yaml files resulting from the commit above. Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-23 | Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue | David S. Miller | 9 | -41/+90
Tony Nguyen says: ==================== ice: prepare representor for SF support Michal Swiatkowski says: This is a series to prepare the port representor for also supporting subfunctions. We need correct devlink locking and the possibility to update the parent VSI after the port representor is created. Refactor how the devlink lock is taken to suit the subfunction use case. VSI configuration needs to be done after the port representor is created. The port representor needs only an allocated VSI; it doesn't need to be configured beforehand. The VSI needs to be reconfigured when the update function is called. The code for this patchset was split from the (too big) patchset [1]. [1] https://lore.kernel.org/netdev/20240213072724.77275-1-michal.swiatkowski@linux.intel.com/ --- Originally from https://lore.kernel.org/netdev/20240605-next-2024-06-03-intel-next-batch-v2-0-39c23963fa78@intel.com/ Changes: - delete ice_repr_get_by_vsi() from header - rephrase commit message in moving devlink locking ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-22 | net: xilinx: axienet: Enable multicast by default | Sean Anderson | 1 | -1/+0
We support multicast addresses, so enable it by default. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | Merge tag 'linux-can-next-for-6.11-20240621' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next | Jakub Kicinski | 19 | -173/+613
Marc Kleine-Budde says: ==================== pull-request: can-next 2024-06-21 The first 2 patches are by Andy Shevchenko: one cleans up the includes in the mcp251x driver, the other updates the sja1000 plx_pci driver to make use of a predefined PCI subvendor ID. Mans Rullgard's patch cleans up the Kconfig help text for the slcan driver. Oliver Hartkopp provides a patch to update the documentation, which removes the ISO 15765-2 specification version where possible. The next 2 patches are by Harini T and update the documentation of the xilinx_can driver. Francesco Valla provides documentation for the ISO 15765-2 protocol. A patch by Dr. David Alan Gilbert removes an unused struct from the mscan driver. 12 patches are by Martin Jocic. The first three add support for 3 new devices to the kvaser_usb driver. The remaining 9 first clean up the kvaser_pciefd driver and then add support for MSI. Krzysztof Kozlowski contributes 3 patches that simplify the CAN SPI drivers by making use of spi_get_device_match_data(). The last patch, by Martin Hundebøll, reworks the m_can driver to not enable the CAN transceiver during probe. * tag 'linux-can-next-for-6.11-20240621' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next: (24 commits) can: m_can: don't enable transceiver when probing can: mcp251xfd: simplify with spi_get_device_match_data() can: mcp251x: simplify with spi_get_device_match_data() can: hi311x: simplify with spi_get_device_match_data() can: kvaser_pciefd: Add MSI interrupts can: kvaser_pciefd: Move reset of DMA RX buffers to the end of the ISR can: kvaser_pciefd: Change name of return code variable can: kvaser_pciefd: Rename board_irq to pci_irq can: kvaser_pciefd: Add unlikely can: kvaser_pciefd: Add inline can: kvaser_pciefd: Remove unnecessary comment can: kvaser_pciefd: Skip redundant NULL pointer check in ISR can: kvaser_pciefd: Group #defines together can: kvaser_usb: Add support for Kvaser Mini PCIe 1xCAN can: kvaser_usb: Add support for Kvaser USBcan Pro 5xCAN can: kvaser_usb: Add support for Vining 800 can: mscan: remove unused struct 'mscan_state' Documentation: networking: document ISO 15765-2 can: xilinx_can: Document driver description to list all supported IPs can: isotp: remove ISO 15675-2 specification version where possible ... ==================== Link: https://patch.msgid.link/20240621080201.305471-1-mkl@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-06-21 | ice: update representor when VSI is ready | Michal Swiatkowski | 3 | -10/+17
In case of a reset, the VF VSI can be reallocated. To handle this case it should be properly updated. Reload the representor, as vsi->vsi_num can be different from the one stored when the representor was created. Instead of only changing antispoof, do the whole VSI configuration for the eswitch. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-21 | ice: move VSI configuration outside repr setup | Michal Swiatkowski | 3 | -15/+57
This is needed because a subfunction port representor shouldn't configure the source VSI during representor creation. Move the code to a separate function and call it only when a VF port representor is being created. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-21 | ice: move devlink locking outside the port creation | Michal Swiatkowski | 4 | -8/+9
In the subfunction case, the lock will be taken for the whole port creation and removal. Do the same in the VF case. Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-21 | ice: store representor ID in bridge port | Michal Swiatkowski | 4 | -8/+7
It is used to get the representor structure during cleanup. Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Tested-by: Sujai Buvaneswaran <sujai.buvaneswaran@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2024-06-21 | selftests: net: change shebang to bash in amt.sh | Taehee Yoo | 1 | -1/+1
amt.sh is written in bash, not sh, so the shebang should be bash. Signed-off-by: Taehee Yoo <ap420073@gmail.com> Acked-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | net: ethernet: rtsn: Add support for Renesas Ethernet-TSN | Niklas Söderlund | 5 | -0/+1875
Add initial support for the Renesas Ethernet-TSN End-station device of R-Car V4H. The Ethernet End-station can connect to an Ethernet network using a 10 Mbps, 100 Mbps, or 1 Gbps full-duplex link via MII/GMII/RMII/RGMII, depending on the connected PHY. The driver supports Rx checksum offload and hardware timestamps. While full power management and suspend/resume are not yet supported, the driver enables runtime PM in order to enable the module clock. While explicit clock management using clk_enable() would suffice for the supported SoC, the module could be reused on SoCs where the module is part of a power domain. Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | Merge branch 'qca8k-cleanup-and-port-isolation' | David S. Miller | 2 | -49/+70
Matthias Schiffer says: ==================== net: dsa: qca8k: cleanup and port isolation A small cleanup patch, and basically the same changes that were just accepted for mt7530 to implement port isolation. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | net: dsa: qca8k: add support for bridge port isolation | Matthias Schiffer | 2 | -2/+21
Remove a pair of ports from the port matrix when both ports have the isolated flag set. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | net: dsa: qca8k: factor out bridge join/leave logic | Matthias Schiffer | 1 | -51/+50
Most of the logic in qca8k_port_bridge_join() and qca8k_port_bridge_leave() is the same. Refactor to reduce duplication and prepare for reusing the code for implementing bridge port isolation. dsa_port_offloads_bridge_dev() is used instead of dsa_port_offloads_bridge(), passing the bridge in as a struct net_device *, as we won't have a struct dsa_bridge in qca8k_port_bridge_flags(). The error handling is changed slightly in the bridge leave case, returning early and emitting an error message when a regmap access fails. This shouldn't matter in practice, as there isn't much we can do if communication with the switch breaks down in the middle of reconfiguration. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | net: dsa: qca8k: do not write port mask twice in bridge join/leave | Matthias Schiffer | 1 | -2/+5
qca8k_port_bridge_join() sets QCA8K_PORT_LOOKUP_CTRL() for i == port twice: once in the loop handling all other ports' masks, and finally at the end with the accumulated port_mask. The first time it would incorrectly set the port's own bit in the mask, only to correct the mistake a moment later. qca8k_port_bridge_leave() had the same issue, but here the regmap_clear_bits() was a no-op rather than setting an unintended value. Remove the duplicate assignment by skipping the whole loop iteration for i == port. The unintended bit setting doesn't seem to have any negative effects (even when not reverted right away), so the change is submitted as a simple cleanup rather than a fix. Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net> Reviewed-by: Wojciech Drewek <wojciech.drewek@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | Merge branch 'net-mscc-miim-switch-reset' | David S. Miller | 2 | -0/+18
Herve Codina says: ==================== Handle switch reset in mscc-miim These two patches were previously sent as part of a bigger series: https://lore.kernel.org/lkml/20240527161450.326615-1-herve.codina@bootlin.com/ v1 and v2 iterations were handled during the v1 and v2 reviews of this bigger series. As these two patches are now ready to be applied, they were extracted from the bigger series and sent alone in this current series. This v3 series takes into account feedback received during the bigger series' v2 review. Changes v2 -> v3 - patch 1 Drop one useless sentence. Add 'Reviewed-by: Andrew Lunn <andrew@lunn.ch>' Add 'Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>' - patch 2 Add 'Reviewed-by: Andrew Lunn <andrew@lunn.ch>' Changes v1 -> v2 (as part of the bigger series iterations) - Patch 1 Improve the reset property description - Patch 2 Fix a wrong reverse x-mas tree declaration ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | net: mdio: mscc-miim: Handle the switch reset | Herve Codina | 1 | -0/+8
The mscc-miim device can be impacted by the switch reset, at least when this device is part of the LAN966x PCI device. Handle this newly added (optional) resets property. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
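The usual shape of optional-reset handling in a probe path, sketched below (the reset name "switch" and the helper name are assumptions for illustration, not the exact driver code):

    #include <linux/reset.h>

    static int demo_handle_switch_reset(struct device *dev)
    {
    	struct reset_control *reset;

    	reset = devm_reset_control_get_optional_shared(dev, "switch");
    	if (IS_ERR(reset))
    		return PTR_ERR(reset);	/* also propagates -EPROBE_DEFER */

    	/* 'reset' is NULL when the DT property is absent; a no-op then */
    	return reset_control_reset(reset);
    }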
2024-06-21 | dt-bindings: net: mscc-miim: Add resets property | Herve Codina | 1 | -0/+10
Add the (optional) resets property. The mscc-miim device is impacted by the switch reset especially when the mscc-miim device is used as part of the LAN966x PCI device. Signed-off-by: Herve Codina <herve.codina@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | Merge branch 'l2tp-sk_user_data' | David S. Miller | 7 | -265/+308
James Chapman says: ==================== l2tp: don't use the tunnel socket's sk_user_data in datapath This series refactors l2tp to not use the tunnel socket's sk_user_data in the datapath. The main reasons for doing this are * to allow for simplifying internal socket cleanup code (to be done in a later series) * to support multiple L2TPv3 UDP tunnels using the same 5-tuple address When handling received UDP frames, l2tp's current approach is to look up a session in a per-tunnel list. l2tp uses the tunnel socket's sk_user_data to derive the tunnel context from the receiving socket. But this results in the socket and tunnel lifetimes being very tightly coupled and the tunnel/socket cleanup paths being complicated. The latter has historically been a source of l2tp bugs and makes the code more difficult to maintain. Also, if sockets are aliased, we can't trust that the socket's sk_user_data references the right tunnel anyway. Hence the desire to not use sk_user_data in the datapath. The new approach is to lookup sessions in a per-net session list without first deriving the tunnel: * For L2TPv2, the l2tp header has separate tunnel ID and session ID fields which can be trivially combined to make a unique 32-bit key for per-net session lookup. * For L2TPv3, there is no tunnel ID in the packet header, only a session ID, which should be unique over all tunnels so can be used as a key for per-net session lookup. However, this only works when the L2TPv3 session ID really is unique over all tunnels. At least one L2TPv3 application is known to use the same session ID in different L2TPv3 UDP tunnels, relying on UDP address/ports to distinguish them. This worked previously because sessions in UDP tunnels were kept in a per-tunnel list. To retain support for this, L2TPv3 session ID collisions are managed using a separate per-net session hlist, keyed by ID and sk. When looking up a session by ID, if there's more than one match, sk is used to find the right one. L2TPv3 sessions in IP-encap tunnels are already looked up by session ID in a per-net list. This work has UDP sessions also use the per-net session list, while allowing for session ID collisions. The existing per-tunnel hlist becomes a plain list since it is used only in management and cleanup paths to walk a list of sessions in a given tunnel. For better performance, the per-net session lists use IDR. Separate IDRs are used for L2TPv2 and L2TPv3 sessions to avoid potential key collisions. These changes pass l2tp regression tests and improve data forwarding performance by about 10% in some of my test setups. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | l2tp: replace hlist with simple list for per-tunnel session list | James Chapman | 3 | -91/+50
The per-tunnel session list is no longer used by the datapath. However, we still need a list of sessions in the tunnel for l2tp_session_get_nth, which is used by management code. (An alternative might be to walk each session IDR list, matching only sessions of a given tunnel.) Replace the per-tunnel hlist with a per-tunnel list. In functions which walk a list of sessions of a tunnel, walk this list instead. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | l2tp: drop the now unused l2tp_tunnel_get_session | James Chapman | 2 | -24/+0
All users of l2tp_tunnel_get_session are now gone so it can be removed. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | l2tp: use IDR for all session lookups | James Chapman | 4 | -4/+20
Add a generic session getter which uses the IDR. Replace all users of l2tp_tunnel_get_session(), which uses the per-tunnel session list, with the generic getter. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | l2tp: don't use sk_user_data in l2tp_udp_encap_err_recv | James Chapman | 1 | -6/+0
If UDP sockets are aliased, sk might be the wrong socket. There's no benefit to using sk_user_data to do some checks on the associated tunnel context. Just report the error anyway, like udp core does. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | l2tp: refactor udp recv to lookup to not use sk_user_data | James Chapman | 1 | -75/+21
Modify UDP decap to not use the tunnel pointer which comes from the sock's sk_user_data when parsing the L2TP header. By looking up the destination session using only the packet contents we avoid potential UDP 5-tuple aliasing issues which arise from depending on the socket that received the packet. Drop the useless error messages on short packets or on failing to find a session, since the tunnel pointer might point to a different tunnel if multiple sockets use the same 5-tuple. Short packets (those not big enough to contain an L2TP header) are no longer counted in the tunnel's invalid counter because we can't derive the tunnel until we parse the L2TP header to look up the session. l2tp_udp_encap_recv was a small wrapper around l2tp_udp_recv_core which used sk_user_data to derive a tunnel pointer in an RCU-safe way. But we no longer need the tunnel pointer, so remove that code and combine the two functions. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | l2tp: store l2tpv2 sessions in per-net IDR | James Chapman | 2 | -15/+56
L2TPv2 sessions are currently kept in a per-tunnel hashlist, keyed by 16-bit session_id. When handling received L2TPv2 packets, we need to first derive the tunnel using the 16-bit tunnel_id or sock, then lookup the session in a per-tunnel hlist using the 16-bit session_id. We want to avoid using sk_user_data in the datapath and double lookups on every packet. So instead, use a per-net IDR to hold L2TPv2 sessions, keyed by a 32-bit value derived from the 16-bit tunnel_id and session_id. This will allow the L2TPv2 UDP receive datapath to lookup a session with a single lookup without deriving the tunnel first. L2TPv2 sessions are held in their own IDR to avoid potential key collisions with L2TPv3 sessions. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
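The 32-bit key can simply pack the two 16-bit IDs; a sketch of the derivation described above:

    /* Sketch: pack the two 16-bit IDs into one 32-bit IDR key. */
    static inline u32 l2tp_v2_session_key(u16 tunnel_id, u16 session_id)
    {
    	return ((u32)tunnel_id << 16) | session_id;
    }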
2024-06-21 | l2tp: store l2tpv3 sessions in per-net IDR | James Chapman | 4 | -74/+188
L2TPv3 sessions are currently held in one of two fixed-size hash lists: either a per-net hashlist (IP-encap), or a per-tunnel hashlist (UDP-encap), keyed by the L2TPv3 32-bit session_id. In order to lookup L2TPv3 sessions in UDP-encap tunnels efficiently without finding the tunnel first via sk_user_data, UDP sessions are now kept in a per-net session list, keyed by session ID. Convert the existing per-net hashlist to use an IDR for better performance when there are many sessions and have L2TPv3 UDP sessions use the same IDR. Although the L2TPv3 RFC states that the session ID alone identifies the session, our implementation has allowed the same session ID to be used in different L2TP UDP tunnels. To retain support for this, a new per-net session hashtable is used, keyed by the sock and session ID. If on creating a new session, a session already exists with that ID in the IDR, the colliding sessions are added to the new hashtable and the existing IDR entry is flagged. When looking up sessions, the approach is to first check the IDR and if no unflagged match is found, check the new hashtable. The sock is made available to session getters where session ID collisions are to be considered. In this way, the new hashtable is used only for session ID collisions so can be kept small. For managing session removal, we need a list of colliding sessions matching a given ID in order to update or remove the IDR entry of the ID. This is necessary to detect session ID collisions when future sessions are created. The list head is allocated on first collision of a given ID and refcounted. Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
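The lookup order under this scheme, sketched with illustrative stand-in types (the real l2tp structures and hashing differ):

    #include <linux/idr.h>
    #include <linux/hashtable.h>

    struct sock;

    /* Illustrative stand-ins for the real l2tp structures. */
    struct demo_session {
    	u32 session_id;
    	struct sock *sk;
    	bool id_collides;		/* the "flagged" IDR entry */
    	struct hlist_node hnode;
    };

    static DEFINE_IDR(demo_session_idr);
    static DEFINE_HASHTABLE(demo_session_htable, 4);

    static struct demo_session *demo_session_get(struct sock *sk, u32 session_id)
    {
    	struct demo_session *s = idr_find(&demo_session_idr, session_id);

    	if (s && !s->id_collides)
    		return s;		/* unflagged entry: session ID is unique */

    	/* colliding IDs: disambiguate by the receiving socket */
    	hash_for_each_possible(demo_session_htable, s, hnode, session_id) {
    		if (s->session_id == session_id && s->sk == sk)
    			return s;
    	}
    	return NULL;
    }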
2024-06-21 | l2tp: remove unused list_head member in l2tp_tunnel | James Chapman | 2 | -3/+0
Remove an unused variable in struct l2tp_tunnel which was left behind by commit c4d48a58f32c5 ("l2tp: convert l2tp_tunnel_list to idr"). Signed-off-by: James Chapman <jchapman@katalix.com> Reviewed-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | octeontx2-pf: Add ucast filter count configurability via devlink. | Sai Krishna | 5 | -14/+95
The existing method of reserving unicast filter count leads to wasted MCAM entries if the functionality is not used or fewer entries are used. Furthermore, the amount of MCAM entries differs amongst Octeon SoCs. We implemented a means to adjust the UC filter count via devlink, allowing for better use of MCAM entries across Netdev apps.

Commands:

To get the current unicast filter count:
    # devlink dev param show pci/0002:02:00.0 name unicast_filter_count

To change/set the unicast filter count:
    # devlink dev param set pci/0002:02:00.0 name unicast_filter_count value 5 cmode runtime

Signed-off-by: Sai Krishna <saikrishnag@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | docs: net: document guidance on implementing the SR-IOV NDOs | Jakub Kicinski | 2 | -0/+26
New drivers were prevented from adding ndo_set_vf_* callbacks over the last few years. This was expected to result in broader switchdev adoption, but seems to have had little effect. Based on a recent netdev meeting there is broad support for allowing those ops to be added. There is a problem with the current API supporting a limited number of VFs (100+, which is less than some modern HW supports). We can try to solve it by adding similar functionality on devlink ports, but that'd be another API variation to maintain. So a netlink attribute reshuffling is a more likely outcome. Document the guidance, make it clear that the API is frozen. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | net: dsa: ksz_common: Allow only up to two HSR HW offloaded ports for KSZ9477 | Lukasz Majewski | 1 | -0/+7
The KSZ9477 allows HSR in-HW offloading for any two selected ports. This patch adds a check for when one tries to use more than two ports with HSR offloading enabled. The problem is with the RedBox configuration (HSR-SAN) - when configuring: ip link add name hsr0 type hsr slave1 lan1 slave2 lan2 interlink lan3 \ supervision 45 version 1 The lan1 (port0) and lan2 (port1) are correctly configured as ports which can use HSR offloading on ksz9477. However, when we already have two bits set in hsr_ports, we need to return (-ENOTSUPP), so the interlink port (lan3) will be used with SW based HSR RedBox support. Otherwise, I do see some strange network behavior, as some HSR frames are visible on the non-HSR network and vice versa. This causes the switch connected to the interlink port (lan3) to drop frames and no communication is possible. Moreover, conceptually the interlink (i.e. HSR-SAN port - lan3/port2) shall only be supported in software, as it is also possible to use ksz9477 with only SW based HSR (i.e. port0/1 -> hsr0 with offloading, port2 -> HSR-SAN/interlink, port4/5 -> hsr1 with SW based HSR). Fixes: 5055cccfc2d1 ("net: hsr: Provide RedBox support (HSR-SAN)") Signed-off-by: Lukasz Majewski <lukma@denx.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
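The check can be as simple as counting bits in the port mask before accepting another offloaded port; a sketch, assuming the driver keeps a hsr_ports bitmask (the message text is illustrative):

    #include <linux/bitops.h>
    #include <linux/netlink.h>

    /* Sketch: enforce the two-port limit before accepting another HSR port. */
    static int demo_hsr_port_check(u8 hsr_ports, struct netlink_ext_ack *extack)
    {
    	if (hweight8(hsr_ports) >= 2) {
    		NL_SET_ERR_MSG_MOD(extack,
    				   "Cannot offload more than two ports - using software HSR");
    		return -ENOTSUPP;	/* errno as given in the commit message */
    	}
    	return 0;
    }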
2024-06-21 | net: fec: Fix FEC_ECR_EN1588 being cleared on link-down | Csókás, Bence | 1 | -0/+6
The FEC_ECR_EN1588 bit gets cleared after the MAC reset in `fec_stop()`. On link-down, this shuts down all 1588 functionality and makes all the extended registers disappear, making the adapter fall back to compatibility "dumb mode". However, some functionality (e.g. PPS) needs to be retained even without link. Fixes: 6605b730c061 ("FEC: Add time stamping code and a PTP hardware clock") Cc: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/netdev/5fa9fadc-a89d-467a-aae9-c65469ff5fe1@lunn.ch/ Signed-off-by: Csókás, Bence <csokas.bence@prolan.hu> Reviewed-by: Wei Fang <wei.fang@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
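Conceptually the fix amounts to not dropping the bit when the ECR is rewritten during the MAC reset; a sketch using the driver's register names, with the surrounding logic simplified:

    /* Sketch: re-assert 1588 mode when fec_stop() rewrites the ECR after reset. */
    static void demo_fec_restore_en1588(struct fec_enet_private *fep)
    {
    	u32 val = readl(fep->hwp + FEC_ECNTRL);

    	val |= FEC_ECR_EN1588;	/* keep extended (1588) mode and its registers */
    	writel(val, fep->hwp + FEC_ECNTRL);
    }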
2024-06-21 | Merge branch 'bnxt_en-netdev_queue_mgmt_ops' | David S. Miller | 1 | -107/+468
David Wei says: ==================== bnxt_en: implement netdev_queue_mgmt_ops Implement netdev_queue_mgmt_ops for bnxt added in [1]. This will be used in the io_uring ZC Rx patchset to configure queues with a custom page pool w/ a special memory provider for zero copy support. The first two patches prep the driver, while the final patch adds the implementation. Any arbitrary Rx queue can be reset without affecting other queues. V2 and prior versions of this patchset were thought to only support resetting queues not in the main RSS context. Upon further testing I realised moving queues out and calling bnxt_hwrm_vnic_update() wasn't necessary. I didn't include the netdev core API using this netdev_queue_mgmt_ops because Mina is adding it in his devmem TCP series [2]. But I'm happy to include it if folks want a user included with this series. I tested this series on BCM957504-N1100FY4 with FW 229.1.123.0. I manually injected failures at all the places that can return an errno and confirmed that the device/queue is never left in a broken state. [1]: https://lore.kernel.org/netdev/20240501232549.1327174-2-shailend@google.com/ [2]: https://lore.kernel.org/netdev/20240607005127.3078656-2-almasrymina@google.com/ v3: - tested w/o bnxt_hwrm_vnic_update() and it works on any queue - removed unneeded code v2: - fix broken build - remove unused var in bnxt_init_one_rx_ring() ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | bnxt_en: implement netdev_queue_mgmt_ops | David Wei | 1 | -0/+275
Implement netdev_queue_mgmt_ops for bnxt added in [1]. Two bnxt_rx_ring_info structs are allocated to hold the new/old queue memory. Queue memory is copied from/to the main bp->rx_ring[idx] bnxt_rx_ring_info. Queue memory is pre-allocated in bnxt_queue_mem_alloc() into a clone, and then copied into bp->rx_ring[idx] in bnxt_queue_mem_start(). Similarly, when bp->rx_ring[idx] is stopped its queue memory is copied into a clone, and then freed later in bnxt_queue_mem_free(). I tested this patchset with netdev_rx_queue_restart(), including inducing errors in all places that return an error code. In all cases, the queue is left in a good working state. Rx queues are created/destroyed using bnxt_hwrm_rx_ring_alloc() and bnxt_hwrm_rx_ring_free(), which issue HWRM_RING_ALLOC and HWRM_RING_FREE commands respectively to the firmware. By the time a HWRM_RING_FREE response is received, there won't be any more completions from that queue. Thanks to Somnath for helping me with this patch. With their permission I've added them as Acked-by. [1]: https://lore.kernel.org/netdev/20240501232549.1327174-2-shailend@google.com/ Acked-by: Somnath Kotur <somnath.kotur@broadcom.com> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
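The hooks plug into the queue API from [1] roughly as below; a sketch of the wiring only, with callback names and field layout as best understood from the description (not the exact driver code):

    #include <net/netdev_queues.h>

    static int bnxt_queue_mem_alloc(struct net_device *dev, void *qmem, int idx);
    static void bnxt_queue_mem_free(struct net_device *dev, void *qmem);
    static int bnxt_queue_start(struct net_device *dev, void *qmem, int idx);
    static int bnxt_queue_stop(struct net_device *dev, void *qmem, int idx);

    /* Sketch: per-queue memory is a clone of bnxt_rx_ring_info. */
    static const struct netdev_queue_mgmt_ops bnxt_queue_mgmt_ops = {
    	.ndo_queue_mem_size	= sizeof(struct bnxt_rx_ring_info),
    	.ndo_queue_mem_alloc	= bnxt_queue_mem_alloc,	/* fill a clone for queue idx */
    	.ndo_queue_mem_free	= bnxt_queue_mem_free,	/* release a clone */
    	.ndo_queue_start	= bnxt_queue_start,	/* copy clone into bp->rx_ring[idx] */
    	.ndo_queue_stop		= bnxt_queue_stop,	/* copy bp->rx_ring[idx] into clone */
    };
    /* wired up once at probe time: dev->queue_mgmt_ops = &bnxt_queue_mgmt_ops; */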
2024-06-21 | bnxt_en: split rx ring helpers out from ring helpers | David Wei | 1 | -107/+193
To prepare for the queue API implementation, split the rx ring functions out from the ring helpers. These new helpers will be called from the queue API implementation. Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | Merge branch 'net-cleanup-arc-emac' | David S. Miller | 6 | -154/+2
Johan Jonker says: ==================== cleanup arc emac The Rockchip emac binding for rk3036/rk3066/rk3188 has been converted to YAML with the ethernet-phy node in an mdio node. This requires some driver fixes by someone who can do hardware testing. In order to make a future fix easier, make the driver 'Rockchip only' by removing the obsolete part of the arc emac driver. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-06-21 | dt-bindings: net: remove arc_emac.txt | Johan Jonker | 1 | -46/+0
The last real user (nSIM_700) of the "snps,arc-emac" compatible string in a driver was removed in 2019. The use of this string as a placeholder in the combined DT of rk3066a/rk3188 has also been replaced, so remove arc_emac.txt. Signed-off-by: Johan Jonker <jbx6244@gmail.com> Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>