aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-08-16net: qualcomm: Remove redundant of_match_ptr()Ruan Jinjie1-1/+1
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: gemini: Remove redundant of_match_ptr()Ruan Jinjie1-2/+2
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: dsa: rzn1-a5psw: Remove redundant of_match_ptr()Ruan Jinjie1-1/+1
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: dsa: realtek: Remove redundant of_match_ptr()Ruan Jinjie2-2/+2
The driver depends on CONFIG_OF, it is not necessary to use of_match_ptr() here. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16nfc: virtual_ncidev: Use module_misc_device macro to simplify the codeLi Zetao1-12/+1
Use the module_misc_device macro to simplify the code, which is the same as declaring with module_init() and module_exit(). Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16Merge branch 'hns3-ethtool'David S. Miller12-681/+875
Jijie Shao says: ==================== hns3: refactor registers information for ethtool -d refactor registers information for ethtool -d ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: hns3: fix wrong rpu tln reg issueJijie Shao4-2/+71
In the original RPU query command, the status register values of multiple RPU tunnels are accumulated by default, which is unreasonable. This patch Fix it by querying the specified tunnel ID. The tunnel number of the device can be obtained from firmware during initialization. Fixes: ddb54554fa51 ("net: hns3: add DFX registers information for ethtool -d") Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: hns3: Support tlv in regs data for HNS3 VF driverJijie Shao1-24/+61
The dump register function is being refactored. The third step in refactoring is to support tlv info in regs data for HNS3 PF driver. Currently, if we use "ethtool -d" to dump regs value, the output is as follows: offset1: 00 01 02 03 04 05 ... offset2:10 11 12 13 14 15 ... ...... We can't get the value of a register directly. This patch deletes the original separator information and add tag_len_value information in regs data. ethtool can parse register data in key-value format by -d command. a patch will be added to the ethtool to parse regs data in the following format: reg1 : value2 reg2 : value2 ...... Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: hns3: Support tlv in regs data for HNS3 PF driverJijie Shao1-65/+102
The dump register function is being refactored. The second step in refactoring is to support tlv info in regs data for HNS3 PF driver. Currently, if we use "ethtool -d" to dump regs value, the output is as follows: offset1: 00 01 02 03 04 05 ... offset2:10 11 12 13 14 15 ... ...... We can't get the value of a register directly. This patch deletes the original separator information and add tag_len_value information in regs data. ethtool can parse register data in key-value format by -d command. a patch will be added to the ethtool to parse regs data in the following format: reg1 : value2 reg2 : value2 ...... Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: hns3: move dump regs function to a separate fileJijie Shao10-680/+731
The dump register function is being refactored. The first step in refactoring is put the dump regs function into a separate file. Signed-off-by: Jijie Shao <shaojijie@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16Merge branch 'fec-XDP_TX'David S. Miller2-61/+132
Wei Fang says: ==================== net: fec: add XDP_TX feature support This patch set is to support the XDP_TX feature of FEC driver, the first patch is add initial XDP_TX support, and the second patch improves the performance of XDP_TX by not using xdp_convert_buff_to_frame(). Please refer to the commit message of each patch for more details. ==================== Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: fec: improve XDP_TX performanceWei Fang2-70/+75
As suggested by Jesper and Alexander, we can avoid converting xdp_buff to xdp_frame in case of XDP_TX to save a bunch of CPU cycles, so that we can further improve the XDP_TX performance. Before this patch on i.MX8MP-EVK board, the performance shows as follows. root@imx8mpevk:~# ./xdp2 eth0 proto 17: 353918 pkt/s proto 17: 352923 pkt/s proto 17: 353900 pkt/s proto 17: 352672 pkt/s proto 17: 353912 pkt/s proto 17: 354219 pkt/s After applying this patch, the performance is improved. root@imx8mpevk:~# ./xdp2 eth0 proto 17: 369261 pkt/s proto 17: 369267 pkt/s proto 17: 369206 pkt/s proto 17: 369214 pkt/s proto 17: 369126 pkt/s proto 17: 369272 pkt/s Signed-off-by: Wei Fang <wei.fang@nxp.com> Suggested-by: Alexander Lobakin <aleksander.lobakin@intel.com> Suggested-by: Jesper Dangaard Brouer <hawk@kernel.org> Reviewed-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16net: fec: add XDP_TX feature supportWei Fang2-21/+87
The XDP_TX feature is not supported before, and all the frames which are deemed to do XDP_TX action actually do the XDP_DROP action. So this patch adds the XDP_TX support to FEC driver. I tested the performance of XDP_TX in XDP_DRV mode and XDP_SKB mode respectively on i.MX8MP-EVK platform, and as suggested by Jesper, I also tested the performance of XDP_REDIRECT on the same platform. And the test steps and results are as follows. XDP_TX test: Step 1: One board is used as generator and connects to switch,and the FEC port of DUT also connects to the switch. Both boards with flow control off. Then the generator runs the pktgen_sample03_burst_single_flow.sh script to generate and send burst traffic to DUT. Note that the size of packet was set to 64 bytes and the procotol of packet was UDP in my test scenario. In addition, the SMAC of the packet need to be different from the MAC of the generator, because the xdp2 program will swap the DMAC and SMAC of the packet and send it back to the generator. If the SMAC of the generated packet is the MAC of the generator, the generator will receive the returned traffic which increase the CPU loading and significantly degrade the transmit speed of the generator, and finally it affects the test of XDP_TX performance. Step 2: The DUT runs the xdp2 program to transmit received UDP packets back out on the same port where they were received. root@imx8mpevk:~# ./xdp2 eth0 proto 17: 353918 pkt/s proto 17: 352923 pkt/s proto 17: 353900 pkt/s proto 17: 352672 pkt/s proto 17: 353912 pkt/s proto 17: 354219 pkt/s root@imx8mpevk:~# ./xdp2 -S eth0 proto 17: 160604 pkt/s proto 17: 160708 pkt/s proto 17: 160564 pkt/s proto 17: 160684 pkt/s proto 17: 160640 pkt/s proto 17: 160720 pkt/s The above results show that the XDP_TX performance of XDP_DRV mode is much better than XDP_SKB mode, more than twice that of XDP_SKB mode, which is in line with our expectation. XDP_REDIRECT test: Step1: Both the generator and the FEC port of the DUT connet to the switch port. All the ports with flow control off, then the generator runs the pktgen script to generate and send burst traffic to DUT. Note that the size of packet was set to 64 bytes and the procotol of packet was UDP in my test scenario. Step2: The DUT runs the xdp_redirect program to redirect the traffic from the FEC port to the FEC port itself. root@imx8mpevk:~# ./xdp_redirect eth0 eth0 Redirecting from eth0 (ifindex 2; driver fec) to eth0 (ifindex 2; driver fec) Summary 232,302 rx/s 0 err,drop/s 232,344 xmit/s Summary 234,579 rx/s 0 err,drop/s 234,577 xmit/s Summary 235,548 rx/s 0 err,drop/s 235,549 xmit/s Summary 234,704 rx/s 0 err,drop/s 234,703 xmit/s Summary 235,504 rx/s 0 err,drop/s 235,504 xmit/s Summary 235,223 rx/s 0 err,drop/s 235,224 xmit/s Summary 234,509 rx/s 0 err,drop/s 234,507 xmit/s Summary 235,481 rx/s 0 err,drop/s 235,482 xmit/s Summary 234,684 rx/s 0 err,drop/s 234,683 xmit/s Summary 235,520 rx/s 0 err,drop/s 235,520 xmit/s Summary 235,461 rx/s 0 err,drop/s 235,461 xmit/s Summary 234,627 rx/s 0 err,drop/s 234,627 xmit/s Summary 235,611 rx/s 0 err,drop/s 235,611 xmit/s Packets received : 3,053,753 Average packets/s : 234,904 Packets transmitted : 3,053,792 Average transmit/s : 234,907 Compared the performance of XDP_TX with XDP_REDIRECT, XDP_TX is also much better than XDP_REDIRECT. It's also in line with our expectation. Signed-off-by: Wei Fang <wei.fang@nxp.com> Suggested-by: Jesper Dangaard Brouer <hawk@kernel.org> Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Larysa Zaremba <larysa.zaremba@intel.com> Acked-by: Jesper Dangaard Brouer <hawk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-16selftests: bonding: remove redundant delete action of device link1_1Zhengchao Shao1-1/+0
When run command "ip netns delete client", device link1_1 has been deleted. So, it is no need to delete link1_1 again. Remove it. Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-15Merge tag 'mlx5-updates-2023-08-14' of ↵Jakub Kicinski25-569/+672
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-08-14 1) Handle PTP out of order CQEs issue 2) Check FW status before determining reset successful 3) Expose maximum supported SFs via devlink resource 4) MISC cleanups * tag 'mlx5-updates-2023-08-14' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Don't query MAX caps twice net/mlx5: Remove unused MAX HCA capabilities net/mlx5: Remove unused CAPs net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler() net/mlx5: Remove redundant check of mlx5_vhca_event_supported() net/mlx5: Use mlx5_sf_start_function_id() helper instead of directly calling MLX5_CAP_GEN() net/mlx5: Remove redundant SF supported check from mlx5_sf_hw_table_init() net/mlx5: Use auxiliary_device_uninit() instead of device_put() net/mlx5: E-switch, Add checking for flow rule destinations net/mlx5: Check with FW that sync reset completed successfully net/mlx5: Expose max possible SFs via devlink resource net/mlx5e: Add recovery flow for tx devlink health reporter for unhealthy PTP SQ net/mlx5e: Make tx_port_ts logic resilient to out-of-order CQEs net/mlx5: Consolidate devlink documentation in devlink/mlx5.rst ==================== Link: https://lore.kernel.org/r/20230814214144.159464-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15Merge branch 'net-warn-about-attempts-to-register-negative-ifindex'Jakub Kicinski3-3/+35
Jakub Kicinski says: ==================== net: warn about attempts to register negative ifindex Follow up to the recently posted fix for OvS lacking input validation: https://lore.kernel.org/all/20230814203840.2908710-1-kuba@kernel.org/ Warn about negative ifindex more explicitly and misc YNL updates. ==================== Link: https://lore.kernel.org/r/20230814205627.2914583-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15tools: ynl: add more info to KeyErrors on missing attrsJakub Kicinski1-3/+12
When developing specs its useful to know which attr space YNL was trying to find an attribute in on key error. Instead of printing: KeyError: 0 add info about the space: Exception: Space 'vport' has no attribute with value '0' Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230814205627.2914583-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15netlink: specs: add ovs_vport new commandJakub Kicinski1-0/+18
Add NEW to the spec, it was useful testing the fix for OvS input validation. Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://lore.kernel.org/r/20230814205627.2914583-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15net: warn about attempts to register negative ifindexJakub Kicinski1-0/+5
Since the xarray changes we mix returning valid ifindex and negative errno in a single int returned from dev_index_reserve(). This depends on the fact that ifindexes can't be negative. Otherwise we may insert into the xarray and return a very large negative value. This in turn may break ERR_PTR(). OvS is susceptible to this problem and lacking validation (fix posted separately for net). Reject negative ifindex explicitly. Add a warning because the input validation is better handled by the caller. Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230814205627.2914583-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15eth: r8152: try to use a normal budgetJakub Kicinski1-2/+1
Mario reports that loading r8152 on his system leads to a: netif_napi_add_weight() called with weight 256 warning getting printed. We don't have any solid data on why such high budget was chosen, and it may cause stalls in processing other softirqs and rt threads. So try to switch back to the default (64) weight. If this slows down someone's system we should investigate which part of stopping starting the NAPI poll in this driver are expensive. Reported-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/all/0bfd445a-81f7-f702-08b0-bd5a72095e49@amd.com/ Acked-by: Hayes Wang <hayeswang@realtek.com> Link: https://lore.kernel.org/r/20230814153521.2697982-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15net: e1000e: Remove unused declarationsYue Haibing1-2/+0
Commit bdfe2da6aefd ("e1000e: cosmetic move of function prototypes to the new mac.h") declared but never implemented them. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20230814135821.4808-1-yuehaibing@huawei.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15qed: remove unused 'resp_size' calculationArnd Bergmann1-28/+17
Newer versions of clang warn about this variable being assigned but never used: drivers/net/ethernet/qlogic/qed/qed_vf.c:63:67: error: parameter 'resp_size' set but not used [-Werror,-Wunused-but-set-parameter] There is no indication in the git history on how this was ever meant to be used, so just remove the entire calculation and argument passing for it to avoid the warning. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20230814074512.1067715-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15net: phy: mediatek-ge-soc: support PHY LEDsDaniel Golle1-9/+426
Implement netdev trigger and primitive bliking offloading as well as simple set_brigthness function for both PHY LEDs of the in-SoC PHYs found in MT7981 and MT7988. For MT7988, read boottrap register and apply LED polarities accordingly to get uniform behavior from all LEDs on MT7988. This requires syscon phandle 'mediatek,pio' present in parenting MDIO bus which should point to the syscon holding the boottrap register. Signed-off-by: Daniel Golle <daniel@makrotopia.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/dc324d48c00cd7350f3a506eaa785324cae97372.1691977904.git.daniel@makrotopia.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15Merge branch 'nexthop-various-cleanups'Jakub Kicinski1-6/+0
Ido Schimmel says: ==================== nexthop: Various cleanups Benefit from recent bug fixes and simplify the nexthop dump code. No regressions in existing tests: # ./fib_nexthops.sh [...] Tests passed: 234 Tests failed: 0 ==================== Link: https://lore.kernel.org/r/20230813164856.2379822-1-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15nexthop: Do not increment dump sentinel at the end of the dumpIdo Schimmel1-1/+0
The nexthop and nexthop bucket dump callbacks previously returned a positive return code even when the dump was complete, prompting the core netlink code to invoke the callback again, until returning zero. Zero was only returned by these callbacks when no information was filled in the provided skb, which was achieved by incrementing the dump sentinel at the end of the dump beyond the ID of the last nexthop. This is no longer necessary as when the dump is complete these callbacks return zero. Remove the unnecessary increment. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230813164856.2379822-3-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15nexthop: Simplify nexthop bucket dumpIdo Schimmel1-5/+0
Before commit f10d3d9df49d ("nexthop: Make nexthop bucket dump more efficient"), rtm_dump_nexthop_bucket_nh() returned a non-zero return code for each resilient nexthop group whose buckets it dumped, regardless if it encountered an error or not. This meant that the sentinel ('dd->ctx->nh.idx') used by the function that walked the different nexthops could not be used as a sentinel for the bucket dump, as otherwise buckets from the same group would be dumped over and over again. This was dealt with by adding another sentinel ('dd->ctx->done_nh_idx') that was incremented by rtm_dump_nexthop_bucket_nh() after successfully dumping all the buckets from a given group. After the previously mentioned commit this sentinel is no longer necessary since the function no longer returns a non-zero return code when successfully dumping all the buckets from a given group. Remove this sentinel and simplify the code. Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230813164856.2379822-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15Merge branch 'seg6-add-next-c-sid-support-for-srv6-end-x-behavior'Jakub Kicinski3-20/+1302
Andrea Mayer says: ==================== seg6: add NEXT-C-SID support for SRv6 End.X behavior In the Segment Routing (SR) architecture a list of instructions, called segments, can be added to the packet headers to influence the forwarding and processing of the packets in an SR enabled network. Considering the Segment Routing over IPv6 data plane (SRv6) [1], the segment identifiers (SIDs) are IPv6 addresses (128 bits) and the segment list (SID List) is carried in the Segment Routing Header (SRH). A segment may correspond to a "behavior" that is executed by a node when the packet is received. The Linux kernel currently supports a large subset of the behaviors described in [2] (e.g., End, End.X, End.T and so on). In some SRv6 scenarios, the number of segments carried by the SID List may increase dramatically, reducing the MTU (Maximum Transfer Unit) size and/or limiting the processing power of legacy hardware devices (due to longer IPv6 headers). The NEXT-C-SID mechanism [3] extends the SRv6 architecture by providing several ways to efficiently represent the SID List. By leveraging the NEXT-C-SID, it is possible to encode several SRv6 segments within a single 128 bit SID address (also referenced as Compressed SID Container). In this way, the length of the SID List can be drastically reduced. The NEXT-C-SID mechanism is built upon the "flavors" framework defined in [2]. This framework is already supported by the Linux SRv6 subsystem and is used to modify and/or extend a subset of existing behaviors. In this patchset, we extend the SRv6 End.X behavior in order to support the NEXT-C-SID mechanism. In details, the patchset is made of: - patch 1/2: add NEXT-C-SID support for SRv6 End.X behavior; - patch 2/2: add selftest for NEXT-C-SID in SRv6 End.X behavior. From the user space perspective, we do not need to change the iproute2 code to support the NEXT-C-SID flavor for the SRv6 End.X behavior. However, we will update the man page considering the NEXT-C-SID flavor applied to the SRv6 End.X behavior in a separate patch. [1] - https://datatracker.ietf.org/doc/html/rfc8754 [2] - https://datatracker.ietf.org/doc/html/rfc8986 [3] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression ==================== Link: https://lore.kernel.org/r/20230812180926.16689-1-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15selftests: seg6: add selftest for NEXT-C-SID flavor in SRv6 End.X behaviorPaolo Lungaroni2-0/+1214
This selftest is designed for testing the support of NEXT-C-SID flavor for SRv6 End.X behavior. It instantiates a virtual network composed of several nodes: hosts and SRv6 routers. Each node is realized using a network namespace that is properly interconnected to others through veth pairs, according to the topology depicted in the selftest script file. The test considers SRv6 routers implementing IPv4/IPv6 L3 VPNs leveraged by hosts for communicating with each other. Such routers i) apply different SRv6 Policies to the traffic received from connected hosts, considering the IPv4 or IPv6 protocols; ii) use the NEXT-C-SID compression mechanism for encoding several SRv6 segments within a single 128-bit SID address, referred to as a Compressed SID (C-SID) container. The NEXT-C-SID is provided as a "flavor" of the SRv6 End.X behavior, enabling it to properly process the C-SID containers. The correct execution of the enabled NEXT-C-SID SRv6 End.X behavior is verified through reachability tests carried out between hosts belonging to the same VPN. Signed-off-by: Paolo Lungaroni <paolo.lungaroni@uniroma2.it> Co-developed-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230812180926.16689-3-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15seg6: add NEXT-C-SID support for SRv6 End.X behaviorAndrea Mayer1-20/+88
The NEXT-C-SID mechanism described in [1] offers the possibility of encoding several SRv6 segments within a single 128 bit SID address. Such a SID address is called a Compressed SID (C-SID) container. In this way, the length of the SID List can be drastically reduced. A SID instantiated with the NEXT-C-SID flavor considers an IPv6 address logically structured in three main blocks: i) Locator-Block; ii) Locator-Node Function; iii) Argument. C-SID container +------------------------------------------------------------------+ | Locator-Block |Loc-Node| Argument | | |Function| | +------------------------------------------------------------------+ <--------- B -----------> <- NF -> <------------- A ---------------> (i) The Locator-Block can be any IPv6 prefix available to the provider; (ii) The Locator-Node Function represents the node and the function to be triggered when a packet is received on the node; (iii) The Argument carries the remaining C-SIDs in the current C-SID container. This patch leverages the NEXT-C-SID mechanism previously introduced in the Linux SRv6 subsystem [2] to support SID compression capabilities in the SRv6 End.X behavior [3]. An SRv6 End.X behavior with NEXT-C-SID flavor works as an End.X behavior but it is capable of processing the compressed SID List encoded in C-SID containers. An SRv6 End.X behavior with NEXT-C-SID flavor can be configured to support user-provided Locator-Block and Locator-Node Function lengths. In this implementation, such lengths must be evenly divisible by 8 (i.e. must be byte-aligned), otherwise the kernel informs the user about invalid values with a meaningful error code and message through netlink_ext_ack. If Locator-Block and/or Locator-Node Function lengths are not provided by the user during configuration of an SRv6 End.X behavior instance with NEXT-C-SID flavor, the kernel will choose their default values i.e., 32-bit Locator-Block and 16-bit Locator-Node Function. [1] - https://datatracker.ietf.org/doc/html/draft-ietf-spring-srv6-srh-compression [2] - https://lore.kernel.org/all/20220912171619.16943-1-andrea.mayer@uniroma2.it/ [3] - https://datatracker.ietf.org/doc/html/rfc8986#name-endx-l3-cross-connect Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230812180926.16689-2-andrea.mayer@uniroma2.it Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15Merge branch 'genetlink-provide-struct-genl_info-to-dumps'Jakub Kicinski45-180/+236
Jakub Kicinski says: ==================== genetlink: provide struct genl_info to dumps One of the biggest (which is not to say only) annoyances with genetlink handling today is that doit and dumpit need some of the same information, but it is passed to them in completely different structs. The implementations commonly end up writing a _fill() method which populates a message and have to pass at least 6 parameters. 3 of which are extracted manually from request info. After a lot of umming and ahing I decided to populate struct genl_info for dumps, without trying to factor out only the common parts. This makes the adoption easiest. In the future we may add a new version of dump which takes struct genl_info *info as the second argument, instead of struct netlink_callback *cb. For now developers have to call genl_info_dump(cb) to get the info. Typical genetlink families no longer get exposed to netlink protocol internals like pid and seq numbers. v3: - correct the condition in ethtool code (patch 10) v2: https://lore.kernel.org/all/20230810233845.2318049-1-kuba@kernel.org/ - replace the GENL_INFO_NTF() macro with init helper - fix the commit messages v1: https://lore.kernel.org/all/20230809182648.1816537-1-kuba@kernel.org/ ==================== Link: https://lore.kernel.org/r/20230814214723.2924989-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15ethtool: netlink: always pass genl_info to .prepare_dataJakub Kicinski25-47/+47
We had a number of bugs in the past because developers forgot to fully test dumps, which pass NULL as info to .prepare_data. .prepare_data implementations would try to access info->extack leading to a null-deref. Now that dumps and notifications can access struct genl_info we can pass it in, and remove the info null checks. Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com> # pause Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-11-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15ethtool: netlink: simplify arguments to ethnl_default_parse()Jakub Kicinski1-12/+9
Pass struct genl_info directly instead of its members. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-10-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15netdev-genl: use struct genl_info for reply constructionJakub Kicinski1-9/+8
Use the just added APIs to make the code simpler. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-9-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: add genlmsg_iput() APIJakub Kicinski1-1/+53
Add some APIs and helpers required for convenient construction of replies and notifications based on struct genl_info. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-8-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: add a family pointer to struct genl_infoJakub Kicinski2-11/+14
Having family in struct genl_info is quite useful. It cuts down the number of arguments which need to be passed to helpers which already take struct genl_info. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: use attrs from struct genl_infoJakub Kicinski14-23/+22
Since dumps carry struct genl_info now, use the attrs pointer from genl_info and remove the one in struct genl_dumpit_info. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-6-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: add struct genl_info to struct genl_dumpit_infoJakub Kicinski2-2/+22
Netlink GET implementations must currently juggle struct genl_info and struct netlink_callback, depending on whether they were called from doit or dumpit. Add genl_info to the dump state and populate the fields. This way implementations can simply pass struct genl_info around. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-5-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: remove userhdr from struct genl_infoJakub Kicinski7-27/+33
Only three families use info->userhdr today and going forward we discourage using fixed headers in new families. So having the pointer to user header in struct genl_info is an overkill. Compute the header pointer at runtime. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Aaron Conole <aconole@redhat.com> Link: https://lore.kernel.org/r/20230814214723.2924989-4-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: make genl_info->nlhdr constJakub Kicinski3-3/+3
struct netlink_callback has a const nlh pointer, make the pointer in struct genl_info const as well, to make copying between the two easier. Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15genetlink: push conditional locking into dumpit/doneJakub Kicinski1-55/+35
Add helpers which take/release the genl mutex based on family->parallel_ops. Remove the separation between handling of ops in locked and parallel families. Future patches would make the duplicated code grow even more. Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/20230814214723.2924989-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-15net: Fix slab-out-of-bounds in inet[6]_steal_sockLorenz Bauer2-2/+2
Kumar reported a KASAN splat in tcp_v6_rcv: bash-5.2# ./test_progs -t btf_skc_cls_ingress ... [ 51.810085] BUG: KASAN: slab-out-of-bounds in tcp_v6_rcv+0x2d7d/0x3440 [ 51.810458] Read of size 2 at addr ffff8881053f038c by task test_progs/226 The problem is that inet[6]_steal_sock accesses sk->sk_protocol without accounting for request or timewait sockets. To fix this we can't just check sock_common->skc_reuseport since that flag is present on timewait sockets. Instead, add a fullsock check to avoid the out of bands access of sk_protocol. Fixes: 9c02bec95954 ("bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign") Reported-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Lorenz Bauer <lmb@isovalent.com> Link: https://lore.kernel.org/r/20230815-bpf-next-v2-1-95126eaa4c1b@isovalent.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-14Merge branch 'Update and document struct_ops'Martin KaFai Lau2-6/+56
David Vernet says: ==================== The struct bpf_struct_ops structure in BPF is a framework that allows subsystems to extend themselves using BPF. In commit 68b04864ca425 ("bpf: Create links for BPF struct_ops maps") and commit aef56f2e918bf ("bpf: Update the struct_ops of a bpf_link"), the structure was updated to include new ->validate() and ->update() callbacks respectively in support of allowing struct_ops maps to be created with BPF_F_LINK. The intention was that struct bpf_struct_ops implementations could support map updates through the link. Because map validation and registration would take place in two separate steps for struct_ops maps managed by the link (the first in map update elem, and the latter in link create), the ->validate() callback was added, and any struct_ops implementation that wished to use BPF_F_LINK, even just for lifetime management, would then be required to define both it and ->update(). Not all struct_ops implementations can or will support update, however. For example, the sched_ext struct_ops implementation proposed in [0] will not be able to support atomic map updates because it can race with sysrq, has to cycle tasks through various states in order to safely transition, etc. It can, however, benefit from letting the BPF link automatically evict the struc_ops map when the application exits (e.g. if it crashes). This patch set therefore: 1. Updates the struct_ops implementation to support default values for ->validate() and ->update() so that struct_ops implementations can benefit from BPF_F_LINK management even if they can't support updates. 2. Documents struct bpf_struct_ops so that the semantics are clear and well defined. --- v2: https://lore.kernel.org/bpf/0f5ea3de-c6e7-490f-b5ec-b5c7cd288687@gmail.com/T/ Changes from v2 -> v3: - Add patch 2/2 that documents the struct bpf_struct_ops structure. - Add Kui-Feng's Acked-by tag to patch 1/2. v1: https://lore.kernel.org/lkml/20230811150934.GA542801@maniforge/ Changes from v1 -> v2: - Move the if (!st_map->st_ops->update) check outside of the critical section before we acquire the update_mutex. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-14bpf: Document struct bpf_struct_ops fieldsDavid Vernet1-0/+47
Subsystems that want to implement a struct bpf_struct_ops structure to enable struct_ops maps must currently reverse engineer how the structure works. Given that this is meant to be a way for subsystem maintainers to extend their subsystems using BPF, let's document it to make it a bit easier on them. Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230814185908.700553-3-void@manifault.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-14bpf: Support default .validate() and .update() behavior for struct_ops linksDavid Vernet1-6/+9
Currently, if a struct_ops map is loaded with BPF_F_LINK, it must also define the .validate() and .update() callbacks in its corresponding struct bpf_struct_ops in the kernel. Enabling struct_ops link is useful in its own right to ensure that the map is unloaded if an application crashes. For example, with sched_ext, we want to automatically unload the host-wide scheduler if the application crashes. We would likely never support updating elements of a sched_ext struct_ops map, so we'd have to implement these callbacks showing that they _can't_ support element updates just to benefit from the basic lifetime management of struct_ops links. Let's enable struct_ops maps to work with BPF_F_LINK even if they haven't defined these callbacks, by assuming that a struct_ops map element cannot be updated by default. Acked-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20230814185908.700553-2-void@manifault.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-14selftests/bpf: Add various more tcx test casesDaniel Borkmann3-0/+462
Add several new tcx test cases to improve test coverage. This also includes a few new tests with ingress instead of clsact qdisc, to cover the fix from commit dc644b540a2d ("tcx: Fix splat in ingress_destroy upon tcx_entry_free"). # ./test_progs -t tc [...] #234 tc_links_after:OK #235 tc_links_append:OK #236 tc_links_basic:OK #237 tc_links_before:OK #238 tc_links_chain_classic:OK #239 tc_links_chain_mixed:OK #240 tc_links_dev_cleanup:OK #241 tc_links_dev_mixed:OK #242 tc_links_ingress:OK #243 tc_links_invalid:OK #244 tc_links_prepend:OK #245 tc_links_replace:OK #246 tc_links_revision:OK #247 tc_opts_after:OK #248 tc_opts_append:OK #249 tc_opts_basic:OK #250 tc_opts_before:OK #251 tc_opts_chain_classic:OK #252 tc_opts_chain_mixed:OK #253 tc_opts_delete_empty:OK #254 tc_opts_demixed:OK #255 tc_opts_detach:OK #256 tc_opts_detach_after:OK #257 tc_opts_detach_before:OK #258 tc_opts_dev_cleanup:OK #259 tc_opts_invalid:OK #260 tc_opts_mixed:OK #261 tc_opts_prepend:OK #262 tc_opts_replace:OK #263 tc_opts_revision:OK [...] Summary: 44/38 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/8699efc284b75ccdc51ddf7062fa2370330dc6c0.1692029283.git.daniel@iogearbox.net Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-08-14net: dsa: mv88e6060: add phylink_get_caps implementationRussell King (Oracle)1-0/+45
Add a phylink_get_caps implementation for Marvell 88e6060 DSA switch. This is a fast ethernet switch, with internal PHYs for ports 0 through 4. Port 4 also supports MII, REVMII, REVRMII and SNI. Port 5 supports MII, REVMII, REVRMII and SNI without an internal PHY. Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/E1qUkx7-003dMX-9b@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14net/mlx5: Don't query MAX caps twiceShay Drory1-6/+6
Whenever mlx5 driver is probed or reloaded, it queries some capabilities in MAX mode via set_hca_cap() API. Afterwards, the driver queries all capabilities in MAX mode via mlx5_query_hca_caps() API. Since MAX caps are read only caps, querying them twice is redundant. Hence, delete the second query. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14net/mlx5: Remove unused MAX HCA capabilitiesShay Drory5-74/+30
Each device cap has two modes: MAX and CUR. The driver maintains a cache of both modes of the capabilities. For most device caps, the MAX cap mode is never used. Hence, remove all driver queries of the MAX mode of the said caps as well as their helper MACROs. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14net/mlx5: Remove unused CAPsShay Drory4-74/+1
mlx5 driver queries the device for VECTOR_CALC and SHAMPO caps, but there isn't any user who requires them. As well as, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used. Thus, drop all usages and definitions of the mentioned caps above. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14net/mlx5: Fix error message in mlx5_sf_dev_state_change_handler()Jiri Pirko1-1/+1
sw_function_id contains sfnum, so fix the error message to name the value properly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>