aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-02-29Merge branch 'net-dsa-mv88e6xxx-add-amethyst-specific-smi-gpio-function'Paolo Abeni3-4/+40
Robert Marko says: ==================== net: dsa: mv88e6xxx: add Amethyst specific SMI GPIO function Amethyst family (MV88E6191X/6193X/6393X) has a simplified SMI GPIO setting via the Scratch and Misc register so it requires family specific function. In the v1 review, Andrew pointed out that it would make sense to rename the existing mv88e6xxx_g2_scratch_gpio_set_smi as it only works on the MV6390 family. Changes in v2: * Add rename of mv88e6xxx_g2_scratch_gpio_set_smi to mv88e6390_g2_scratch_gpio_set_smi ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-02-29net: dsa: mv88e6xxx: add Amethyst specific SMI GPIO functionRobert Marko3-1/+37
The existing mv88e6390_g2_scratch_gpio_set_smi() cannot be used on the 88E6393X as it requires certain P0_MODE, it also checks the CPU mode as it impacts the bit setting value. This is all irrelevant for Amethyst (MV88E6191X/6193X/6393X) as only the default value of the SMI_PHY Config bit is set to CPU_MGD bootstrap pin value but it can be changed without restrictions so that GPIO pins 9 and 10 are used as SMI pins. So, introduce Amethyst specific function and call that if the Amethyst family wants to setup the external PHY. Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Robert Marko <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-29net: dsa: mv88e6xxx: rename mv88e6xxx_g2_scratch_gpio_set_smiRobert Marko3-4/+4
The name mv88e6xxx_g2_scratch_gpio_set_smi is a bit ambiguous as it appears to only be applicable to the 6390 family, so lets rename it to mv88e6390_g2_scratch_gpio_set_smi to make it more obvious. Signed-off-by: Robert Marko <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-02-28inet6: expand rcu_read_lock() scope in inet6_dump_addr()Eric Dumazet1-4/+4
I missed that inet6_dump_addr() is calling in6_dump_addrs() from two points. First one under RTNL protection, and second one under rcu_read_lock(). Since we want to remove RTNL use from inet6_dump_addr() very soon, no longer assume in6_dump_addrs() is protected by RTNL (even if this is still the case). Use rcu_read_lock() earlier to fix this lockdep splat: WARNING: suspicious RCU usage 6.8.0-rc5-syzkaller-01618-gf8cbf6bde4c8 #0 Not tainted net/ipv6/addrconf.c:5317 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by syz-executor.2/8834: #0: ffff88802f554678 (nlk_cb_mutex-ROUTE){+.+.}-{3:3}, at: __netlink_dump_start+0x119/0x780 net/netlink/af_netlink.c:2338 #1: ffffffff8f377a88 (rtnl_mutex){+.+.}-{3:3}, at: netlink_dump+0x676/0xda0 net/netlink/af_netlink.c:2265 #2: ffff88807e5f0580 (&ndev->lock){++--}-{2:2}, at: in6_dump_addrs+0xb8/0x1de0 net/ipv6/addrconf.c:5279 stack backtrace: CPU: 1 PID: 8834 Comm: syz-executor.2 Not tainted 6.8.0-rc5-syzkaller-01618-gf8cbf6bde4c8 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x1e7/0x2e0 lib/dump_stack.c:106 lockdep_rcu_suspicious+0x220/0x340 kernel/locking/lockdep.c:6712 in6_dump_addrs+0x1b47/0x1de0 net/ipv6/addrconf.c:5317 inet6_dump_addr+0x1597/0x1690 net/ipv6/addrconf.c:5428 netlink_dump+0x6a6/0xda0 net/netlink/af_netlink.c:2266 __netlink_dump_start+0x59d/0x780 net/netlink/af_netlink.c:2374 netlink_dump_start include/linux/netlink.h:340 [inline] rtnetlink_rcv_msg+0xcf7/0x10d0 net/core/rtnetlink.c:6555 netlink_rcv_skb+0x1e3/0x430 net/netlink/af_netlink.c:2547 netlink_unicast_kernel net/netlink/af_netlink.c:1335 [inline] netlink_unicast+0x7ea/0x980 net/netlink/af_netlink.c:1361 netlink_sendmsg+0x8e0/0xcb0 net/netlink/af_netlink.c:1902 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x221/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2584 ___sys_sendmsg net/socket.c:2638 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2667 Fixes: c3718936ec47 ("ipv6: anycast: complete RCU handling of struct ifacaddr6") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28net: call skb_defer_free_flush() from __napi_busy_loop()Eric Dumazet1-21/+22
skb_defer_free_flush() is currently called from net_rx_action() and napi_threaded_poll(). We should also call it from __napi_busy_loop() otherwise there is the risk the percpu queue can grow until an IPI is forced from skb_attempt_defer_free() adding a latency spike. Signed-off-by: Eric Dumazet <[email protected]> Cc: Samiullah Khawaja <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tcp: remove some holes in struct tcp_sockEric Dumazet1-4/+4
By moving some fields around, this patch shrinks holes size from 56 to 32, saving 24 bytes on 64bit arches. After the patch pahole gives the following for 'struct tcp_sock': /* size: 2304, cachelines: 36, members: 162 */ /* sum members: 2234, holes: 6, sum holes: 32 */ /* sum bitfield members: 34 bits, bit holes: 5, sum bit holes: 14 bits */ /* padding: 32 */ /* paddings: 3, sum paddings: 10 */ /* forced alignments: 1, forced holes: 1, sum forced holes: 12 */ Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28igb: extend PTP timestamp adjustments to i211Oleksij Rempel1-2/+3
The i211 requires the same PTP timestamp adjustments as the i210, according to its datasheet. To ensure consistent timestamping across different platforms, this change extends the existing adjustments to include the i211. The adjustment result are tested and comparable for i210 and i211 based systems. Fixes: 3f544d2a4d5c ("igb: adjust PTP timestamps for Tx/Rx latency") Signed-off-by: Oleksij Rempel <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Tested-by: Pucha Himasekhar Reddy <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28net: bridge: Exit if multicast_init_stats failsBreno Leitao1-1/+2
If br_multicast_init_stats() fails, there is no need to set lockdep classes. Just return from the error path. Signed-off-by: Breno Leitao <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28net: bridge: Do not allocate stats in the driverBreno Leitao1-11/+2
With commit 34d21de99cea9 ("net: Move {l,t,d}stats allocation to core and convert veth & vrf"), stats allocation could be done on net core instead of this driver. With this new approach, the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc). This is core responsibility now. Remove the allocation in the bridge driver and leverage the network core allocation. Signed-off-by: Breno Leitao <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28selftests: vxlan_mdb: Avoid duplicate test namesIdo Schimmel1-18/+18
Rename some test cases to avoid overlapping test names which is problematic for the kernel test robot. No changes in the test's logic. Suggested-by: Yujie Liu <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28rtnetlink: fix error logic of IFLA_BRIDGE_FLAGS writing backLin Ma1-6/+5
In the commit d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length"), an adjustment was made to the old loop logic in the function `rtnl_bridge_setlink` to enable the loop to also check the length of the IFLA_BRIDGE_MODE attribute. However, this adjustment removed the `break` statement and led to an error logic of the flags writing back at the end of this function. if (have_flags) memcpy(nla_data(attr), &flags, sizeof(flags)); // attr should point to IFLA_BRIDGE_FLAGS NLA !!! Before the mentioned commit, the `attr` is granted to be IFLA_BRIDGE_FLAGS. However, this is not necessarily true fow now as the updated loop will let the attr point to the last NLA, even an invalid NLA which could cause overflow writes. This patch introduces a new variable `br_flag` to save the NLA pointer that points to IFLA_BRIDGE_FLAGS and uses it to resolve the mentioned error logic. Fixes: d73ef2d69c0d ("rtnetlink: let rtnl_bridge_setlink checks IFLA_BRIDGE_MODE length") Signed-off-by: Lin Ma <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28netlabel: remove impossible return value in netlbl_bitmap_walkZhengchao Shao3-9/+3
Since commit 446fda4f2682 ("[NetLabel]: CIPSOv4 engine"), *bitmap_walk function only returns -1. Nearly 18 years have passed, -2 scenes never come up, so there's no need to consider it. Signed-off-by: Zhengchao Shao <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Acked-by: Paul Moore <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28Merge branch 'inet-implement-lockless-rtm_getnetconf-ops'Jakub Kicinski5-90/+81
Eric Dumazet says: ==================== inet: implement lockless RTM_GETNETCONF ops This series removes RTNL use for RTM_GETNETCONF operations on AF_INET. - Annotate data-races to avoid possible KCSAN splats. - "ip -4 netconf show dev XXX" can be implemented without RTNL [1] - "ip -4 netconf" dumps can be implemented using RCU instead of RTNL [1] [1] This only refers to RTM_GETNETCONF operation, "ip" command also uses RTM_GETLINK dumps which are using RTNL at this moment. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28inet: use xa_array iterator to implement inet_netconf_dump_devconf()Eric Dumazet1-58/+43
1) inet_netconf_dump_devconf() can run under RCU protection instead of RTNL. 2) properly return 0 at the end of a dump, avoiding an an extra recvmsg() system call. 3) Do not use inet_base_seq() anymore, for_each_netdev_dump() has nice properties. Restarting a GETDEVCONF dump if a device has been added/removed or if net->ipv4.dev_addr_genid has changed is moot. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28inet: do not use RTNL in inet_netconf_get_devconf()Eric Dumazet1-12/+15
"ip -4 netconf show dev XXXX" no longer acquires RTNL. Return -ENODEV instead of -EINVAL if no netdev or idev can be found. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28inet: annotate devconf data-racesEric Dumazet5-21/+24
Add READ_ONCE() in ipv4_devconf_get() and corresponding WRITE_ONCE() in ipv4_devconf_set() Add IPV4_DEVCONF_RO() and IPV4_DEVCONF_ALL_RO() macros, and use them when reading devconf fields. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28net: phy: dp83826: disable WOL at initCatalin Popescu1-1/+1
Commit d1d77120bc28 ("net: phy: dp83826: support TX data voltage tuning") introduced a regression in that WOL is not disabled by default for DP83826. WOL should normally be enabled through ethtool. Fixes: d1d77120bc28 ("net: phy: dp83826: support TX data voltage tuning") Signed-off-by: Catalin Popescu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28net: remove SLAB_MEM_SPREAD flag usageChengming Zhou1-1/+1
The SLAB_MEM_SPREAD flag used to be implemented in SLAB, which was removed as of v6.8-rc1, so it became a dead flag since the commit 16a1d968358a ("mm/slab: remove mm/slab.c and slab_def.h"). And the series[1] went on to mark it obsolete to avoid confusion for users. Here we can just remove all its users, which has no functional change. [1] https://lore.kernel.org/all/[email protected]/ Signed-off-by: Chengming Zhou <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28Merge branch 'tools-ynl-stop-using-libmnl'Jakub Kicinski5-268/+543
Jakub Kicinski says: ==================== tools: ynl: stop using libmnl There is no strong reason to stop using libmnl in ynl but there are a few small ones which add up. First (as I remembered immediately after hitting send on v1), C++ compilers do not like the libmnl for_each macros. I haven't tried it myself, but having all the code directly in YNL makes it easier for folks porting to C++ to modify them and/or make YNL more C++ friendly. Second, we do much more advanced netlink level parsing in ynl than libmnl so it's hard to say that libmnl abstracts much from us. The fact that this series, removing the libmnl dependency, only adds <300 LoC shows that code savings aren't huge. OTOH when new types are added (e.g. auto-int) we need to add compatibility to deal with older version of libmnl (in fact, even tho patches have been sent months ago, auto-ints are still not supported in libmnl.git). Thrid, the dependency makes ynl less self contained, and harder to vendor in. Whether vendoring libraries into projects is a good idea is a separate discussion, nonetheless, people want to do it. Fourth, there are small annoyances with the libmnl APIs which are hard to fix in backward-compatible ways. See the last patch for example. All in all, libmnl is a great library, but with all the code generation and structured parsing, ynl is better served by going its own way. v1: https://lore.kernel.org/all/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: use MSG_DONTWAIT for getting notificationsJakub Kicinski1-15/+14
To stick to libmnl wrappers in the past we had to use poll() to check if there are any outstanding notifications on the socket. This is no longer necessary, we can use MSG_DONTWAIT. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: remove the libmnl dependencyJakub Kicinski4-6/+2
We don't use libmnl any more. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: stop using mnl socket helpersJakub Kicinski3-22/+42
Most libmnl socket helpers can be replaced by direct calls to the underlying libc API. We need portid, the netlink manpage suggests we bind() address of zero. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: switch away from MNL_CB_*Jakub Kicinski3-34/+40
Create a local version of the MNL_CB_* parser control values. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: switch away from mnl_cb_tJakub Kicinski3-18/+21
All YNL parsing callbacks take struct ynl_parse_arg as the argument. Make that official by using a local callback type instead of mnl_cb_t. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: stop using mnl_cb_run2()Jakub Kicinski2-19/+45
There's only one set of callbacks in YNL, for netlink control messages, and most of them are trivial. So implement the message walking directly without depending on mnl_cb_run2(). Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: use ynl_sock_read_msgs() for ACK handlingJakub Kicinski2-23/+14
ynl_recv_ack() is simple and it's the only user of mnl_cb_run(). Now that ynl_sock_read_msgs() exists it's actually less code to use ynl_sock_read_msgs() instead of being special. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: wrap recv() + mnl_cb_run2() into a single helperJakub Kicinski1-38/+18
All callers to mnl_cb_run2() call mnl_socket_recvfrom() right before. Wrap the two in a helper, take typed arguments (struct ynl_parse_arg), instead of hoping that all callers remember that parser error handling requires yarg. In case of ynl_sock_read_family() we will no longer check for kernel returning no data, but that would be a kernel bug, not worth complicating the code to catch this. Calling mnl_cb_run2() on an empty buffer is legal and results in STOP (1). Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl-gen: remove unused parse codeJakub Kicinski3-11/+1
Commit f2ba1e5e2208 ("tools: ynl-gen: stop generating common notification handlers") removed the last caller of the parse_cb_run() helper. We no longer need to export ynl_cb_array. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: make yarg the first member of struct ynl_dump_stateJakub Kicinski3-7/+6
All YNL parsing code expects a pointer to struct ynl_parse_arg AKA yarg. For dump was pass in struct ynl_dump_state, which works fine, because struct ynl_dump_state and struct ynl_parse_arg have identical layout for the members that matter.. but it's a bit hacky. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: create local ARRAY_SIZE() helperJakub Kicinski2-2/+5
libc doesn't have an ARRAY_SIZE() create one locally. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: create local nlmsg access helpersJakub Kicinski3-17/+52
Create helpers for accessing payloads of struct nlmsg. Use them instead of the libmnl ones. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: create local for_each helpersJakub Kicinski3-10/+57
Create ynl_attr_for_each*() iteration helpers. Use them instead of the mnl ones. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: create local attribute helpersJakub Kicinski3-74/+227
Don't use mnl attr helpers, we're trying to remove the libmnl dependency. Create both signed and unsigned helpers, libmnl had unsigned helpers, so code generator no longer needs the mnl_type() hack. The new helpers are written from first principles, but are hopefully not too buggy. Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: give up on libmnl for auto-intsJakub Kicinski1-9/+36
The temporary auto-int helpers are not really correct. We can't treat signed and unsigned ints the same when determining whether we need full 8B. I realized this before sending the patch to add support in libmnl. Unfortunately, that patch has not been merged, so time to fix our local helpers. Use the mnl* name for now, subsequent patches will address that. Acked-by: Nicolas Dichtel <[email protected]> Reviewed-by: Donald Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: fix handling of multiple mcast groupsJakub Kicinski1-0/+1
We never increment the group number iterator, so all groups get recorded into index 0 of the mcast_groups[] array. As a result YNL can only handle using the last group. For example using the "netdev" sample on kernel with page pool commands results in: $ ./samples/netdev YNL: Multicast group 'mgmt' not found Most families have only one multicast group, so this hasn't been noticed. Plus perhaps developers usually test the last group which would have worked. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Reviewed-by: Donald Hunter <[email protected]> Acked-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-28tools: ynl: protect from old OvS headersJakub Kicinski1-0/+3
Since commit 7c59c9c8f202 ("tools: ynl: generate code for ovs families") we need relatively recent OvS headers to get YNL to compile. Add the direct include workaround to fix compilation on less up-to-date OSes like CentOS 9. Reviewed-by: Donald Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-02-29selftests: netfilter: add bridge conntrack + multicast test caseFlorian Westphal2-1/+190
Add test case for multicast packet confirm race. Without preceding patch, this should result in: WARNING: CPU: 0 PID: 38 at net/netfilter/nf_conntrack_core.c:1198 __nf_conntrack_confirm+0x3ed/0x5f0 Workqueue: events_unbound macvlan_process_broadcast RIP: 0010:__nf_conntrack_confirm+0x3ed/0x5f0 ? __nf_conntrack_confirm+0x3ed/0x5f0 nf_confirm+0x2ad/0x2d0 nf_hook_slow+0x36/0xd0 ip_local_deliver+0xce/0x110 __netif_receive_skb_one_core+0x4f/0x70 process_backlog+0x8c/0x130 [..] Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-02-29netfilter: bridge: confirm multicast packets before passing them up the stackFlorian Westphal4-0/+128
conntrack nf_confirm logic cannot handle cloned skbs referencing the same nf_conn entry, which will happen for multicast (broadcast) frames on bridges. Example: macvlan0 | br0 / \ ethX ethY ethX (or Y) receives a L2 multicast or broadcast packet containing an IP packet, flow is not yet in conntrack table. 1. skb passes through bridge and fake-ip (br_netfilter)Prerouting. -> skb->_nfct now references a unconfirmed entry 2. skb is broad/mcast packet. bridge now passes clones out on each bridge interface. 3. skb gets passed up the stack. 4. In macvlan case, macvlan driver retains clone(s) of the mcast skb and schedules a work queue to send them out on the lower devices. The clone skb->_nfct is not a copy, it is the same entry as the original skb. The macvlan rx handler then returns RX_HANDLER_PASS. 5. Normal conntrack hooks (in NF_INET_LOCAL_IN) confirm the orig skb. The Macvlan broadcast worker and normal confirm path will race. This race will not happen if step 2 already confirmed a clone. In that case later steps perform skb_clone() with skb->_nfct already confirmed (in hash table). This works fine. But such confirmation won't happen when eb/ip/nftables rules dropped the packets before they reached the nf_confirm step in postrouting. Pablo points out that nf_conntrack_bridge doesn't allow use of stateful nat, so we can safely discard the nf_conn entry and let inet call conntrack again. This doesn't work for bridge netfilter: skb could have a nat transformation. Also bridge nf prevents re-invocation of inet prerouting via 'sabotage_in' hook. Work around this problem by explicit confirmation of the entry at LOCAL_IN time, before upper layer has a chance to clone the unconfirmed entry. The downside is that this disables NAT and conntrack helpers. Alternative fix would be to add locking to all code parts that deal with unconfirmed packets, but even if that could be done in a sane way this opens up other problems, for example: -m physdev --physdev-out eth0 -j SNAT --snat-to 1.2.3.4 -m physdev --physdev-out eth1 -j SNAT --snat-to 1.2.3.5 For multicast case, only one of such conflicting mappings will be created, conntrack only handles 1:1 NAT mappings. Users should set create a setup that explicitly marks such traffic NOTRACK (conntrack bypass) to avoid this, but we cannot auto-bypass them, ruleset might have accept rules for untracked traffic already, so user-visible behaviour would change. Suggested-by: Pablo Neira Ayuso <[email protected]> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217777 Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-02-28netfilter: nf_tables: allow NFPROTO_INET in nft_(match/target)_validate()Ignat Korchagin1-0/+20
Commit d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") added some validation of NFPROTO_* families in the nft_compat module, but it broke the ability to use legacy iptables modules in dual-stack nftables. While with legacy iptables one had to independently manage IPv4 and IPv6 tables, with nftables it is possible to have dual-stack tables sharing the rules. Moreover, it was possible to use rules based on legacy iptables match/target modules in dual-stack nftables. As an example, the program from [2] creates an INET dual-stack family table using an xt_bpf based rule, which looks like the following (the actual output was generated with a patched nft tool as the current nft tool does not parse dual stack tables with legacy match rules, so consider it for illustrative purposes only): table inet testfw { chain input { type filter hook prerouting priority filter; policy accept; bytecode counter packets 0 bytes 0 accept } } After d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") we get EOPNOTSUPP for the above program. Fix this by allowing NFPROTO_INET for nft_(match/target)_validate(), but also restrict the functions to classic iptables hooks. Changes in v3: * clarify that upstream nft will not display such configuration properly and that the output was generated with a patched nft tool * remove example program from commit description and link to it instead * no code changes otherwise Changes in v2: * restrict nft_(match/target)_validate() to classic iptables hooks * rewrite example program to use unmodified libnftnl Fixes: d0009effa886 ("netfilter: nf_tables: validate NFPROTO_* family") Link: https://lore.kernel.org/all/Zc1PfoWN38UuFJRI@calendula/T/#mc947262582c90fec044c7a3398cc92fac7afea72 [1] Link: https://lore.kernel.org/all/[email protected]/ [2] Reported-by: Jordan Griege <[email protected]> Signed-off-by: Ignat Korchagin <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-02-28Merge branch 'bpf-arm64-use-bpf-prog-pack-allocator-in-bpf-jit'Alexei Starovoitov3-24/+192
Puranjay Mohan says: ==================== bpf, arm64: use BPF prog pack allocator in BPF JIT Changes in V8 => V9: V8: https://lore.kernel.org/bpf/[email protected]/ 1. Rebased on bpf-next/master 2. Added Acked-by: Catalin Marinas <[email protected]> Changes in V7 => V8: V7: https://lore.kernel.org/bpf/[email protected]/ 1. Rebase on bpf-next/master 2. Fix __text_poke() by removing usage of 'ret' that was never set. Changes in V6 => V7: V6: https://lore.kernel.org/all/[email protected]/ 1. Rebase on bpf-next/master. Changes in V5 => V6: V5: https://lore.kernel.org/all/[email protected]/ 1. Implement a text poke api to reduce code repeatition. 2. Use flush_icache_range() in place of caches_clean_inval_pou() in the functions that modify code. 3. Optimize the bpf_jit_free() by not copying the all instructions on the rw image to the ro_image Changes in V4 => v5: 1. Remove the patch for making prog pack allocator portable as it will come through the RISCV tree[1]. 2. Add a new function aarch64_insn_set() to be used in bpf_arch_text_invalidate() for putting illegal instructions after a program is removed. The earlier implementation of bpf_arch_text_invalidate() was calling aarch64_insn_patch_text_nosync() in a loop and making it slow because each call invalidated the cache. Here is test_tag now: [root@ip-172-31-6-176 bpf]# time ./test_tag test_tag: OK (40945 tests) real 0m19.695s user 0m1.514s sys 0m17.841s test_tag without these patches: [root@ip-172-31-6-176 bpf]# time ./test_tag test_tag: OK (40945 tests) real 0m21.487s user 0m1.647s sys 0m19.106s test_tag in the previous version was really slow > 2 minutes. see [2] 3. Add cache invalidation in aarch64_insn_copy() so other users can call the function without worrying about the cache. Currently only bpf_arch_text_copy() is using it, but there might be more users in the future. Chanes in V3 => V4: Changes only in 3rd patch 1. Fix the I-cache maintenance: Clean the data cache and invalidate the i-Cache only *after* the instructions have been copied to the ROX region. Chanes in V2 => V3: Changes only in 3rd patch 1. Set prog = orig_prog; in the failure path of bpf_jit_binary_pack_finalize() call. 2. Add comments explaining the usage of the offsets in the exception table. Changes in v1 => v2: 1. Make the naming consistent in the 3rd patch: ro_image and image ro_header and header ro_image_ptr and image_ptr 2. Use names dst/src in place of addr/opcode in second patch. 3. Add Acked-by: Song Liu <[email protected]> in 1st and 2nd patch. BPF programs currently consume a page each on ARM64. For systems with many BPF programs, this adds significant pressure to instruction TLB. High iTLB pressure usually causes slow down for the whole system. Song Liu introduced the BPF prog pack allocator[3] to mitigate the above issue. It packs multiple BPF programs into a single huge page. It is currently only enabled for the x86_64 BPF JIT. This patch series enables the BPF prog pack allocator for the ARM64 BPF JIT. ==================================================== Performance Analysis of prog pack allocator on ARM64 ==================================================== To test the performance of the BPF prog pack allocator on ARM64, a stresser tool[4] was built. This tool loads 8 BPF programs on the system and triggers 5 of them in an infinite loop by doing system calls. The runner script starts 20 instances of the above which loads 8*20=160 BPF programs on the system, 5*20=100 of which are being constantly triggered. In the above environment we try to build Python-3.8.4 and try to find different iTLB metrics for the compilation done by gcc-12.2.0. The source code[5] is configured with the following command: ./configure --enable-optimizations --with-ensurepip=install Then the runner script is executed with the following command: ./run.sh "perf stat -e ITLB_WALK,L1I_TLB,INST_RETIRED,iTLB-load-misses -a make -j32" This builds Python while 160 BPF programs are loaded and 100 are being constantly triggered and measures iTLB related metrics. The output of the above command is discussed below before and after enabling the BPF prog pack allocator. The tests were run on qemu-system-aarch64 with 32 cpus, 4G memory, -machine virt, -cpu host, and -enable-kvm. Results ------- Before enabling prog pack allocator: ------------------------------------ Performance counter stats for 'system wide': 333278635 ITLB_WALK 6762692976558 L1I_TLB 25359571423901 INST_RETIRED 15824054789 iTLB-load-misses 189.029769053 seconds time elapsed After enabling prog pack allocator: ----------------------------------- Performance counter stats for 'system wide': 190333544 ITLB_WALK 6712712386528 L1I_TLB 25278233304411 INST_RETIRED 5716757866 iTLB-load-misses 185.392650561 seconds time elapsed Improvements in metrics ----------------------- Compilation time ---> 1.92% faster iTLB-load-misses/Sec (Less is better) ---> 63.16% decrease ITLB_WALK/1000 INST_RETIRED (Less is better) ---> 42.71% decrease ITLB_Walk/L1I_TLB (Less is better) ---> 42.47% decrease [1] https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/commit/?h=for-next&id=20e490adea279d49d57b800475938f5b67926d98 [2] https://lore.kernel.org/all/CANk7y0gcP3dF2mESLp5JN1+9iDfgtiWRFGqLkCgZD6wby1kQOw@mail.gmail.com/ [3] https://lore.kernel.org/bpf/[email protected]/ [4] https://github.com/puranjaymohan/BPF-Allocator-Bench [5] https://www.python.org/ftp/python/3.8.4/Python-3.8.4.tgz ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-02-28bpf, arm64: use bpf_prog_pack for memory managementPuranjay Mohan1-24/+115
Use bpf_jit_binary_pack_alloc for memory management of JIT binaries in ARM64 BPF JIT. The bpf_jit_binary_pack_alloc creates a pair of RW and RX buffers. The JIT writes the program into the RW buffer. When the JIT is done, the program is copied to the final RX buffer with bpf_jit_binary_pack_finalize. Implement bpf_arch_text_copy() and bpf_arch_text_invalidate() for ARM64 JIT as these functions are required by bpf_jit_binary_pack allocator. Signed-off-by: Puranjay Mohan <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-02-28arm64: patching: implement text_poke APIPuranjay Mohan2-0/+77
The text_poke API is used to implement functions like memcpy() and memset() for instruction memory (RO+X). The implementation is similar to the x86 version. This will be used by the BPF JIT to write and modify BPF programs. There could be more users of this in the future. Signed-off-by: Puranjay Mohan <[email protected]> Acked-by: Catalin Marinas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2024-02-28Merge tag 'acpi-6.8-rc7' of ↵Linus Torvalds1-46/+66
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Revert a recent EC driver change that introduced an unexpected and undesirable user-visible difference in behavior (Rafael Wysocki)" * tag 'acpi-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Revert "ACPI: EC: Use a spin lock without disabing interrupts"
2024-02-28Merge tag 'pm-6.8-rc7' of ↵Linus Torvalds1-0/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fix from Rafael Wysocki: "Fix a latent bug in the intel-pstate cpufreq driver that has been exposed by the recent schedutil governor changes (Doug Smythies)" * tag 'pm-6.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq: intel_pstate: fix pstate limits enforcement for adjust_perf call back
2024-02-28Merge tag 'spi-fix-v6.8-rc5' of ↵Linus Torvalds2-19/+28
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi Pull spi fixes from Mark Brown: "There's two things here - the big one is a batch of fixes for the power management in the Cadence QuadSPI driver which had some serious issues with runtime PM and there's also a revert of one of the last batch of fixes for ppc4xx which has a dependency on -next but was in between two mainline fixes so the -next dependency got missed. The ppc4xx driver is not currently included in any defconfig and has dependencies that exclude it from allmodconfigs so none of the CI systems catch issues with it, hence the need for the earlier fixes series. There's some updates to the PowerPC configs to address this" * tag 'spi-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi: spi: Drop mismerged fix spi: cadence-qspi: add system-wide suspend and resume callbacks spi: cadence-qspi: put runtime in runtime PM hooks names spi: cadence-qspi: remove system-wide suspend helper calls from runtime PM hooks spi: cadence-qspi: fix pointer reference in runtime PM hooks
2024-02-28Merge tag 'regulator-fix-v6.8-rc5' of ↵Linus Torvalds2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator Pull regulator fixes from Mark Brown: "Two small fixes, one small update for the max5970 driver bringing the driver and DT binding documentation into sync plus a missed update to the patterns in MAINTAINERS after a DT binding YAML conversion" * tag 'regulator-fix-v6.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator: regulator: max5970: Fix regulator child node name MAINTAINERS: repair entry for MICROCHIP MCP16502 PMIC DRIVER
2024-02-28Merge tag 'v6.8-p5' of ↵Linus Torvalds2-4/+13
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes a regression in lskcipher and an out-of-bound access in arm64/neonbs" * tag 'v6.8-p5' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: arm64/neonbs - fix out-of-bounds access on short input crypto: lskcipher - Copy IV in lskcipher glue code always
2024-02-28Bluetooth: qca: Fix triggering coredump implementationZijun Hu1-5/+4
hci_coredump_qca() uses __hci_cmd_sync() to send a vendor-specific command to trigger firmware coredump, but the command does not have any event as its sync response, so it is not suitable to use __hci_cmd_sync(), fixed by using __hci_cmd_send(). Fixes: 06d3fdfcdf5c ("Bluetooth: hci_qca: Add qcom devcoredump support") Signed-off-by: Zijun Hu <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2024-02-28Bluetooth: hci_qca: Set BDA quirk bit if fwnode exists in DTJanaki Ramaiah Thota1-1/+12
BT adapter going into UNCONFIGURED state during BT turn ON when devicetree has no local-bd-address node. Bluetooth will not work out of the box on such devices, to avoid this problem, added check to set HCI_QUIRK_USE_BDADDR_PROPERTY based on local-bd-address node entry. When this quirk is not set, the public Bluetooth address read by host from controller though HCI Read BD Address command is considered as valid. Fixes: e668eb1e1578 ("Bluetooth: hci_core: Don't stop BT if the BD address missing in dts") Signed-off-by: Janaki Ramaiah Thota <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2024-02-28Bluetooth: qca: Fix wrong event type for patch config commandZijun Hu1-1/+1
Vendor-specific command patch config has HCI_Command_Complete event as response, but qca_send_patch_config_cmd() wrongly expects vendor-specific event for the command, fixed by using right event type. Btmon log for the vendor-specific command are shown below: < HCI Command: Vendor (0x3f|0x0000) plen 5 28 01 00 00 00 > HCI Event: Command Complete (0x0e) plen 5 Vendor (0x3f|0x0000) ncmd 1 Status: Success (0x00) 28 Fixes: 4fac8a7ac80b ("Bluetooth: btqca: sequential validation") Signed-off-by: Zijun Hu <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>