aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2021-03-24dsa: slave: add support for TC_SETUP_FTPablo Neira Ayuso1-1/+19
The dsa infrastructure provides a well-defined hierarchy of devices, pass up the call to set up the flow block to the master device. From the software dataplane, the netfilter infrastructure uses the dsa slave devices to refer to the input and output device for the given skbuff. Similarly, the flowtable definition in the ruleset refers to the dsa slave port devices. This patch adds the glue code to call ndo_setup_tc with TC_SETUP_FT with the master device via the dsa slave devices. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: flowtable: support for FLOW_ACTION_PPPOE_PUSHPablo Neira Ayuso1-3/+12
Add a PPPoE push action if layer 2 protocol is ETH_P_PPP_SES to add PPPoE flowtable hardware offload support. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: flowtable: bridge vlan hardware offload and switchdevFelix Fietkau5-1/+15
The switch might have already added the VLAN tag through PVID hardware offload. Keep this extra VLAN in the flowtable but skip it on egress. Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: nft_flow_offload: use direct xmit if hardware offload is enabledPablo Neira Ayuso3-3/+21
If there is a forward path to reach an ethernet device and hardware offload is enabled, then use the direct xmit path. Moreover, store the real device in the direct xmit path info since software datapath uses dev_hard_header() to push the layer encapsulation headers while hardware offload refers to the real device. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: flowtable: add offload support for xmit path typesPablo Neira Ayuso1-42/+124
When the flow tuple xmit_type is set to FLOW_OFFLOAD_XMIT_DIRECT, the dst_cache pointer is not valid, and the h_source/h_dest/ifidx out fields need to be used. This patch also adds the FLOW_ACTION_VLAN_PUSH action to pass the VLAN tag to the driver. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: flowtable: add dsa supportPablo Neira Ayuso1-0/+5
Replace the master ethernet device by the dsa slave port. Packets coming in from the software ingress path use the dsa slave port as input device. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: flowtable: add pppoe supportPablo Neira Ayuso2-7/+51
Add the PPPoE protocol and session id to the flow tuple using the encap fields to uniquely identify flows from the receive path. For the transmit path, dev_hard_header() on the vlan device push the headers. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: flowtable: add bridge vlan filtering supportPablo Neira Ayuso1-0/+12
Add the vlan tag based when PVID is set on. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: flowtable: add vlan supportPablo Neira Ayuso3-26/+128
Add the vlan id and protocol to the flow tuple to uniquely identify flows from the receive path. For the transmit path, dev_hard_header() on the vlan device push the headers. This patch includes support for two vlan headers (QinQ) from the ingress path. Add a generic encap field to the flowtable entry which stores the protocol and the tag id. This allows to reuse these fields in the PPPoE support coming in a later patch. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: flowtable: use dev_fill_forward_path() to obtain egress devicePablo Neira Ayuso3-35/+123
The egress device in the tuple is obtained from route. Use dev_fill_forward_path() instead to provide the real egress device for this flow whenever this is available. The new FLOW_OFFLOAD_XMIT_DIRECT type uses dev_queue_xmit() to transmit ethernet frames. Cache the source and destination hardware address to use dev_queue_xmit() to transfer packets. The FLOW_OFFLOAD_XMIT_DIRECT replaces FLOW_OFFLOAD_XMIT_NEIGH if dev_fill_forward_path() finds a direct transmit path. In case of topology updates, if peer is moved to different bridge port, the connection will time out, reconnect will result in a new entry with the correct path. Snooping fdb updates would allow for cleaning up stale flowtable entries. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: flowtable: use dev_fill_forward_path() to obtain ingress devicePablo Neira Ayuso2-5/+100
Obtain the ingress device in the tuple from the route in the reply direction. Use dev_fill_forward_path() instead to get the real ingress device for this flow. Fall back to use the ingress device that the IP forwarding route provides if: - dev_fill_forward_path() finds no real ingress device. - the ingress device that is obtained is not part of the flowtable devices. - this route has a xfrm policy. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24netfilter: flowtable: add xmit path typesPablo Neira Ayuso3-8/+27
Add the xmit_type field that defines the two supported xmit paths in the flowtable data plane, which are the neighbour and the xfrm xmit paths. This patch prepares for new flowtable xmit path types to come. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24net: dsa: resolve forwarding path for dsa slave portsFelix Fietkau1-0/+16
Add .ndo_fill_forward_path for dsa slave port devices Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24net: bridge: resolve forwarding path for VLAN tag actions in bridge devicesFelix Fietkau4-1/+101
Depending on the VLAN settings of the bridge and the port, the bridge can either add or remove a tag. When vlan filtering is enabled, the fdb lookup also needs to know the VLAN tag/proto for the destination address To provide this, keep track of the stack of VLAN tags for the path in the lookup context Signed-off-by: Felix Fietkau <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24net: bridge: resolve forwarding path for bridge devicesPablo Neira Ayuso1-0/+27
Add .ndo_fill_forward_path for bridge devices. Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24net: 8021q: resolve forwarding path for vlan devicesPablo Neira Ayuso1-0/+15
Add .ndo_fill_forward_path for vlan devices. For instance, assuming the following topology: IP forwarding / \ eth0.100 eth0 | eth0 . . . ethX ab:cd:ef:ab:cd:ef For packets going through IP forwarding to eth0.100 whose destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path() provides the following path: eth0.100 -> eth0 Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24net: resolve forwarding path from virtual netdevice and HW destination addressPablo Neira Ayuso1-0/+46
This patch adds dev_fill_forward_path() which resolves the path to reach the real netdevice from the IP forwarding side. This function takes as input the netdevice and the destination hardware address and it walks down the devices calling .ndo_fill_forward_path() for each device until the real device is found. For instance, assuming the following topology: IP forwarding / \ br0 eth0 / \ eth1 eth2 . . . ethX ab:cd:ef:ab:cd:ef where eth1 and eth2 are bridge ports and eth0 provides WAN connectivity. ethX is the interface in another box which is connected to the eth1 bridge port. For packets going through IP forwarding to br0 whose destination MAC address is ab:cd:ef:ab:cd:ef, dev_fill_forward_path() provides the following path: br0 -> eth1 .ndo_fill_forward_path for br0 looks up at the FDB for the bridge port from the destination MAC address to get the bridge port eth1. This information allows to create a fast path that bypasses the classic bridge and IP forwarding paths, so packets go directly from the bridge port eth1 to eth0 (wan interface) and vice versa. fast path .------------------------. / \ | IP forwarding | | / \ \/ | br0 eth0 . / \ -> eth1 eth2 . . . ethX ab:cd:ef:ab:cd:ef Signed-off-by: Pablo Neira Ayuso <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24net: bridge: Fix missing return assignment from br_vlan_replay_one callColin Ian King1-1/+1
The call to br_vlan_replay_one is returning an error return value but this is not being assigned to err and the following check on err is currently always false because err was initialized to zero. Fix this by assigning err. Addresses-Coverity: ("'Constant' variable guards dead code") Fixes: 22f67cdfae6a ("net: bridge: add helper to replay VLANs installed on port") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24bridge: mrp: Disable roles before deleting the MRP instanceHoratiu Vultur1-0/+7
When an MRP instance was created, the driver was notified that the instance is created and then in a different callback about role of the instance. But when the instance was deleted the driver was notified only that the MRP instance is deleted and not also that the role is disabled. This patch make sure that the driver is notified that the role is changed to disabled before the MRP instance is deleted to have similar callbacks with the creating of the instance. In this way it would simplify the logic in the drivers. Signed-off-by: Horatiu Vultur <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-24xfrm: BEET mode doesn't support fragments for inner packetsXin Long1-0/+13
BEET mode replaces the IP(6) Headers with new IP(6) Headers when sending packets. However, when it's a fragment before the replacement, currently kernel keeps the fragment flag and replace the address field then encaps it with ESP. It would cause in RX side the fragments to get reassembled before decapping with ESP, which is incorrect. In Xiumei's testing, these fragments went over an xfrm interface and got encapped with ESP in the device driver, and the traffic was broken. I don't have a good way to fix it, but only to warn this out in dmesg. Reported-by: Xiumei Mu <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2021-03-24Bluetooth: Remove trailing semicolon in macrosMeng Yu1-2/+2
Macros should not use a trailing semicolon. Signed-off-by: Meng Yu <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2021-03-23net: make unregister netdev warning timeout configurableDmitry Vyukov2-1/+15
netdev_wait_allrefs() issues a warning if refcount does not drop to 0 after 10 seconds. While 10 second wait generally should not happen under normal workload in normal environment, it seems to fire falsely very often during fuzzing and/or in qemu emulation (~10x slower). At least it's not possible to understand if it's really a false positive or not. Automated testing generally bumps all timeouts to very high values to avoid flake failures. Add net.core.netdev_unregister_timeout_secs sysctl to make the timeout configurable for automated testing systems. Lowering the timeout may also be useful for e.g. manual bisection. The default value matches the current behavior. Signed-off-by: Dmitry Vyukov <[email protected]> Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877 Cc: [email protected] Cc: [email protected] Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: dsa: sync up switchdev objects and port attributes when joining the bridgeVladimir Oltean3-2/+51
If we join an already-created bridge port, such as a bond master interface, then we can miss the initial switchdev notifications emitted by the bridge for this port, while it wasn't offloaded by anybody. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: dsa: inherit the actual bridge port flags at join timeVladimir Oltean1-41/+82
DSA currently assumes that the bridge port starts off with this constellation of bridge port flags: - learning on - unicast flooding on - multicast flooding on - broadcast flooding on just by virtue of code copy-pasta from the bridge layer (new_nbp). This was a simple enough strategy thus far, because the 'bridge join' moment always coincided with the 'bridge port creation' moment. But with sandwiched interfaces, such as: br0 | bond0 | swp0 it may happen that the user has had time to change the bridge port flags of bond0 before enslaving swp0 to it. In that case, swp0 will falsely assume that the bridge port flags are those determined by new_nbp, when in fact this can happen: ip link add br0 type bridge ip link add bond0 type bond ip link set bond0 master br0 ip link set bond0 type bridge_slave learning off ip link set swp0 master br0 Now swp0 has learning enabled, bond0 has learning disabled. Not nice. Fix this by "dumpster diving" through the actual bridge port flags with br_port_flag_is_set, at bridge join time. We use this opportunity to split dsa_port_change_brport_flags into two distinct functions called dsa_port_inherit_brport_flags and dsa_port_clear_brport_flags, now that the implementation for the two cases is no longer similar. This patch also creates two functions called dsa_port_switchdev_sync and dsa_port_switchdev_unsync which collect what we have so far, even if that's asymmetrical. More is going to be added in the next patch. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: dsa: pass extack to dsa_port_{bridge,lag}_joinVladimir Oltean3-7/+14
This is a pretty noisy change that was broken out of the larger change for replaying switchdev attributes and objects at bridge join time, which is when these extack objects are actually used. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Tobias Waldekranz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: dsa: call dsa_port_bridge_join when joining a LAG that is already in a ↵Vladimir Oltean1-4/+18
bridge DSA can properly detect and offload this sequence of operations: ip link add br0 type bridge ip link add bond0 type bond ip link set swp0 master bond0 ip link set bond0 master br0 But not this one: ip link add br0 type bridge ip link add bond0 type bond ip link set bond0 master br0 ip link set swp0 master bond0 Actually the second one is more complicated, due to the elapsed time between the enslavement of bond0 and the offloading of it via swp0, a lot of things could have happened to the bond0 bridge port in terms of switchdev objects (host MDBs, VLANs, altered STP state etc). So this is a bit of a can of worms, and making sure that the DSA port's state is in sync with this already existing bridge port is handled in the next patches. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Tobias Waldekranz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: bridge: add helper to replay VLANs installed on portVladimir Oltean1-0/+73
Currently this simple setup with DSA: ip link add br0 type bridge vlan_filtering 1 ip link add bond0 type bond ip link set bond0 master br0 ip link set swp0 master bond0 will not work because the bridge has created the PVID in br_add_if -> nbp_vlan_init, and it has notified switchdev of the existence of VLAN 1, but that was too early, since swp0 was not yet a lower of bond0, so it had no reason to act upon that notification. We need a helper in the bridge to replay the switchdev VLAN objects that were notified since the bridge port creation, because some of them may have been missed. As opposed to the br_mdb_replay function, the vg->vlan_list write side protection is offered by the rtnl_mutex which is sleepable, so we don't need to queue up the objects in atomic context, we can replay them right away. Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: bridge: add helper to replay port and local fdb entriesVladimir Oltean1-0/+50
When a switchdev port starts offloading a LAG that is already in a bridge and has an FDB entry pointing to it: ip link set bond0 master br0 bridge fdb add dev bond0 00:01:02:03:04:05 master static ip link set swp0 master bond0 the switchdev driver will have no idea that this FDB entry is there, because it missed the switchdev event emitted at its creation. Ido Schimmel pointed this out during a discussion about challenges with switchdev offloading of stacked interfaces between the physical port and the bridge, and recommended to just catch that condition and deny the CHANGEUPPER event: https://lore.kernel.org/netdev/[email protected]/ But in fact, we might need to deal with the hard thing anyway, which is to replay all FDB addresses relevant to this port, because it isn't just static FDB entries, but also local addresses (ones that are not forwarded but terminated by the bridge). There, we can't just say 'oh yeah, there was an upper already so I'm not joining that'. So, similar to the logic for replaying MDB entries, add a function that must be called by individual switchdev drivers and replays local FDB entries as well as ones pointing towards a bridge port. This time, we use the atomic switchdev notifier block, since that's what FDB entries expect for some reason. Reported-by: Ido Schimmel <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: bridge: add helper to replay port and host-joined mdb entriesVladimir Oltean1-17/+131
I have a system with DSA ports, and udhcpcd is configured to bring interfaces up as soon as they are created. I create a bridge as follows: ip link add br0 type bridge As soon as I create the bridge and udhcpcd brings it up, I also have avahi which automatically starts sending IPv6 packets to advertise some local services, and because of that, the br0 bridge joins the following IPv6 groups due to the code path detailed below: 33:33:ff:6d:c1:9c vid 0 33:33:00:00:00:6a vid 0 33:33:00:00:00:fb vid 0 br_dev_xmit -> br_multicast_rcv -> br_ip6_multicast_add_group -> __br_multicast_add_group -> br_multicast_host_join -> br_mdb_notify This is all fine, but inside br_mdb_notify we have br_mdb_switchdev_host hooked up, and switchdev will attempt to offload the host joined groups to an empty list of ports. Of course nobody offloads them. Then when we add a port to br0: ip link set swp0 master br0 the bridge doesn't replay the host-joined MDB entries from br_add_if, and eventually the host joined addresses expire, and a switchdev notification for deleting it is emitted, but surprise, the original addition was already completely missed. The strategy to address this problem is to replay the MDB entries (both the port ones and the host joined ones) when the new port joins the bridge, similar to what vxlan_fdb_replay does (in that case, its FDB can be populated and only then attached to a bridge that you offload). However there are 2 possibilities: the addresses can be 'pushed' by the bridge into the port, or the port can 'pull' them from the bridge. Considering that in the general case, the new port can be really late to the party, and there may have been many other switchdev ports that already received the initial notification, we would like to avoid delivering duplicate events to them, since they might misbehave. And currently, the bridge calls the entire switchdev notifier chain, whereas for replaying it should just call the notifier block of the new guy. But the bridge doesn't know what is the new guy's notifier block, it just knows where the switchdev notifier chain is. So for simplification, we make this a driver-initiated pull for now, and the notifier block is passed as an argument. To emulate the calling context for mdb objects (deferred and put on the blocking notifier chain), we must iterate under RCU protection through the bridge's mdb entries, queue them, and only call them once we're out of the RCU read-side critical section. There was some opportunity for reuse between br_mdb_switchdev_host_port, br_mdb_notify and the newly added br_mdb_queue_one in how the switchdev mdb object is created, so a helper was created. Suggested-by: Ido Schimmel <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: bridge: add helper to retrieve the current ageing timeVladimir Oltean1-0/+13
The SWITCHDEV_ATTR_ID_BRIDGE_AGEING_TIME attribute is only emitted from: sysfs/ioctl/netlink -> br_set_ageing_time -> __set_ageing_time therefore not at bridge port creation time, so: (a) switchdev drivers have to hardcode the initial value for the address ageing time, because they didn't get any notification (b) that hardcoded value can be out of sync, if the user changes the ageing time before enslaving the port to the bridge We need a helper in the bridge, such that switchdev drivers can query the current value of the bridge ageing time when they start offloading it. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Tobias Waldekranz <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: bridge: add helper for retrieving the current bridge port STP stateVladimir Oltean1-0/+14
It may happen that we have the following topology with DSA or any other switchdev driver with LAG offload: ip link add br0 type bridge stp_state 1 ip link add bond0 type bond ip link set bond0 master br0 ip link set swp0 master bond0 ip link set swp1 master bond0 STP decides that it should put bond0 into the BLOCKING state, and that's that. The ports that are actively listening for the switchdev port attributes emitted for the bond0 bridge port (because they are offloading it) and have the honor of seeing that switchdev port attribute can react to it, so we can program swp0 and swp1 into the BLOCKING state. But if then we do: ip link set swp2 master bond0 then as far as the bridge is concerned, nothing has changed: it still has one bridge port. But this new bridge port will not see any STP state change notification and will remain FORWARDING, which is how the standalone code leaves it in. We need a function in the bridge driver which retrieves the current STP state, such that drivers can synchronize to it when they may have missed switchdev events. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Tobias Waldekranz <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: bridge: don't notify switchdev for local FDB addressesVladimir Oltean1-0/+2
As explained in this discussion: https://lore.kernel.org/netdev/20210117193009.io3nungdwuzmo5f7@skbuf/ the switchdev notifiers for FDB entries managed to have a zero-day bug. The bridge would not say that this entry is local: ip link add br0 type bridge ip link set swp0 master br0 bridge fdb add dev swp0 00:01:02:03:04:05 master local and the switchdev driver would be more than happy to offload it as a normal static FDB entry. This is despite the fact that 'local' and non-'local' entries have completely opposite directions: a local entry is locally terminated and not forwarded, whereas a static entry is forwarded and not locally terminated. So, for example, DSA would install this entry on swp0 instead of installing it on the CPU port as it should. There is an even sadder part, which is that the 'local' flag is implicit if 'static' is not specified, meaning that this command produces the same result of adding a 'local' entry: bridge fdb add dev swp0 00:01:02:03:04:05 master I've updated the man pages for 'bridge', and after reading it now, it should be pretty clear to any user that the commands above were broken and should have never resulted in the 00:01:02:03:04:05 address being forwarded (this behavior is coherent with non-switchdev interfaces): https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/ If you're a user reading this and this is what you want, just use: bridge fdb add dev swp0 00:01:02:03:04:05 master static Because switchdev should have given drivers the means from day one to classify FDB entries as local/non-local, but didn't, it means that all drivers are currently broken. So we can just as well omit the switchdev notifications for local FDB entries, which is exactly what this patch does to close the bug in stable trees. For further development work where drivers might want to trap the local FDB entries to the host, we can add a 'bool is_local' to br_switchdev_fdb_call_notifiers(), and selectively make drivers act upon that bit, while all the others ignore those entries if the 'is_local' bit is set. Fixes: 6b26b51b1d13 ("net: bridge: Add support for notifying devices about FDB add/del") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net/sched: act_ct: clear post_ct if doing ct_clearMarcelo Ricardo Leitner1-2/+4
Invalid detection works with two distinct moments: act_ct tries to find a conntrack entry and set post_ct true, indicating that that was attempted. Then, when flow dissector tries to dissect CT info and no entry is there, it knows that it was tried and no entry was found, and synthesizes/sets key->ct_state = TCA_FLOWER_KEY_CT_FLAGS_TRACKED | TCA_FLOWER_KEY_CT_FLAGS_INVALID; mimicing what OVS does. OVS has this a bit more streamlined, as it recomputes the key after trying to find a conntrack entry for it. Issue here is, when we have 'tc action ct clear', it didn't clear post_ct, causing a subsequent match on 'ct_state -trk' to fail, due to the above. The fix, thus, is to clear it. Reproducer rules: tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 0 \ protocol ip flower ip_proto tcp ct_state -trk \ action ct zone 1 pipe \ action goto chain 2 tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 2 \ protocol ip flower \ action ct clear pipe \ action goto chain 4 tc filter add dev enp130s0f0np0_0 ingress prio 1 chain 4 \ protocol ip flower ct_state -trk \ action mirred egress redirect dev enp130s0f1np1_0 With the fix, the 3rd rule matches, like it does with OVS kernel datapath. Fixes: 7baf2429a1a9 ("net/sched: cls_flower add CT_FLAGS_INVALID flag support") Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Reviewed-by: wenxu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23net: lapb: Make "lapb_t1timer_running" able to detect an already running timerXie He2-9/+14
Problem: The "lapb_t1timer_running" function in "lapb_timer.c" is used in only one place: in the "lapb_kick" function in "lapb_out.c". "lapb_kick" calls "lapb_t1timer_running" to check if the timer is already pending, and if it is not, schedule it to run. However, if the timer has already fired and is running, and is waiting to get the "lapb->lock" lock, "lapb_t1timer_running" will not detect this, and "lapb_kick" will then schedule a new timer. The old timer will then abort when it sees a new timer pending. I think this is not right. The purpose of "lapb_kick" should be ensuring that the actual work of the timer function is scheduled to be done. If the timer function is already running but waiting for the lock, "lapb_kick" should not abort and reschedule it. Changes made: I added a new field "t1timer_running" in "struct lapb_cb" for "lapb_t1timer_running" to use. "t1timer_running" will accurately reflect whether the actual work of the timer is pending. If the timer has fired but is still waiting for the lock, "t1timer_running" will still correctly reflect whether the actual work is waiting to be done. The old "t1timer_stop" field, whose only responsibility is to ask a timer (that is already running but waiting for the lock) to abort, is no longer needed, because the new "t1timer_running" field can fully take over its responsibility. Therefore "t1timer_stop" is deleted. "t1timer_running" is not simply a negation of the old "t1timer_stop". At the end of the timer function, if it does not reschedule itself, "t1timer_running" is set to false to indicate that the timer is stopped. For consistency of the code, I also added "t2timer_running" and deleted "t2timer_stop". Signed-off-by: Xie He <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-23batman-adv: Drop unused header preempt.hSven Eckelmann1-1/+0
The commit b1de0f01b011 ("batman-adv: Use netif_rx_any_context().") removed the last user for a function declaration from linux/preempt.h. The include should therefore be cleaned up. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2021-03-23batman-adv: Fix order of kernel doc in batadv_privLinus Lüssing1-5/+5
During the inlining process of kerneldoc in commit 8b84cc4fb556 ("batman-adv: Use inline kernel-doc for enum/struct"), some comments were placed at the wrong struct members. Fixing this by reordering the comments. Signed-off-by: Linus Lüssing <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2021-03-23Bluetooth: Remove trailing semicolon in macrosMeng Yu1-2/+2
remove trailing semicolon in macros and coding style fix. Signed-off-by: Meng Yu <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2021-03-23Bluetooth: check for zapped sk before connectingArchie Pusaka1-0/+8
There is a possibility of receiving a zapped sock on l2cap_sock_connect(). This could lead to interesting crashes, one such case is tearing down an already tore l2cap_sock as is happened with this call trace: __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0xc4/0x118 lib/dump_stack.c:56 register_lock_class kernel/locking/lockdep.c:792 [inline] register_lock_class+0x239/0x6f6 kernel/locking/lockdep.c:742 __lock_acquire+0x209/0x1e27 kernel/locking/lockdep.c:3105 lock_acquire+0x29c/0x2fb kernel/locking/lockdep.c:3599 __raw_spin_lock_bh include/linux/spinlock_api_smp.h:137 [inline] _raw_spin_lock_bh+0x38/0x47 kernel/locking/spinlock.c:175 spin_lock_bh include/linux/spinlock.h:307 [inline] lock_sock_nested+0x44/0xfa net/core/sock.c:2518 l2cap_sock_teardown_cb+0x88/0x2fb net/bluetooth/l2cap_sock.c:1345 l2cap_chan_del+0xa3/0x383 net/bluetooth/l2cap_core.c:598 l2cap_chan_close+0x537/0x5dd net/bluetooth/l2cap_core.c:756 l2cap_chan_timeout+0x104/0x17e net/bluetooth/l2cap_core.c:429 process_one_work+0x7e3/0xcb0 kernel/workqueue.c:2064 worker_thread+0x5a5/0x773 kernel/workqueue.c:2196 kthread+0x291/0x2a6 kernel/kthread.c:211 ret_from_fork+0x4e/0x80 arch/x86/entry/entry_64.S:604 Signed-off-by: Archie Pusaka <[email protected]> Reported-by: [email protected] Reviewed-by: Alain Michaud <[email protected]> Reviewed-by: Abhishek Pandit-Subedi <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2021-03-22net: dsa: don't assign an error value to tag_opsGeorge McCollister1-4/+7
Use a temporary variable to hold the return value from dsa_tag_driver_get() instead of assigning it to dst->tag_ops. Leaving an error value in dst->tag_ops can result in deferencing an invalid pointer when a deferred switch configuration happens later. Fixes: 357f203bb3b5 ("net: dsa: keep a copy of the tagging protocol in the DSA switch tree") Signed-off-by: George McCollister <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-22Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller6-205/+161
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains Netfilter updates for net-next: 1) Split flowtable workqueues per events, from Oz Shlomo. 2) fall-through warnings for clang, from Gustavo A. R. Silva 3) Remove unused declaration in conntrack, from YueHaibing. 4) Consolidate skb_try_make_writable() in flowtable datapath, simplify some of the existing codebase. 5) Call dst_check() to fall back to static classic forwarding path. 6) Update table flags from commit phase. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-03-22net: set initial device refcount to 1Eric Dumazet1-3/+6
When adding CONFIG_PCPU_DEV_REFCNT, I forgot that the initial net device refcount was 0. When CONFIG_PCPU_DEV_REFCNT is not set, this means the first dev_hold() triggers an illegal refcount operation (addition on 0) refcount_t: addition on 0; use-after-free. WARNING: CPU: 0 PID: 1 at lib/refcount.c:25 refcount_warn_saturate+0x128/0x1a4 Fix is to change initial (and final) refcount to be 1. Also add a missing kerneldoc piece, as reported by Stephen Rothwell. Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Guenter Roeck <[email protected]> Tested-by: Guenter Roeck <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-22net: bridge: when suppression is enabled exclude RARP packetsNikolay Aleksandrov1-1/+3
Recently we had an interop issue where RARP packets got suppressed with bridge neigh suppression enabled, but the check in the code was meant to suppress GARP. Exclude RARP packets from it which would allow some VMWare setups to work, to quote the report: "Those RARP packets usually get generated by vMware to notify physical switches when vMotion occurs. vMware may use random sip/tip or just use sip=tip=0. So the RARP packet sometimes get properly flooded by the vtep and other times get dropped by the logic" Reported-by: Amer Abdalamer <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-22net-sysfs: remove possible sleep from an RCU read-side critical sectionAntoine Tenart1-1/+1
xps_queue_show is mostly made of an RCU read-side critical section and calls bitmap_zalloc with GFP_KERNEL in the middle of it. That is not allowed as this call may sleep and such behaviours aren't allowed in RCU read-side critical sections. Fix this by using GFP_NOWAIT instead. Fixes: 5478fcd0f483 ("net: embed nr_ids in the xps maps") Reported-by: kernel test robot <[email protected]> Suggested-by: Matthew Wilcox <[email protected]> Signed-off-by: Antoine Tenart <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-22net: l2tp: Fix a typoBhaskar Chowdhury1-1/+1
s/verifed/verified/ Signed-off-by: Bhaskar Chowdhury <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-22net: move the ptype_all and ptype_base declarations to include/linux/netdevice.hVladimir Oltean1-3/+0
ptype_all and ptype_base are declared in net/core/dev.c as non-static, because they are used by net-procfs.c too. However, a "make W=1" build complains that there was no previous declaration of ptype_all and ptype_base in a header file, so this way of declaring things constitutes a violation of coding style. Let's move the extern declarations of ptype_all and ptype_base to the linux/netdevice.h file, which is included by net-procfs.c too. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-22net: make xps_needed and xps_rxqs_needed staticVladimir Oltean1-4/+2
Since their introduction in commit 04157469b7b8 ("net: Use static_key for XPS maps"), xps_needed and xps_rxqs_needed were never used outside net/core/dev.c, so I don't really understand why they were exported as symbols in the first place. This is needed in order to silence a "make W=1" warning about these static keys not being declared as static variables, but not having a previous declaration in a header file nonetheless. Cc: Amritha Nambiar <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-22net: bridge: declare br_vlan_tunnel_lookup argument tunnel_id as __be64Vladimir Oltean1-1/+1
The only caller of br_vlan_tunnel_lookup, br_handle_ingress_vlan_tunnel, extracts the tunnel_id from struct ip_tunnel_info::struct ip_tunnel_key:: tun_id which is a __be64 value. The exact endianness does not seem to matter, because the tunnel id is just used as a lookup key for the VLAN group's tunnel hash table, and the value is not interpreted directly per se. Moreover, rhashtable_lookup_fast treats the key argument as a const void *. Therefore, there is no functional change associated with this patch, just one to silence "make W=1" builds. Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-22openvswitch: Fix a typoBhaskar Chowdhury1-1/+1
s/subsytem/subsystem/ Signed-off-by: Bhaskar Chowdhury <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-22net: ipconfig: ic_dev can be NULL in ic_close_devsVladimir Oltean1-6/+8
ic_close_dev contains a generalization of the logic to not close a network interface if it's the host port for a DSA switch. This logic is disguised behind an iteration through the lowers of ic_dev in ic_close_dev. When no interface for ipconfig can be found, ic_dev is NULL, and ic_close_dev: - dereferences a NULL pointer when assigning selected_dev - would attempt to search through the lower interfaces of a NULL net_device pointer So we should protect against that case. The "lower_dev" iterator variable was shortened to "lower" in order to keep the 80 character limit. Fixes: f68cbaed67cb ("net: ipconfig: avoid use-after-free in ic_close_devs") Fixes: 46acf7bdbc72 ("Revert "net: ipv4: handle DSA enabled master network devices"") Signed-off-by: Vladimir Oltean <[email protected]> Tested-by: Heiko Thiery <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-22net/sched: cls_flower: use nla_get_be32 for TCA_FLOWER_KEY_FLAGSVladimir Oltean1-2/+2
The existing code is functionally correct: iproute2 parses the ip_flags argument for tc-flower and really packs it as big endian into the TCA_FLOWER_KEY_FLAGS netlink attribute. But there is a problem in the fact that W=1 builds complain: net/sched/cls_flower.c:1047:15: warning: cast to restricted __be32 This is because we should use the dedicated helper for obtaining a __be32 pointer to the netlink attribute, not a u32 one. This ensures type correctness for be32_to_cpu. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>