aboutsummaryrefslogtreecommitdiff
path: root/include
AgeCommit message (Collapse)AuthorFilesLines
2021-04-08net: phy: add constants for 2.5G and 5G speed in PCS speed registerMarek Behún1-0/+2
Add constants for 2.5G and 5G speed in PCS speed register into mdio.h. Signed-off-by: Marek Behún <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-08net: phy: marvell10g: add separate structure for 88X3340Marek Behún1-1/+5
The 88X3340 contains 4 cores similar to 88X3310, but there is a difference: it does not support xaui host mode. Instead the corresponding MACTYPE means rxaui / 5gbase-r / 2500base-x / sgmii without AN Signed-off-by: Marek Behún <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-07net: remove the new_ifindex argument from dev_change_net_namespaceAndrei Vagin1-1/+7
Here is only one place where we want to specify new_ifindex. In all other cases, callers pass 0 as new_ifindex. It looks reasonable to add a low-level function with new_ifindex and to convert dev_change_net_namespace to a static inline wrapper. Fixes: eeb85a14ee34 ("net: Allow to specify ifindex when device is moved to another namespace") Suggested-by: Jakub Kicinski <[email protected]> Signed-off-by: Andrei Vagin <[email protected]> Acked-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-07Merge tag 'mlx5-updates-2021-04-06' of ↵David S. Miller1-5/+4
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-04-06 Introduce TC sample offload Background ---------- The tc sample action allows user to sample traffic matched by tc classifier. The sampling consists of choosing packets randomly and sampling them using psample module. The tc sample parameters include group id, sampling rate and packet's truncation (to save kernel-user traffic). Sample in TC SW --------------- User must specify rate and group id for sample action, truncate is optional. tc filter add dev enp4s0f0_0 ingress protocol ip prio 1 flower \ src_mac 02:25:d0:14:01:02 dst_mac 02:25:d0:14:01:03 \ action sample rate 10 group 5 trunc 60 \ action mirred egress redirect dev enp4s0f0_1 The tc sample action kernel module 'act_sample' will call another kernel module 'psample' to send sampled packets to userspace. MLX5 sample HW offload - MLX5 driver patches -------------------------------------------- The sample action is translated to a goto flow table object destination which samples packets according to the provided sample ratio. Sampled packets are duplicated. One copy is processed by a termination table, named the sample table, which sends the packet to the eswitch manager port (that will be processed by software). The second copy is processed by the default table which executes the subsequent actions. The default table is created per <vport, chain, prio> tuple as rules with different prios and chains may overlap. For example, for the following typical flow table: +-------------------------------+ + original flow table + +-------------------------------+ + original match + +-------------------------------+ + sample action + other actions + +-------------------------------+ We translate the tc filter with sample action to the following HW model: +---------------------+ + original flow table + +---------------------+ + original match + +---------------------+ | v +------------------------------------------------+ + Flow Sampler Object + +------------------------------------------------+ + sample ratio + +------------------------------------------------+ + sample table id | default table id + +------------------------------------------------+ | | v v +-----------------------------+ +----------------------------------------+ + sample table + + default table per <vport, chain, prio> + +-----------------------------+ +----------------------------------------+ + forward to management vport + + original match + +-----------------------------+ +----------------------------------------+ + other actions + +----------------------------------------+ Flow sampler object ------------------- Hardware introduces flow sampler object to do sample. It is a new destination type. Driver needs to specify two flow table ids in it. One is sample table id. The other one is the default table id. Sample table samples the packets according to the sample rate and forward the sampled packets to eswitch manager port. Default table finishes the subsequent actions. Group id and reg_c0 ------------------- Userspace program will take different actions for sampled packets according to tc sample action group id. So hardware must pass group id to software for each sampled packets. In Paul Blakey's "Introduce connection tracking offload" patch set, reg_c0 lower 16 bits are used for miss packet chain id restore. We convert reg_c0 lower 16 bits to a common object pool, so other features can also use it. Since sample group id is 32 bits, create a 16 bits object id to map the group id and write the object id to reg_c0 lower 16 bits. reg_c0 can only be used for matching. Write reg_c0 to flow_tag, so software can get the object id via flow_tag and find group id via the common object pool. Sampler restore handle ---------------------- Use common object pool to create an object id to map sample parameters. Allocate a modify header action to write the object id to reg_c0 lower 16 bits. Create a restore rule to pass the object id to software. So software can identify sampled packets via the object id and send it to userspace. Aggregate the modify header action, restore rule and object id to a sample restore handle. Re-use identical sample restore handle for the same object id. Send sampled packets to userspace --------------------------------- The destination for sampled packets is eswitch manager port, so representors can receive sampled packets together with the group id. Driver will send sampled packets and group id to userspace via psample. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-04-07ethtool: document PHY tunable callbacksJakub Kicinski1-0/+2
Add missing kdoc for phy tunable callbacks. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-07mptcp: use mptcp_addr_info in mptcp_out_optionsGeliang Tang1-8/+13
This patch moved the mptcp_addr_info struct from protocol.h to mptcp.h, added a new struct mptcp_addr_info member addr in struct mptcp_out_options, and dropped the original addr, addr6, addr_id and port fields in it. Then we can use opts->addr to get the adding address from PM directly using mptcp_pm_add_addr_signal. Since the port number became big-endian now, use ntohs to convert it before sending it out with the ADD_ADDR suboption. Also convert it when passing it to add_addr_generate_hmac or printing it out. Co-developed-by: Matthieu Baerts <[email protected]> Signed-off-by: Matthieu Baerts <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-06net/mlx5: Map register values to restore objectsChris Mi1-5/+4
Currently reg_c0 lower 16 bits and reg_b are used to store the chain id that missed in FDB and NIC tables accordingly. However, the registers' values may index a restore object, rather than a single u32 value. Different object types can be used to restore mutually exclusive contexts such as chain id and sample group id. Use the mapping object to associate an index with a restore object as a prestep for supporting additional restore types. Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Oz Shlomo <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-04-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller13-76/+74
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following batch contains Netfilter/IPVS updates for your net-next tree: 1) Simplify log infrastructure modularity: Merge ipv4, ipv6, bridge, netdev and ARP families to nf_log_syslog.c. Add module softdeps. This fixes a rare deadlock condition that might occur when log module autoload is required. From Florian Westphal. 2) Moves part of netfilter related pernet data from struct net to net_generic() infrastructure. All of these users can be modules, so if they are not loaded there is no need to waste space. Size reduction is 7 cachelines on x86_64, also from Florian. 2) Update nftables audit support to report events once per table, to get it aligned with iptables. From Richard Guy Briggs. 3) Check for stale routes from the flowtable garbage collector path. This is fixing IPv6 which breaks due missing check for the dst_cookie. 4) Add a nfnl_fill_hdr() function to simplify netlink + nfnetlink headers setup. 5) Remove documentation on several statified functions. 6) Remove printk on netns creation for the FTP IPVS tracker, from Florian Westphal. 7) Remove unnecessary nf_tables_destroy_list_lock spinlock initialization, from Yang Yingliang. 7) Remove a duplicated forward declaration in ipset, from Wan Jiabing. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-04-06time64.h: Consolidated PSEC_PER_SEC definitionAndy Shevchenko2-2/+1
We have currently three users of the PSEC_PER_SEC each of them defining it individually. Instead, move it to time64.h to be available for everyone. There is a new user coming with the same constant in use. It will also make its life easier. Signed-off-by: Andy Shevchenko <[email protected]> Acked-by: Heiko Stuebner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-06usbnet: add method for reporting speed without MIIOliver Neukum1-2/+5
The old method for reporting link speed assumed a driver uses the generic phy (mii) MDIO read/write functions. CDC devices don't expose the phy. Add a primitive internal version reporting back directly what the CDC notification/status operations recorded. v2: rebased on upstream v3: changed names and made clear which units are used v4: moved hunks to correct patch; rewrote commmit messages Signed-off-by: Oliver Neukum <[email protected]> Tested-by: Roland Dreier <[email protected]> Reviewed-by: Grant Grundler <[email protected]> Tested-by: Grant Grundler <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-06usbnet: add _mii suffix to usbnet_set/get_link_ksettingsOliver Neukum1-2/+2
The generic functions assumed devices provided an MDIO interface (accessed via older mii code, not phylib). This is true only for genuine ethernet. Devices with a higher level of abstraction or based on different technologies do not have MDIO. To support this case, first rename the existing functions with _mii suffix. v2: rebased on changed upstream v3: changed names to clearly say that this does NOT use phylib v4: moved hunks to correct patch; reworded commmit messages Signed-off-by : Oliver Neukum <[email protected]> Tested-by: Roland Dreier <[email protected]> Reviewed-by: Grant Grundler <[email protected]> Tested-by: Grant Grundler <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-06net: remove obsolete members from struct netFlorian Westphal5-27/+0
all have been moved to generic_net infra. On x86_64, this reduces struct net size from 70 to 63 cache lines (4480 to 4032 byte). Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-06netfilter: conntrack: move ecache dwork to net_generic infraFlorian Westphal2-21/+16
dwork struct is large (>128 byte) and not needed when conntrack module is not loaded. Place it in net_generic data instead. The struct net dwork member is now obsolete and will be removed in a followup patch. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-06netfilter: conntrack: move sysctl pointer to net_generic infraFlorian Westphal1-0/+3
No need to keep this in struct net, place it in the net_generic data. The sysctl pointer is removed from struct net in a followup patch. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-06netfilter: nf_tables: use net_generic infra for transaction dataFlorian Westphal1-0/+11
This moves all nf_tables pernet data from struct net to a net_generic extension, with the exception of the gencursor. The latter is used in the data path and also outside of the nf_tables core. All others are only used from the configuration plane. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-06netfilter: nf_defrag_ipv6: use net_generic infraFlorian Westphal1-0/+6
This allows followup patch to remove these members from struct net. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-06netfilter: nfnetlink: add and use nfnetlink_broadcastFlorian Westphal1-0/+2
This removes the only reference of net->nfnl outside of the nfnetlink module. This allows to move net->nfnl to net_generic infra. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-05net: Allow to specify ifindex when device is moved to another namespaceAndrei Vagin1-1/+2
Currently, we can specify ifindex on link creation. This change allows to specify ifindex when a device is moved to another network namespace. Even now, a device ifindex can be changed if there is another device with the same ifindex in the target namespace. So this change doesn't introduce completely new behavior, it adds more control to the process. CRIU users want to restore containers with pre-created network devices. A user will provide network devices and instructions where they have to be restored, then CRIU will restore network namespaces and move devices into them. The problem is that devices have to be restored with the same indexes that they have before C/R. Cc: Alexander Mikhalitsyn <[email protected]> Suggested-by: Christian Brauner <[email protected]> Signed-off-by: Andrei Vagin <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-04Merge tag 'mlx5-updates-2021-04-02' of ↵David S. Miller1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2021-04-02 This series provides trivial updates and cleanup to mlx5 driver 1) Support for matching on ct_state inv and rel flag in connection tracking 2) Reject TC rules that redirect from a VF to itself 3) Parav provided some E-Switch cleanups that could be summarized to: 3.1) Packing and Reduce structure sizes 3.2) Dynamic allocation of rate limit tables and structures 4) Vu Makes the netdev arfs and vlan tables allocation dynamic. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-04-02net/mlx5: Allocate rate limit table when rate is configuredParav Pandit1-0/+1
A device supports 128 rate limiters. A static table allocation consumes 8KB of memory even when rate is not configured. Instead, allocate the table when at least one rate is configured. Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-04-02net/mlx5: Pack mlx5_rl_entry structureParav Pandit1-1/+1
mlx5_rl_entry structure is not properly packed as shown below. Due to this an array of size 9144 bytes allocated which is aligned to 16Kbytes. Hence, pack the structure and avoid the wastage. This offers 8Kbytes of saving per mlx5_core_dev struct. pahole -C mlx5_rl_entry drivers/net/ethernet/mellanox/mlx5/core/en_main.o Existing layout: struct mlx5_rl_entry { u8 rl_raw[48]; /* 0 48 */ u16 index; /* 48 2 */ /* XXX 6 bytes hole, try to pack */ u64 refcount; /* 56 8 */ /* --- cacheline 1 boundary (64 bytes) --- */ u16 uid; /* 64 2 */ u8 dedicated:1; /* 66: 0 1 */ /* size: 72, cachelines: 2, members: 5 */ /* sum members: 60, holes: 1, sum holes: 6 */ /* sum bitfield members: 1 bits (0 bytes) */ /* padding: 5 */ /* bit_padding: 7 bits */ /* last cacheline: 8 bytes */ }; After alignment: struct mlx5_rl_entry { u8 rl_raw[48]; /* 0 48 */ u64 refcount; /* 48 8 */ u16 index; /* 56 2 */ u16 uid; /* 58 2 */ u8 dedicated:1; /* 60: 0 1 */ /* size: 64, cachelines: 1, members: 5 */ /* padding: 3 */ /* bit_padding: 7 bits */ }; Signed-off-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-04-02tcp: reorder tcp_congestion_ops for better cache localityEric Dumazet1-15/+27
Group all the often used fields in the first cache line, to reduce cache line misses. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-02net: reorganize fields in netns_mibEric Dumazet1-10/+20
Order fields to increase locality for most used protocols. udplite and icmp are moved at the end. Same for proc_net_devsnmp6 which is not used in fast path. This potentially saves one cache line miss for typical TCP/UDP over IPv4/IPv6. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-02mptcp: add mptcp reset option supportFlorian Westphal2-2/+27
The MPTCP reset option allows to carry a mptcp-specific error code that provides more information on the nature of a connection reset. Reset option data received gets stored in the subflow context so it can be sent to userspace via the 'subflow closed' netlink event. When a subflow is closed, the desired error code that should be sent to the peer is also placed in the subflow context structure. If a reset is sent before subflow establishment could complete, e.g. on HMAC failure during an MP_JOIN operation, the mptcp skb extension is used to store the reset information. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller11-48/+179
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-04-01 The following pull-request contains BPF updates for your *net-next* tree. We've added 68 non-merge commits during the last 7 day(s) which contain a total of 70 files changed, 2944 insertions(+), 1139 deletions(-). The main changes are: 1) UDP support for sockmap, from Cong. 2) Verifier merge conflict resolution fix, from Daniel. 3) xsk selftests enhancements, from Maciej. 4) Unstable helpers aka kernel func calling, from Martin. 5) Batches ops for LPM map, from Pedro. 6) Fix race in bpf_get_local_storage, from Yonghong. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-04-01include: net: Remove repeated struct declarationWan Jiabing1-1/+0
struct ctl_table_header is declared twice. One is declared at 46th line. The blew one is not needed. Remove the duplicate. Signed-off-by: Wan Jiabing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-01skmsg: Extract __tcp_bpf_recvmsg() and tcp_bpf_wait_data()Cong Wang2-2/+4
Although these two functions are only used by TCP, they are not specific to TCP at all, both operate on skmsg and ingress_msg, so fit in net/core/skmsg.c very well. And we will need them for non-TCP, so rename and move them to skmsg.c and export them to modules. Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-01udp: Implement ->read_sock() for sockmapCong Wang1-0/+2
This is similar to tcp_read_sock(), except we do not need to worry about connections, we just need to retrieve skb from UDP receive queue. Note, the return value of ->read_sock() is unused in sk_psock_verdict_data_ready(), and UDP still does not support splice() due to lack of ->splice_read(), so users can not reach udp_read_sock() directly. Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-01sock: Introduce sk->sk_prot->psock_update_sk_prot()Cong Wang4-15/+8
Currently sockmap calls into each protocol to update the struct proto and replace it. This certainly won't work when the protocol is implemented as a module, for example, AF_UNIX. Introduce a new ops sk->sk_prot->psock_update_sk_prot(), so each protocol can implement its own way to replace the struct proto. This also helps get rid of symbol dependencies on CONFIG_INET. Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-01sock_map: Introduce BPF_SK_SKB_VERDICTCong Wang2-0/+3
Reusing BPF_SK_SKB_STREAM_VERDICT is possible but its name is confusing and more importantly we still want to distinguish them from user-space. So we can just reuse the stream verdict code but introduce a new type of eBPF program, skb_verdict. Users are not allowed to attach stream_verdict and skb_verdict programs to the same map. Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-01skmsg: Use rcu work for destroying psockCong Wang1-4/+1
The RCU callback sk_psock_destroy() only queues work psock->gc, so we can just switch to rcu work to simplify the code. Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-01skmsg: Avoid lock_sock() in sk_psock_backlog()Cong Wang1-0/+2
We do not have to lock the sock to avoid losing sk_socket, instead we can purge all the ingress queues when we close the socket. Sending or receiving packets after orphaning socket makes no sense. We do purge these queues when psock refcnt reaches zero but here we want to purge them explicitly in sock_map_close(). There are also some nasty race conditions on testing bit SK_PSOCK_TX_ENABLED and queuing/canceling the psock work, we can expand psock->ingress_lock a bit to protect them too. As noticed by John, we still have to lock the psock->work, because the same work item could be running concurrently on different CPU's. Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-01net: Introduce skb_send_sock() for sock_mapCong Wang1-0/+1
We only have skb_send_sock_locked() which requires callers to use lock_sock(). Introduce a variant skb_send_sock() which locks on its own, callers do not need to lock it any more. This will save us from adding a ->sendmsg_locked for each protocol. To reuse the code, pass function pointers to __skb_send_sock() and build skb_send_sock() and skb_send_sock_locked() on top. Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-04-01skmsg: Introduce a spinlock to protect ingress_msgCong Wang1-0/+46
Currently we rely on lock_sock to protect ingress_msg, it is too big for this, we can actually just use a spinlock to protect this list like protecting other skb queues. __tcp_bpf_recvmsg() is still special because of peeking, it still has to use lock_sock. Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Jakub Sitnicki <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-03-31ipv6: move ip6_dst_ops first in netns_ipv6Eric Dumazet1-1/+3
ip6_dst_ops have cache line alignement. Moving it at beginning of netns_ipv6 removes a 48 byte hole, and shrinks netns_ipv6 from 12 to 11 cache lines. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31ipv6: convert elligible sysctls to u8Eric Dumazet1-12/+12
Convert most sysctls that can fit in a byte. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31tcp: convert tcp_comp_sack_nr sysctl to u8Eric Dumazet1-1/+1
tcp_comp_sack_nr max value was already 255. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31ipv4: convert igmp_link_local_mcast_reports sysctl to u8Eric Dumazet1-1/+1
This sysctl is a bool, can use less storage. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31ipv4: convert fib_multipath_{use_neigh|hash_policy} sysctls to u8Eric Dumazet1-2/+2
Make room for better packing of netns_ipv4 Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31ipv4: convert udp_l3mdev_accept sysctl to u8Eric Dumazet1-1/+1
Reduce footprint of sysctls. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31ipv4: convert fib_notify_on_flag_change sysctl to u8Eric Dumazet1-1/+1
Reduce footprint of sysctls. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31inet: shrink netns_ipv4 by another cache lineEric Dumazet1-3/+3
By shuffling around some fields to remove 8 bytes of hole, we can save one cache line. pahole result before/after the patch : /* size: 768, cachelines: 12, members: 139 */ /* sum members: 673, holes: 11, sum holes: 39 */ /* padding: 56 */ /* paddings: 2, sum paddings: 7 */ /* forced alignments: 1 */ -> /* size: 704, cachelines: 11, members: 139 */ /* sum members: 673, holes: 10, sum holes: 31 */ /* paddings: 2, sum paddings: 7 */ /* forced alignments: 1 */ Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31inet: shrink inet_timewait_death_row by 48 bytesEric Dumazet1-2/+5
struct inet_timewait_death_row uses two cache lines, because we want tw_count to use a full cache line to avoid false sharing. Rework its definition and placement in netns_ipv4 so that: 1) We add 60 bytes of padding after tw_count to avoid false sharing, knowing that tcp_death_row will have ____cacheline_aligned_in_smp attribute. 2) We do not risk padding before tcp_death_row, because we move it at the beginning of netns_ipv4, even if new fields are added later. 3) We do not waste 48 bytes of padding after it. Note that I have not changed dccp. pahole result for struct netns_ipv4 before/after the patch : /* size: 832, cachelines: 13, members: 139 */ /* sum members: 721, holes: 12, sum holes: 95 */ /* padding: 16 */ /* paddings: 2, sum paddings: 55 */ -> /* size: 768, cachelines: 12, members: 139 */ /* sum members: 673, holes: 11, sum holes: 39 */ /* padding: 56 */ /* paddings: 2, sum paddings: 7 */ /* forced alignments: 1 */ Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31ethtool: support FEC settings over netlinkJakub Kicinski1-0/+17
Add FEC API to netlink. This is not a 1-to-1 conversion. FEC settings already depend on link modes to tell user which modes are supported. Take this further an use link modes for manual configuration. Old struct ethtool_fecparam is still used to talk to the drivers, so we need to translate back and forth. We can revisit the internal API if number of FEC encodings starts to grow. Enforce only one active FEC bit (by using a bit position rather than another mask). Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31netfilter: add helper function to set up the nfnetlink header and use itPablo Neira Ayuso1-0/+27
This patch adds a helper function to set up the netlink and nfnetlink headers. Update existing codebase to use it. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: flowtable: dst_check() from garbage collector pathPablo Neira Ayuso1-1/+4
Move dst_check() to the garbage collector path. Stale routes trigger the flow entry teardown state which makes affected flows go back to the classic forwarding path to re-evaluate flow offloading. IPv6 requires the dst cookie to work, store it in the flow_tuple, otherwise dst_check() always fails. Fixes: e5075c0badaa ("netfilter: flowtable: call dst_check() to fall back to classic forwarding") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: ipset: Remove duplicate declarationWan Jiabing1-2/+0
struct ip_set is declared twice. One is declared at 79th line, so remove the duplicate. Signed-off-by: Wan Jiabing <[email protected]> Acked-by: Jozsef Kadlecsik <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nft_log: perform module load from nf_tablesFlorian Westphal1-0/+5
modprobe calls from the nf_logger_find_get() API causes deadlock in very special cases because they occur with the nf_tables transaction mutex held. In the specific case of nf_log, deadlock is via: A nf_tables -> transaction mutex -> nft_log -> modprobe -> nf_log_syslog \ -> pernet_ops rwsem -> wait for C B netlink event -> rtnl_mutex -> nf_tables transaction mutex -> wait for A C close() -> ip6mr_sk_done -> rtnl_mutex -> wait for B Earlier patch added NFLOG/xt_LOG module softdeps to avoid the need to load the backend module during a transaction. For nft_log we would have to add a softdep for both nfnetlink_log or nf_log_syslog, since we do not know in advance which of the two backends are going to be configured. This defers the modprobe op until after the transaction mutex is released. Tested-by: Phil Sutter <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nf_log_common: merge with nf_log_syslogFlorian Westphal1-24/+0
Remove nf_log_common. Now that all per-af modules have been merged there is no longer a need to provide a helper module. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nf_log_bridge: merge with nf_log_syslogFlorian Westphal1-1/+0
Provide bridge log support from nf_log_syslog. After the merge there is no need to load the "real packet loggers", all of them now reside in the same module. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>