aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2020-03-18netfilter: Rename ingress hook include fileLukas Wunner1-1/+1
Prepare for addition of a netfilter egress hook by renaming <linux/netfilter_ingress.h> to <linux/netfilter_netdev.h>. The egress hook also necessitates a refactoring of the include file, but that is done in a separate commit to ease reviewing. No functional change intended. Signed-off-by: Lukas Wunner <[email protected]> Cc: Daniel Borkmann <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-16tcp: fix stretch ACK bugs in YeahPengcheng Yang1-30/+11
Change Yeah to properly handle stretch ACKs in additive increase mode by passing in the count of ACKed packets to tcp_cong_avoid_ai(). In addition, we re-implemented the scalable path using tcp_cong_avoid_ai() and removed the pkts_acked variable. Signed-off-by: Pengcheng Yang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-16tcp: fix stretch ACK bugs in VenoPengcheng Yang1-4/+5
Change Veno to properly handle stretch ACKs in additive increase mode by passing in the count of ACKed packets to tcp_cong_avoid_ai(). Signed-off-by: Pengcheng Yang <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-16tcp: stretch ACK fixes in Veno prepPengcheng Yang1-21/+23
No code logic has been changed in this patch. Signed-off-by: Pengcheng Yang <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-16tcp: fix stretch ACK bugs in ScalablePengcheng Yang1-8/+9
Change Scalable to properly handle stretch ACKs in additive increase mode by passing in the count of ACKed packets to tcp_cong_avoid_ai(). In addition, because we are now precisely accounting for stretch ACKs, including delayed ACKs, we can now change TCP_SCALABLE_AI_CNT to 100. Signed-off-by: Pengcheng Yang <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-16tcp: fix stretch ACK bugs in BICPengcheng Yang1-5/+6
Changes BIC to properly handle stretch ACKs in additive increase mode by passing in the count of ACKed packets to tcp_cong_avoid_ai(). Signed-off-by: Pengcheng Yang <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-16net: kcm: kcmproc.c: Fix RCU list suspicious usage warningMadhuparna Bhowmik1-1/+1
This path fixes the suspicious RCU usage warning reported by kernel test robot. net/kcm/kcmproc.c:#RCU-list_traversed_in_non-reader_section There is no need to use list_for_each_entry_rcu() in kcm_stats_seq_show() as the list is always traversed under knet->mutex held. Reported-by: kernel test robot <[email protected]> Signed-off-by: Madhuparna Bhowmik <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-16net: sched: set the hw_stats_type in pedit loopJiri Pirko1-0/+1
For a single pedit action, multiple offload entries may be used. Set the hw_stats_type to all of them. Fixes: 44f865801741 ("sched: act: allow user to specify type of HW stats for a filter") Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-15net: dsa: warn if phylink_mac_link_state returns errorRussell King1-1/+6
Issue a warning to the kernel log if phylink_mac_link_state() returns an error. This should not occur, but let's make it visible. Signed-off-by: Russell King <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-15netfilter: conntrack: re-visit sysctls in unprivileged namespacesFlorian Westphal1-11/+8
since commit b884fa46177659 ("netfilter: conntrack: unify sysctl handling") conntrack no longer exposes most of its sysctls (e.g. tcp timeouts settings) to network namespaces that are not owned by the initial user namespace. This patch exposes all sysctls even if the namespace is unpriviliged. compared to a 4.19 kernel, the newly visible and writeable sysctls are: net.netfilter.nf_conntrack_acct net.netfilter.nf_conntrack_timestamp .. to allow to enable accouting and timestamp extensions. net.netfilter.nf_conntrack_events .. to turn off conntrack event notifications. net.netfilter.nf_conntrack_checksum .. to disable checksum validation. net.netfilter.nf_conntrack_log_invalid .. to enable logging of packets deemed invalid by conntrack. newly visible sysctls that are only exported as read-only: net.netfilter.nf_conntrack_count .. current number of conntrack entries living in this netns. net.netfilter.nf_conntrack_max .. global upperlimit (maximum size of the table). net.netfilter.nf_conntrack_buckets .. size of the conntrack table (hash buckets). net.netfilter.nf_conntrack_expect_max .. maximum number of permitted expectations in this netns. net.netfilter.nf_conntrack_helper .. conntrack helper auto assignment. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nft_lookup: update element stateful expressionPablo Neira Ayuso1-0/+1
If the set element comes with an stateful expression, update it. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nf_tables: add nft_set_elem_update_expr() helper functionPablo Neira Ayuso1-7/+1
This helper function runs the eval path of the stateful expression of an existing set element. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nf_tables: add elements with stateful expressionsPablo Neira Ayuso1-1/+20
Update nft_add_set_elem() to handle the NFTA_SET_ELEM_EXPR netlink attribute. This patch allows users to to add elements with stateful expressions. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nf_tables: statify nft_expr_init()Pablo Neira Ayuso1-2/+2
Not exposed anymore to modules, statify this function. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nf_tables: add nft_set_elem_expr_alloc()Pablo Neira Ayuso2-13/+32
Add helper function to create stateful expression. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15nft_set_pipapo: Prepare for single ranged field usageStefano Brivio3-8/+17
A few adjustments in nft_pipapo_init() are needed to allow usage of this set back-end for a single, ranged field. Provide a convenient NFT_PIPAPO_MIN_FIELDS definition that currently makes sure that the rbtree back-end is selected instead, for sets with a single field. This finally allows a fair comparison with rbtree sets, by defining NFT_PIPAPO_MIN_FIELDS as 0 and skipping rbtree back-end initialisation: ---------------.--------------------------.-------------------------. AMD Epyc 7402 | baselines, Mpps | Mpps, % over rbtree | 1 thread |__________________________|_________________________| 3.35GHz | | | | | | 768KiB L1D$ | netdev | hash | rbtree | | pipapo | ---------------| hook | no | single | pipapo |single field| type entries | drop | ranges | field |single field| AVX2 | ---------------|--------|--------|--------|------------|------------| net,port | | | | | | 1000 | 19.0 | 10.4 | 3.8 | 6.0 +58% | 9.6 +153% | ---------------|--------|--------|--------|------------|------------| port,net | | | | | | 100 | 18.8 | 10.3 | 5.8 | 9.1 +57% |11.6 +100% | ---------------|--------|--------|--------|------------|------------| net6,port | | | | | | 1000 | 16.4 | 7.6 | 1.8 | 2.8 +55% | 6.5 +261% | ---------------|--------|--------|--------|------------|------------| port,proto | | | | [1] | [1] | 30000 | 19.6 | 11.6 | 3.9 | 0.9 -77% | 2.7 -31% | ---------------|--------|--------|--------|------------|------------| port,proto | | | | | | 10000 | 19.6 | 11.6 | 4.4 | 2.1 -52% | 5.6 +27% | ---------------|--------|--------|--------|------------|------------| port,proto | | | | | | 4 threads 10000| 77.9 | 45.1 | 17.4 | 8.3 -52% |22.4 +29% | ---------------|--------|--------|--------|------------|------------| net6,port,mac | | | | | | 10 | 16.5 | 5.4 | 4.3 | 4.5 +5% | 8.2 +91% | ---------------|--------|--------|--------|------------|------------| net6,port,mac, | | | | | | proto 1000 | 16.5 | 5.7 | 1.9 | 2.8 +47% | 6.6 +247% | ---------------|--------|--------|--------|------------|------------| net,mac | | | | | | 1000 | 19.0 | 8.4 | 3.9 | 6.0 +54% | 9.9 +154% | ---------------'--------'--------'--------'------------'------------' [1] Causes switch of lookup table buckets for 'port' to 4-bit groups Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15nft_set_pipapo: Introduce AVX2-based lookup implementationStefano Brivio5-0/+1269
If the AVX2 set is available, we can exploit the repetitive characteristic of this algorithm to provide a fast, vectorised version by using 256-bit wide AVX2 operations for bucket loads and bitwise intersections. In most cases, this implementation consistently outperforms rbtree set instances despite the fact they are configured to use a given, single, ranged data type out of the ones used for performance measurements by the nft_concat_range.sh kselftest. That script, injecting packets directly on the ingoing device path with pktgen, reports, averaged over five runs on a single AMD Epyc 7402 thread (3.35GHz, 768 KiB L1D$, 12 MiB L2$), the figures below. CONFIG_RETPOLINE was not set here. Note that this is not a fair comparison over hash and rbtree set types: non-ranged entries (used to have a reference for hash types) would be matched faster than this, and matching on a single field only (which is the case for rbtree) is also significantly faster. However, it's not possible at the moment to choose this set type for non-ranged entries, and the current implementation also needs a few minor adjustments in order to match on less than two fields. ---------------.-----------------------------------.------------. AMD Epyc 7402 | baselines, Mpps | this patch | 1 thread |___________________________________|____________| 3.35GHz | | | | | | 768KiB L1D$ | netdev | hash | rbtree | | | ---------------| hook | no | single | | pipapo | type entries | drop | ranges | field | pipapo | AVX2 | ---------------|--------|--------|--------|--------|------------| net,port | | | | | | 1000 | 19.0 | 10.4 | 3.8 | 4.0 | 7.5 +87% | ---------------|--------|--------|--------|--------|------------| port,net | | | | | | 100 | 18.8 | 10.3 | 5.8 | 6.3 | 8.1 +29% | ---------------|--------|--------|--------|--------|------------| net6,port | | | | | | 1000 | 16.4 | 7.6 | 1.8 | 2.1 | 4.8 +128% | ---------------|--------|--------|--------|--------|------------| port,proto | | | | | | 30000 | 19.6 | 11.6 | 3.9 | 0.5 | 2.6 +420% | ---------------|--------|--------|--------|--------|------------| net6,port,mac | | | | | | 10 | 16.5 | 5.4 | 4.3 | 3.4 | 4.7 +38% | ---------------|--------|--------|--------|--------|------------| net6,port,mac, | | | | | | proto 1000 | 16.5 | 5.7 | 1.9 | 1.4 | 3.6 +26% | ---------------|--------|--------|--------|--------|------------| net,mac | | | | | | 1000 | 19.0 | 8.4 | 3.9 | 2.5 | 6.4 +156% | ---------------'--------'--------'--------'--------'------------' A similar strategy could be easily reused to implement specialised versions for other SIMD sets, and I plan to post at least a NEON version at a later time. Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15nft_set_pipapo: Prepare for vectorised implementation: helpersStefano Brivio2-261/+285
Move most macros and helpers to a header file, so that they can be conveniently used by related implementations. No functional changes are intended here. Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15nft_set_pipapo: Prepare for vectorised implementation: alignmentStefano Brivio1-25/+110
SIMD vector extension sets require stricter alignment than native instruction sets to operate efficiently (AVX, NEON) or for some instructions to work at all (AltiVec). Provide facilities to define arbitrary alignment for lookup tables and scratch maps. By defining byte alignment with NFT_PIPAPO_ALIGN, lt_aligned and scratch_aligned pointers become available. Additional headroom is allocated, and pointers to the possibly unaligned, originally allocated areas are kept so that they can be freed. Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15nft_set_pipapo: Add support for 8-bit lookup groups and dynamic switchStefano Brivio1-8/+233
While grouping matching bits in groups of four saves memory compared to the more natural choice of 8-bit words (lookup table size is one eighth), it comes at a performance cost, as the number of lookup comparisons is doubled, and those also needs bitshifts and masking. Introduce support for 8-bit lookup groups, together with a mapping mechanism to dynamically switch, based on defined per-table size thresholds and hysteresis, between 8-bit and 4-bit groups, as tables grow and shrink. Empty sets start with 8-bit groups, and per-field tables are converted to 4-bit groups if they get too big. An alternative approach would have been to swap per-set lookup operation functions as needed, but this doesn't allow for different group sizes in the same set, which looks desirable if some fields need significantly more matching data compared to others due to heavier impact of ranges (e.g. a big number of subnets with relatively simple port specifications). Allowing different group sizes for the same lookup functions implies the need for further conditional clauses, whose cost, however, appears to be negligible in tests. The matching rate figures below were obtained for x86_64 running the nft_concat_range.sh "performance" cases, averaged over five runs, on a single thread of an AMD Epyc 7402 CPU, and for aarch64 on a single thread of a BCM2711 (Raspberry Pi 4 Model B 4GB), clocked at a stable 2147MHz frequency: ---------------.-----------------------------------.------------. AMD Epyc 7402 | baselines, Mpps | this patch | 1 thread |___________________________________|____________| 3.35GHz | | | | | | 768KiB L1D$ | netdev | hash | rbtree | | | ---------------| hook | no | single | pipapo | pipapo | type entries | drop | ranges | field | 4 bits | bit switch | ---------------|--------|--------|--------|--------|------------| net,port | | | | | | 1000 | 19.0 | 10.4 | 3.8 | 2.8 | 4.0 +43% | ---------------|--------|--------|--------|--------|------------| port,net | | | | | | 100 | 18.8 | 10.3 | 5.8 | 5.5 | 6.3 +14% | ---------------|--------|--------|--------|--------|------------| net6,port | | | | | | 1000 | 16.4 | 7.6 | 1.8 | 1.3 | 2.1 +61% | ---------------|--------|--------|--------|--------|------------| port,proto | | | | | [1] | 30000 | 19.6 | 11.6 | 3.9 | 0.3 | 0.5 +66% | ---------------|--------|--------|--------|--------|------------| net6,port,mac | | | | | | 10 | 16.5 | 5.4 | 4.3 | 2.6 | 3.4 +31% | ---------------|--------|--------|--------|--------|------------| net6,port,mac, | | | | | | proto 1000 | 16.5 | 5.7 | 1.9 | 1.0 | 1.4 +40% | ---------------|--------|--------|--------|--------|------------| net,mac | | | | | | 1000 | 19.0 | 8.4 | 3.9 | 1.7 | 2.5 +47% | ---------------'--------'--------'--------'--------'------------' [1] Causes switch of lookup table buckets for 'port', not 'proto', to 4-bit groups ---------------.-----------------------------------.------------. BCM2711 | baselines, Mpps | this patch | 1 thread |___________________________________|____________| 2147MHz | | | | | | 32KiB L1D$ | netdev | hash | rbtree | | | ---------------| hook | no | single | pipapo | pipapo | type entries | drop | ranges | field | 4 bits | bit switch | ---------------|--------|--------|--------|--------|------------| net,port | | | | | | 1000 | 1.63 | 1.37 | 0.87 | 0.61 | 0.70 +17% | ---------------|--------|--------|--------|--------|------------| port,net | | | | | | 100 | 1.64 | 1.36 | 1.02 | 0.78 | 0.81 +4% | ---------------|--------|--------|--------|--------|------------| net6,port | | | | | | 1000 | 1.56 | 1.27 | 0.65 | 0.34 | 0.50 +47% | ---------------|--------|--------|--------|--------|------------| port,proto [2] | | | | | | 10000 | 1.68 | 1.43 | 0.84 | 0.30 | 0.40 +13% | ---------------|--------|--------|--------|--------|------------| net6,port,mac | | | | | | 10 | 1.56 | 1.14 | 1.02 | 0.62 | 0.66 +6% | ---------------|--------|--------|--------|--------|------------| net6,port,mac, | | | | | | proto 1000 | 1.56 | 1.12 | 0.64 | 0.27 | 0.40 +48% | ---------------|--------|--------|--------|--------|------------| net,mac | | | | | | 1000 | 1.63 | 1.26 | 0.87 | 0.41 | 0.53 +29% | ---------------'--------'--------'--------'--------'------------' [2] Using 10000 entries instead of 30000 as it would take way too long for the test script to generate all of them Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15nft_set_pipapo: Generalise group size for bucketsStefano Brivio1-96/+112
Get rid of all hardcoded assumptions that buckets in lookup tables correspond to four-bit groups, and replace them with appropriate calculations based on a variable group size, now stored in struct field. The group size could now be in principle any divisor of eight. Note, though, that lookup and get functions need an implementation intimately depending on the group size, and the only supported size there, currently, is four bits, which is also the initial and only used size at the moment. While at it, drop 'groups' from struct nft_pipapo: it was never used. Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: flowtable: add tunnel encap/decap action offload supportwenxu1-0/+45
This patch add tunnel encap decap action offload in the flowtable offload. Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: flowtable: add tunnel match offload supportwenxu1-2/+59
This patch support both ipv4 and ipv6 tunnel_id, tunnel_src and tunnel_dst match for flowtable offload Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: flowtable: add indr block setup supportwenxu1-4/+90
Add etfilter flowtable support indr-block setup. It makes flowtable offload vlan and tunnel device. Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: flowtable: add nf_flow_table_block_offload_init()wenxu1-8/+17
Add nf_flow_table_block_offload_init prepare for the indr block offload patch Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: xt_IDLETIMER: clean up some indentingDan Carpenter1-4/+3
These lines were indented wrong so Smatch complained. net/netfilter/xt_IDLETIMER.c:81 idletimer_tg_show() warn: inconsistent indenting Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: bitwise: use more descriptive variable-names.Jeremy Sowden1-7/+7
Name the mask and xor data variables, "mask" and "xor," instead of "d1" and "d2." Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: Replace zero-length array with flexible-array memberGustavo A. R. Silva12-17/+17
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] Lastly, fix checkpatch.pl warning WARNING: __aligned(size) is preferred over __attribute__((aligned(size))) in net/bridge/netfilter/ebtables.c This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nft_set_pipapo: make the symbol 'nft_pipapo_get' staticChen Wandun1-2/+2
Fix the following sparse warning: net/netfilter/nft_set_pipapo.c:739:6: warning: symbol 'nft_pipapo_get' was not declared. Should it be static? Fixes: 3c4287f62044 ("nf_tables: Add set type for arbitrary concatenation of ranges") Signed-off-by: Chen Wandun <[email protected]> Acked-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: cleanup unused macroLi RongQing2-3/+0
TEMPLATE_NULLS_VAL is not used after commit 0838aa7fcfcd ("netfilter: fix netns dependencies with conntrack templates") PFX is not used after commit 8bee4bad03c5b ("netfilter: xt extensions: use pr_<level>") Signed-off-by: Li RongQing <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nf_tables: make all set structs constFlorian Westphal5-24/+8
They do not need to be writeable anymore. v2: remove left-over __read_mostly annotation in set_pipapo.c (Stefano) Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nf_tables: make sets built-inFlorian Westphal4-74/+15
Placing nftables set support in an extra module is pointless: 1. nf_tables needs dynamic registeration interface for sake of one module 2. nft heavily relies on sets, e.g. even simple rule like "nft ... tcp dport { 80, 443 }" will not work with _SETS=n. IOW, either nftables isn't used or both nf_tables and nf_tables_set modules are needed anyway. With extra module: 307K net/netfilter/nf_tables.ko 79K net/netfilter/nf_tables_set.ko text data bss dec filename 146416 3072 545 150033 nf_tables.ko 35496 1817 0 37313 nf_tables_set.ko This patch: 373K net/netfilter/nf_tables.ko 178563 4049 545 183157 nf_tables.ko Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nft_tunnel: add support for geneve optsXin Long1-12/+98
Like vxlan and erspan opts, geneve opts should also be supported in nft_tunnel. The difference is geneve RFC (draft-ietf-nvo3-geneve-14) allows a geneve packet to carry multiple geneve opts. So with this patch, nftables/libnftnl would do: # nft add table ip filter # nft add chain ip filter input { type filter hook input priority 0 \; } # nft add tunnel filter geneve_02 { type geneve\; id 2\; \ ip saddr 192.168.1.1\; ip daddr 192.168.1.2\; \ sport 9000\; dport 9001\; dscp 1234\; ttl 64\; flags 1\; \ opts \"1:1:34567890,2:2:12121212,3:3:1212121234567890\"\; } # nft list tunnels table filter table ip filter { tunnel geneve_02 { id 2 ip saddr 192.168.1.1 ip daddr 192.168.1.2 sport 9000 dport 9001 tos 18 ttl 64 flags 1 geneve opts 1:1:34567890,2:2:12121212,3:3:1212121234567890 } } v1->v2: - no changes, just post it separately. Signed-off-by: Xin Long <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: xtables: Add snapshot of hardidletimer targetManoj Basapathi1-12/+235
This is a snapshot of hardidletimer netfilter target. This patch implements a hardidletimer Xtables target that can be used to identify when interfaces have been idle for a certain period of time. Timers are identified by labels and are created when a rule is set with a new label. The rules also take a timeout value (in seconds) as an option. If more than one rule uses the same timer label, the timer will be restarted whenever any of the rules get a hit. One entry for each timer is created in sysfs. This attribute contains the timer remaining for the timer to expire. The attributes are located under the xt_idletimer class: /sys/class/xt_idletimer/timers/<label> When the timer expires, the target module sends a sysfs notification to the userspace, which can then decide what to do (eg. disconnect to save power) Compared to IDLETIMER, HARDIDLETIMER can send notifications when CPU is in suspend too, to notify the timer expiry. v1->v2: Moved all functionality into IDLETIMER module to avoid code duplication per comment from Florian. Signed-off-by: Manoj Basapathi <[email protected]> Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: flowtable: Use nf_flow_offload_tuple for stats as wellPaul Blakey1-17/+9
This patch doesn't change any functionality. Signed-off-by: Paul Blakey <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15mptcp: drop unneeded checksPaolo Abeni2-23/+9
After the previous patch subflow->conn is always != NULL and is never changed. We can drop a bunch of now unneeded checks. v1 -> v2: - rebased on top of commit 2398e3991bda ("mptcp: always include dack if possible.") Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-15mptcp: create msk earlyPaolo Abeni4-80/+70
This change moves the mptcp socket allocation from mptcp_accept() to subflow_syn_recv_sock(), so that subflow->conn is now always set for the non fallback scenario. It allows cleaning up a bit mptcp_accept() reducing the additional locking and will allow fourther cleanup in the next patch. Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-15tipc: add NULL pointer check to prevent kernel oopsHoang Le1-2/+10
Calling: tipc_node_link_down()-> - tipc_node_write_unlock()->tipc_mon_peer_down() - tipc_mon_peer_down() just after disabling bearer could be caused kernel oops. Fix this by adding a sanity check to make sure valid memory access. Acked-by: Jon Maloy <[email protected]> Signed-off-by: Hoang Le <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-15tipc: simplify trivial boolean returnHoang Le1-3/+0
Checking and returning 'true' boolean is useless as it will be returning at end of function Signed-off-by: Hoang Le <[email protected]> Acked-by: Ying Xue <[email protected]> Acked-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-14net: sched: RED: Introduce an ECN nodrop modePetr Machata1-6/+25
When the RED Qdisc is currently configured to enable ECN, the RED algorithm is used to decide whether a certain SKB should be marked. If that SKB is not ECN-capable, it is early-dropped. It is also possible to keep all traffic in the queue, and just mark the ECN-capable subset of it, as appropriate under the RED algorithm. Some switches support this mode, and some installations make use of it. To that end, add a new RED flag, TC_RED_NODROP. When the Qdisc is configured with this flag, non-ECT traffic is enqueued instead of being early-dropped. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-14net: sched: Allow extending set of supported RED flagsPetr Machata1-3/+40
The qdiscs RED, GRED, SFQ and CHOKE use different subsets of the same pool of global RED flags. These are passed in tc_red_qopt.flags. However none of these qdiscs validate the flag field, and just copy it over wholesale to internal structures, and later dump it back. (An exception is GRED, which does validate for VQs -- however not for the main setup.) A broken userspace can therefore configure a qdisc with arbitrary unsupported flags, and later expect to see the flags on qdisc dump. The current ABI therefore allows storage of several bits of custom data to qdisc instances of the types mentioned above. How many bits, depends on which flags are meaningful for the qdisc in question. E.g. SFQ recognizes flags ECN and HARDDROP, and the rest is not interpreted. If SFQ ever needs to support ADAPTATIVE, it needs another way of doing it, and at the same time it needs to retain the possibility to store 6 bits of uninterpreted data. Likewise RED, which adds a new flag later in this patchset. To that end, this patch adds a new function, red_get_flags(), to split the passed flags of RED-like qdiscs to flags and user bits, and red_validate_flags() to validate the resulting configuration. It further adds a new attribute, TCA_RED_FLAGS, to pass arbitrary flags. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-14Bluetooth: L2CAP: remove set but not used variable 'credits'YueHaibing1-2/+1
net/bluetooth/l2cap_core.c: In function l2cap_ecred_conn_req: net/bluetooth/l2cap_core.c:5848:6: warning: variable credits set but not used [-Wunused-but-set-variable] commit 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode") involved this unused variable, remove it. Reported-by: Hulk Robot <[email protected]> Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-03-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller7-160/+301
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-03-13 The following pull-request contains BPF updates for your *net-next* tree. We've added 86 non-merge commits during the last 12 day(s) which contain a total of 107 files changed, 5771 insertions(+), 1700 deletions(-). The main changes are: 1) Add modify_return attach type which allows to attach to a function via BPF trampoline and is run after the fentry and before the fexit programs and can pass a return code to the original caller, from KP Singh. 2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher objects to be visible in /proc/kallsyms so they can be annotated in stack traces, from Jiri Olsa. 3) Extend BPF sockmap to allow for UDP next to existing TCP support in order in order to enable this for BPF based socket dispatch, from Lorenz Bauer. 4) Introduce a new bpftool 'prog profile' command which attaches to existing BPF programs via fentry and fexit hooks and reads out hardware counters during that period, from Song Liu. Example usage: bpftool prog profile id 337 duration 3 cycles instructions llc_misses 4228 run_cnt 3403698 cycles (84.08%) 3525294 instructions # 1.04 insn per cycle (84.05%) 13 llc_misses # 3.69 LLC misses per million isns (83.50%) 5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition of a new bpf_link abstraction to keep in particular BPF tracing programs attached even when the applicaion owning them exits, from Andrii Nakryiko. 6) New bpf_get_current_pid_tgid() helper for tracing to perform PID filtering and which returns the PID as seen by the init namespace, from Carlos Neira. 7) Refactor of RISC-V JIT code to move out common pieces and addition of a new RV32G BPF JIT compiler, from Luke Nelson. 8) Add gso_size context member to __sk_buff in order to be able to know whether a given skb is GSO or not, from Willem de Bruijn. 9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output implementation but can be called from tracepoint programs, from Eelco Chaudron. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-13bpf: Add bpf_trampoline_ name prefix for DECLARE_BPF_DISPATCHERBjörn Töpel1-3/+2
Adding bpf_trampoline_ name prefix for DECLARE_BPF_DISPATCHER, so all the dispatchers have the common name prefix. And also a small '_' cleanup for bpf_dispatcher_nopfunc function name. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2020-03-13ethtool: fix spelling mistake "exceeeds" -> "exceeds"Colin Ian King2-2/+2
There are a couple of spelling mistakes in NL_SET_ERR_MSG_ATTR messages. Fix these. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller39-118/+305
Minor overlapping changes, nothing serious. Signed-off-by: David S. Miller <[email protected]>
2020-03-12bpf: Add bpf_xdp_output() helperEelco Chaudron1-1/+15
Introduce new helper that reuses existing xdp perf_event output implementation, but can be called from raw_tracepoint programs that receive 'struct xdp_buff *' as a tracepoint argument. Signed-off-by: Eelco Chaudron <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Acked-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/bpf/158348514556.2239.11050972434793741444.stgit@xdp-tutorial
2020-03-12inet: Use fallthrough;Joe Perches26-43/+41
Convert the various uses of fallthrough comments to fallthrough; Done via script Link: https://lore.kernel.org/lkml/b56602fcf79f849e733e7b521bb0e17895d390fa.1582230379.git.joe@perches.com/ And by hand: net/ipv6/ip6_fib.c has a fallthrough comment outside of an #ifdef block that causes gcc to emit a warning if converted in-place. So move the new fallthrough; inside the containing #ifdef/#endif too. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: add CHANNELS_NTF notificationMichal Kubecek3-1/+10
Send ETHTOOL_MSG_CHANNELS_NTF notification whenever channel counts of a network device are modified using ETHTOOL_MSG_CHANNELS_SET netlink message or ETHTOOL_SCHANNELS ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12ethtool: set device channel counts with CHANNELS_SET requestMichal Kubecek6-31/+154
Implement CHANNELS_SET netlink request to set channel counts of a network device. These are traditionally set with ETHTOOL_SCHANNELS ioctl request. Like the ioctl implementation, the generic ethtool code checks if supplied values do not exceed driver defined limits; if they do, first offending attribute is reported using extack. Checks preventing removing channels used for RX indirection table or zerocopy AF_XDP socket are also implemented. Move ethtool_get_max_rxfh_channel() helper into common.c so that it can be used by both ioctl and netlink code. v2: - fix netdev reference leak in error path (found by Jakub Kicinsky) Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>