aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2020-03-15nft_set_pipapo: Introduce AVX2-based lookup implementationStefano Brivio1-0/+1
If the AVX2 set is available, we can exploit the repetitive characteristic of this algorithm to provide a fast, vectorised version by using 256-bit wide AVX2 operations for bucket loads and bitwise intersections. In most cases, this implementation consistently outperforms rbtree set instances despite the fact they are configured to use a given, single, ranged data type out of the ones used for performance measurements by the nft_concat_range.sh kselftest. That script, injecting packets directly on the ingoing device path with pktgen, reports, averaged over five runs on a single AMD Epyc 7402 thread (3.35GHz, 768 KiB L1D$, 12 MiB L2$), the figures below. CONFIG_RETPOLINE was not set here. Note that this is not a fair comparison over hash and rbtree set types: non-ranged entries (used to have a reference for hash types) would be matched faster than this, and matching on a single field only (which is the case for rbtree) is also significantly faster. However, it's not possible at the moment to choose this set type for non-ranged entries, and the current implementation also needs a few minor adjustments in order to match on less than two fields. ---------------.-----------------------------------.------------. AMD Epyc 7402 | baselines, Mpps | this patch | 1 thread |___________________________________|____________| 3.35GHz | | | | | | 768KiB L1D$ | netdev | hash | rbtree | | | ---------------| hook | no | single | | pipapo | type entries | drop | ranges | field | pipapo | AVX2 | ---------------|--------|--------|--------|--------|------------| net,port | | | | | | 1000 | 19.0 | 10.4 | 3.8 | 4.0 | 7.5 +87% | ---------------|--------|--------|--------|--------|------------| port,net | | | | | | 100 | 18.8 | 10.3 | 5.8 | 6.3 | 8.1 +29% | ---------------|--------|--------|--------|--------|------------| net6,port | | | | | | 1000 | 16.4 | 7.6 | 1.8 | 2.1 | 4.8 +128% | ---------------|--------|--------|--------|--------|------------| port,proto | | | | | | 30000 | 19.6 | 11.6 | 3.9 | 0.5 | 2.6 +420% | ---------------|--------|--------|--------|--------|------------| net6,port,mac | | | | | | 10 | 16.5 | 5.4 | 4.3 | 3.4 | 4.7 +38% | ---------------|--------|--------|--------|--------|------------| net6,port,mac, | | | | | | proto 1000 | 16.5 | 5.7 | 1.9 | 1.4 | 3.6 +26% | ---------------|--------|--------|--------|--------|------------| net,mac | | | | | | 1000 | 19.0 | 8.4 | 3.9 | 2.5 | 6.4 +156% | ---------------'--------'--------'--------'--------'------------' A similar strategy could be easily reused to implement specialised versions for other SIMD sets, and I plan to post at least a NEON version at a later time. Signed-off-by: Stefano Brivio <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: flowtable: add tunnel match offload supportwenxu1-0/+6
This patch support both ipv4 and ipv6 tunnel_id, tunnel_src and tunnel_dst match for flowtable offload Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: Replace zero-length array with flexible-array memberGustavo A. R. Silva3-5/+5
The current codebase makes use of the zero-length array language extension to the C90 standard, but the preferred mechanism to declare variable-length types such as these ones is a flexible array member[1][2], introduced in C99: struct foo { int stuff; struct boo array[]; }; By making use of the mechanism above, we will get a compiler warning in case the flexible array does not occur last in the structure, which will help us prevent some kind of undefined behavior bugs from being inadvertently introduced[3] to the codebase from now on. Also, notice that, dynamic memory allocations won't be affected by this change: "Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero."[1] Lastly, fix checkpatch.pl warning WARNING: __aligned(size) is preferred over __attribute__((aligned(size))) in net/bridge/netfilter/ebtables.c This issue was found with the help of Coccinelle. [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html [2] https://github.com/KSPP/linux/issues/21 [3] commit 76497732932f ("cxgb3/l2t: Fix undefined behaviour") Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nf_tables: make all set structs constFlorian Westphal2-10/+6
They do not need to be writeable anymore. v2: remove left-over __read_mostly annotation in set_pipapo.c (Stefano) Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-15netfilter: nf_tables: make sets built-inFlorian Westphal1-6/+0
Placing nftables set support in an extra module is pointless: 1. nf_tables needs dynamic registeration interface for sake of one module 2. nft heavily relies on sets, e.g. even simple rule like "nft ... tcp dport { 80, 443 }" will not work with _SETS=n. IOW, either nftables isn't used or both nf_tables and nf_tables_set modules are needed anyway. With extra module: 307K net/netfilter/nf_tables.ko 79K net/netfilter/nf_tables_set.ko text data bss dec filename 146416 3072 545 150033 nf_tables.ko 35496 1817 0 37313 nf_tables_set.ko This patch: 373K net/netfilter/nf_tables.ko 178563 4049 545 183157 nf_tables.ko Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-03-14net: sched: RED: Introduce an ECN nodrop modePetr Machata2-0/+6
When the RED Qdisc is currently configured to enable ECN, the RED algorithm is used to decide whether a certain SKB should be marked. If that SKB is not ECN-capable, it is early-dropped. It is also possible to keep all traffic in the queue, and just mark the ECN-capable subset of it, as appropriate under the RED algorithm. Some switches support this mode, and some installations make use of it. To that end, add a new RED flag, TC_RED_NODROP. When the Qdisc is configured with this flag, non-ECT traffic is enqueued instead of being early-dropped. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-14net: sched: Allow extending set of supported RED flagsPetr Machata1-0/+33
The qdiscs RED, GRED, SFQ and CHOKE use different subsets of the same pool of global RED flags. These are passed in tc_red_qopt.flags. However none of these qdiscs validate the flag field, and just copy it over wholesale to internal structures, and later dump it back. (An exception is GRED, which does validate for VQs -- however not for the main setup.) A broken userspace can therefore configure a qdisc with arbitrary unsupported flags, and later expect to see the flags on qdisc dump. The current ABI therefore allows storage of several bits of custom data to qdisc instances of the types mentioned above. How many bits, depends on which flags are meaningful for the qdisc in question. E.g. SFQ recognizes flags ECN and HARDDROP, and the rest is not interpreted. If SFQ ever needs to support ADAPTATIVE, it needs another way of doing it, and at the same time it needs to retain the possibility to store 6 bits of uninterpreted data. Likewise RED, which adds a new flag later in this patchset. To that end, this patch adds a new function, red_get_flags(), to split the passed flags of RED-like qdiscs to flags and user bits, and red_validate_flags() to validate the resulting configuration. It further adds a new attribute, TCA_RED_FLAGS, to pass arbitrary flags. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller3-9/+22
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-03-13 The following pull-request contains BPF updates for your *net-next* tree. We've added 86 non-merge commits during the last 12 day(s) which contain a total of 107 files changed, 5771 insertions(+), 1700 deletions(-). The main changes are: 1) Add modify_return attach type which allows to attach to a function via BPF trampoline and is run after the fentry and before the fexit programs and can pass a return code to the original caller, from KP Singh. 2) Generalize BPF's kallsyms handling and add BPF trampoline and dispatcher objects to be visible in /proc/kallsyms so they can be annotated in stack traces, from Jiri Olsa. 3) Extend BPF sockmap to allow for UDP next to existing TCP support in order in order to enable this for BPF based socket dispatch, from Lorenz Bauer. 4) Introduce a new bpftool 'prog profile' command which attaches to existing BPF programs via fentry and fexit hooks and reads out hardware counters during that period, from Song Liu. Example usage: bpftool prog profile id 337 duration 3 cycles instructions llc_misses 4228 run_cnt 3403698 cycles (84.08%) 3525294 instructions # 1.04 insn per cycle (84.05%) 13 llc_misses # 3.69 LLC misses per million isns (83.50%) 5) Batch of improvements to libbpf, bpftool and BPF selftests. Also addition of a new bpf_link abstraction to keep in particular BPF tracing programs attached even when the applicaion owning them exits, from Andrii Nakryiko. 6) New bpf_get_current_pid_tgid() helper for tracing to perform PID filtering and which returns the PID as seen by the init namespace, from Carlos Neira. 7) Refactor of RISC-V JIT code to move out common pieces and addition of a new RV32G BPF JIT compiler, from Luke Nelson. 8) Add gso_size context member to __sk_buff in order to be able to know whether a given skb is GSO or not, from Willem de Bruijn. 9) Add a new bpf_xdp_output() helper which reuses XDP's existing perf RB output implementation but can be called from tracepoint programs, from Eelco Chaudron. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-13afs: Fix client call Rx-phase signal handlingDavid Howells1-3/+1
Fix the handling of signals in client rxrpc calls made by the afs filesystem. Ignore signals completely, leaving call abandonment or connection loss to be detected by timeouts inside AF_RXRPC. Allowing a filesystem call to be interrupted after the entire request has been transmitted and an abort sent means that the server may or may not have done the action - and we don't know. It may even be worse than that for older servers. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <[email protected]>
2020-03-13rxrpc: Fix call interruptibility handlingDavid Howells1-1/+7
Fix the interruptibility of kernel-initiated client calls so that they're either only interruptible when they're waiting for a call slot to come available or they're not interruptible at all. Either way, they're not interruptible during transmission. This should help prevent StoreData calls from being interrupted when writeback is in progress. It doesn't, however, handle interruption during the receive phase. Userspace-initiated calls are still interruptable. After the signal has been handled, sendmsg() will return the amount of data copied out of the buffer and userspace can perform another sendmsg() call to continue transmission. Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals") Signed-off-by: David Howells <[email protected]>
2020-03-13Merge tag 'ieee802154-for-davem-2020-03-13' of ↵David S. Miller1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next Stefan Schmidt says: ==================== pull-request: ieee802154-next 2020-03-13 An update from ieee802154 for *net-next* Two small patches with updates targeting the whole tree. Sergin does update SPI drivers to the new transfer delay handling and Gustavo did one of his zero-length array replacement patches. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-03-12Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller1-0/+1
Minor overlapping changes, nothing serious. Signed-off-by: David S. Miller <[email protected]>
2020-03-13net: drop_monitor: use IS_REACHABLE() to guard net_dm_hw_report()Masahiro Yamada1-1/+1
In net/Kconfig, NET_DEVLINK implies NET_DROP_MONITOR. The original behavior of the 'imply' keyword prevents NET_DROP_MONITOR from being 'm' when NET_DEVLINK=y. With the planned Kconfig change that relaxes the 'imply', the combination of NET_DEVLINK=y and NET_DROP_MONITOR=m would be allowed. Use IS_REACHABLE() to avoid the vmlinux link error for this case. Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Neil Horman <[email protected]>
2020-03-12flow_offload: Add flow_match_ct to get rule ct matchPaul Blakey1-0/+6
Add relevant getter for ct info dissector. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/sched: act_ct: Enable hardware offload of flow table entiresPaul Blakey2-0/+11
Pass the zone's flow table instance on the flow action to the drivers. Thus, allowing drivers to register FT add/del/stats callbacks. Finally, enable hardware offload on the flow table instance. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/sched: act_ct: Support refreshing the flow table entriesPaul Blakey1-0/+3
If driver deleted an FT entry, a FT failed to offload, or registered to the flow table after flows were already added, we still get packets in software. For those packets, while restoring the ct state from the flow table entry, refresh it's hardware offload. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/sched: act_ct: Support restoring conntrack info on skbsPaul Blakey2-0/+8
Provide an API to restore the ct state pointer. This may be used by drivers to restore the ct state if they miss in tc chain after they already did the hardware connection tracking action (ct_metadata action). For example, consider the following rule on chain 0 that is in_hw, however chain 1 is not_in_hw: $ tc filter add dev ... chain 0 ... \ flower ... action ct pipe action goto chain 1 Packets of a flow offloaded (via nf flow table offload) by the driver hit this rule in hardware, will be marked with the ct metadata action (mark, label, zone) that does the equivalent of the software ct action, and when the packet jumps to hardware chain 1, there would be a miss. CT was already processed in hardware. Therefore, the driver's miss handling should restore the ct state on the skb, using the provided API, and continue the packet processing in chain 1. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12net/sched: act_ct: Instantiate flow table entry actionsPaul Blakey2-0/+28
NF flow table API associate 5-tuple rule with an action list by calling the flow table type action() CB to fill the rule's actions. In action CB of act_ct, populate the ct offload entry actions with a new ct_metadata action. Initialize the ct_metadata with the ct mark, label and zone information. If ct nat was performed, then also append the relevant packet mangle actions (e.g. ipv4/ipv6/tcp/udp header rewrites). Drivers that offload the ft entries may match on the 5-tuple and perform the action list. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Reviewed-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12netfilter: flowtable: Add API for registering to flow table eventsPaul Blakey1-0/+6
Let drivers to add their cb allowing them to receive flow offload events of type TC_SETUP_CLSFLOWER (REPLACE/DEL/STATS) for flows managed by the flow table. Signed-off-by: Paul Blakey <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12Merge branch 'ct-offload' of ↵David S. Miller2-0/+16
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
2020-03-12tcp: bind(0) remove the SO_REUSEADDR restriction when ephemeral ports are ↵Kuniyuki Iwashima1-0/+1
exhausted. Commit aacd9289af8b82f5fb01bcdd53d0e3406d1333c7 ("tcp: bind() use stronger condition for bind_conflict") introduced a restriction to forbid to bind SO_REUSEADDR enabled sockets to the same (addr, port) tuple in order to assign ports dispersedly so that we can connect to the same remote host. The change results in accelerating port depletion so that we fail to bind sockets to the same local port even if we want to connect to the different remote hosts. You can reproduce this issue by following instructions below. 1. # sysctl -w net.ipv4.ip_local_port_range="32768 32768" 2. set SO_REUSEADDR to two sockets. 3. bind two sockets to (localhost, 0) and the latter fails. Therefore, when ephemeral ports are exhausted, bind(0) should fallback to the legacy behaviour to enable the SO_REUSEADDR option and make it possible to connect to different remote (addr, port) tuples. This patch allows us to bind SO_REUSEADDR enabled sockets to the same (addr, port) only when net.ipv4.ip_autobind_reuse is set 1 and all ephemeral ports are exhausted. This also allows connect() and listen() to share ports in the following way and may break some applications. So the ip_autobind_reuse is 0 by default and disables the feature. 1. setsockopt(sk1, SO_REUSEADDR) 2. setsockopt(sk2, SO_REUSEADDR) 3. bind(sk1, saddr, 0) 4. bind(sk2, saddr, 0) 5. connect(sk1, daddr) 6. listen(sk2) If it is set 1, we can fully utilize the 4-tuples, but we should use IP_BIND_ADDRESS_NO_PORT for bind()+connect() as possible. The notable thing is that if all sockets bound to the same port have both SO_REUSEADDR and SO_REUSEPORT enabled, we can bind sockets to an ephemeral port and also do listen(). Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-12Revert "net: sched: make newly activated qdiscs visible"Julian Wiedmann1-6/+0
This reverts commit 4cda75275f9f89f9485b0ca4d6950c95258a9bce from net-next. Brown bag time. Michal noticed that this change doesn't work at all when netif_set_real_num_tx_queues() gets called prior to an initial dev_activate(), as for instance igb does. Doing so dies with: [ 40.579142] BUG: kernel NULL pointer dereference, address: 0000000000000400 [ 40.586922] #PF: supervisor read access in kernel mode [ 40.592668] #PF: error_code(0x0000) - not-present page [ 40.598405] PGD 0 P4D 0 [ 40.601234] Oops: 0000 [#1] PREEMPT SMP PTI [ 40.605909] CPU: 18 PID: 1681 Comm: wickedd Tainted: G E 5.6.0-rc3-ethnl.50-default #1 [ 40.616205] Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.R3.27.D685.1305151734 05/15/2013 [ 40.627377] RIP: 0010:qdisc_hash_add.part.22+0x2e/0x90 [ 40.633115] Code: 00 55 53 89 f5 48 89 fb e8 2f 9b fb ff 85 c0 74 44 48 8b 43 40 48 8b 08 69 43 38 47 86 c8 61 c1 e8 1c 48 83 e8 80 48 8d 14 c1 <48> 8b 04 c1 48 8d 4b 28 48 89 53 30 48 89 43 28 48 85 c0 48 89 0a [ 40.654080] RSP: 0018:ffffb879864934d8 EFLAGS: 00010203 [ 40.659914] RAX: 0000000000000080 RBX: ffffffffb8328d80 RCX: 0000000000000000 [ 40.667882] RDX: 0000000000000400 RSI: 0000000000000000 RDI: ffffffffb831faa0 [ 40.675849] RBP: 0000000000000000 R08: ffffa0752c8b9088 R09: ffffa0752c8b9208 [ 40.683816] R10: 0000000000000006 R11: 0000000000000000 R12: ffffa0752d734000 [ 40.691783] R13: 0000000000000008 R14: 0000000000000000 R15: ffffa07113c18000 [ 40.699750] FS: 00007f94548e5880(0000) GS:ffffa0752e980000(0000) knlGS:0000000000000000 [ 40.708782] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 40.715189] CR2: 0000000000000400 CR3: 000000082b6ae006 CR4: 00000000001606e0 [ 40.723156] Call Trace: [ 40.725888] dev_qdisc_set_real_num_tx_queues+0x61/0x90 [ 40.731725] netif_set_real_num_tx_queues+0x94/0x1d0 [ 40.737286] __igb_open+0x19a/0x5d0 [igb] [ 40.741767] __dev_open+0xbb/0x150 [ 40.745567] __dev_change_flags+0x157/0x1a0 [ 40.750240] dev_change_flags+0x23/0x60 [...] Fixes: 4cda75275f9f ("net: sched: make newly activated qdiscs visible") Reported-by: Michal Kubecek <[email protected]> CC: Michal Kubecek <[email protected]> CC: Eric Dumazet <[email protected]> CC: Jamal Hadi Salim <[email protected]> CC: Cong Wang <[email protected]> CC: Jiri Pirko <[email protected]> Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-11net: sched: make newly activated qdiscs visibleJulian Wiedmann1-0/+6
In their .attach callback, mq[prio] only add the qdiscs of the currently active TX queues to the device's qdisc hash list. If a user later increases the number of active TX queues, their qdiscs are not visible via eg. 'tc qdisc show'. Add a hook to netif_set_real_num_tx_queues() that walks all active TX queues and adds those which are missing to the hash list. CC: Eric Dumazet <[email protected]> CC: Jamal Hadi Salim <[email protected]> CC: Cong Wang <[email protected]> CC: Jiri Pirko <[email protected]> Signed-off-by: Julian Wiedmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-11Bluetooth: Pause discovery and advertising during suspendAbhishek Pandit-Subedi1-0/+11
To prevent spurious wake ups, we disable any discovery or advertising when we enter suspend and restore it when we exit suspend. While paused, we disable any management requests to modify discovery or advertising. Signed-off-by: Abhishek Pandit-Subedi <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-03-11Bluetooth: Handle LE devices during suspendAbhishek Pandit-Subedi1-0/+1
To handle LE devices, we must first disable passive scanning and disconnect all connected devices. Once that is complete, we update the whitelist and re-enable scanning Signed-off-by: Abhishek Pandit-Subedi <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-03-11Bluetooth: Handle BR/EDR devices during suspendAbhishek Pandit-Subedi2-7/+20
To handle BR/EDR devices, we first disable page scan and disconnect all connected devices. Once that is complete, we add event filters (for devices that can wake the system) and re-enable page scan. Signed-off-by: Abhishek Pandit-Subedi <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-03-11Bluetooth: Handle PM_SUSPEND_PREPARE and PM_POST_SUSPENDAbhishek Pandit-Subedi1-0/+23
Register for PM_SUSPEND_PREPARE and PM_POST_SUSPEND to make sure the Bluetooth controller is prepared correctly for suspend/resume. Implement the registration, scheduling and task handling portions only in this patch. Signed-off-by: Abhishek Pandit-Subedi <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-03-10flow_offload: restrict driver to pass one allowed bit to ↵Jiri Pirko1-7/+17
flow_action_hw_stats_types_check() The intention of this helper was to allow driver to specify one type that it supports, so not only "any" value would pass. So make the API more strict and allow driver to pass only 1 bit that is going to be checked. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-10flow_offload: turn hw_stats_type into dedicated enumJiri Pirko1-6/+16
Put the values into enum and add an enum to define the bits. Suggested-by: Edward Cree <[email protected]> Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-10flow_offload: fix allowed types checkJiri Pirko1-1/+1
Change the check to see if the passed allowed type bit is enabled. Fixes: 319a1d19471e ("flow_offload: check for basic action hw stats type") Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-10flow_offload: use flow_action_for_each in ↵Jiri Pirko1-5/+6
flow_action_mixed_hw_stats_types_check() Instead of manually iterating over entries, use flow_action_for_each helper. Move the helper and wrap it to fit to 80 cols on the way. Signed-off-by: Jiri Pirko <[email protected]> Acked-by: Pablo Neira Ayuso <[email protected]> Acked-by: Edward Cree <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-10net: abstract out normal and compat msghdr importJens Axboe1-0/+3
This splits it into two parts, one that imports the message, and one that imports the iovec. This allows a caller to only do the first part, and import the iovec manually afterwards. No functional changes in this patch. Acked-by: David Miller <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-03-09bpf: Add sockmap hooks for UDP socketsLorenz Bauer1-0/+5
Add basic psock hooks for UDP sockets. This allows adding and removing sockets, as well as automatic removal on unhash and close. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Jakub Sitnicki <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-09bpf: sockmap: Move generic sockmap hooks from BPF TCPLorenz Bauer1-6/+9
The init, close and unhash handlers from TCP sockmap are generic, and can be reused by UDP sockmap. Move the helpers into the sockmap code base and expose them. This requires tcp_bpf_get_proto and tcp_bpf_clone to be conditional on BPF_STREAM_PARSER. The moved functions are unmodified, except that sk_psock_unlink is renamed to sock_map_unlink to better match its behaviour. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-09bpf: tcp: Guard declarations with CONFIG_NET_SOCK_MSGLorenz Bauer1-2/+2
tcp_bpf.c is only included in the build if CONFIG_NET_SOCK_MSG is selected. The declaration should therefore be guarded as such. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-09bpf: tcp: Move assertions into tcp_bpf_get_protoLorenz Bauer1-1/+0
We need to ensure that sk->sk_prot uses certain callbacks, so that code that directly calls e.g. tcp_sendmsg in certain corner cases works. To avoid spurious asserts, we must to do this only if sk_psock_update_proto has not yet been called. The same invariants apply for tcp_bpf_check_v6_needs_rebuild, so move the call as well. Doing so allows us to merge tcp_bpf_init and tcp_bpf_reinit. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-09bpf: sockmap: Only check ULP for TCP socketsLorenz Bauer1-0/+6
The sock map code checks that a socket does not have an active upper layer protocol before inserting it into the map. This requires casting via inet_csk, which isn't valid for UDP sockets. Guard checks for ULP by checking inet_sk(sk)->is_icsk first. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-03-08sched: act: allow user to specify type of HW stats for a filterJiri Pirko1-0/+4
Currently, user who is adding an action expects HW to report stats, however it does not have exact expectations about the stats types. That is aligned with TCA_ACT_HW_STATS_TYPE_ANY. Allow user to specify the type of HW stats for an action and require it. Pass the information down to flow_offload layer. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-08flow_offload: introduce "disabled" HW stats type and allow it in mlxswJiri Pirko1-0/+1
Introduce new type for disabled HW stats and allow the value in mlxsw offload. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-08flow_offload: introduce "delayed" HW stats type and allow it in mlx5Jiri Pirko1-1/+3
Introduce new type for delayed HW stats and allow the value in mlx5 offload. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-08flow_offload: introduce "immediate" HW stats type and allow it in mlxswJiri Pirko1-1/+2
Introduce new type for immediate HW stats and allow the value in mlxsw offload. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-08flow_offload: check for basic action hw stats typeJiri Pirko1-0/+61
Introduce flow_action_basic_hw_stats_types_check() helper and use it in drivers. That sanitizes the drivers which do not have support for action HW stats types. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-08flow_offload: Introduce offload of HW stats typeJiri Pirko1-0/+3
Initially, pass "ANY" (struct is zeroed) to the drivers as that is the current implicit value coming down to flow_offload. Add a bool indicating that entries have mixed HW stats type. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-08Bluetooth: L2CAP: Add module option to enable ECRED modeLuiz Augusto von Dentz1-0/+1
This should make it safe to have the code upstream without affecting stable systems since there are a few details not sort out with ECRED mode e.g: how to initiate multiple connections at once. Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-03-08Bluetooth: L2CAP: Add initial code for Enhanced Credit Based ModeLuiz Augusto von Dentz1-0/+4
This adds the initial code for Enhanced Credit Based Mode which introduces a new socket mode called L2CAP_MODE_EXT_FLOWCTL, which for the most part work the same as L2CAP_MODE_LE_FLOWCTL but uses different PDUs to setup the connections and also works over BR/EDR. Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-03-08Bluetooth: L2CAP: Add definitions for Enhanced Credit Based ModeLuiz Augusto von Dentz1-0/+39
This introduces the definitions for the new L2CAP mode called Enhanced Credit Based Mode. Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-03-08Bluetooth: Enable erroneous data reporting if WBS is supportedAlain Michaud3-2/+19
This change introduces a wide band speech setting which allows higher level clients to query the local controller support for wide band speech as well as set the setting state when the radio is powered off. Internally, this setting controls if erroneous data reporting is enabled on the controller. Signed-off-by: Alain Michaud <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-03-05net: sched: Make FIFO Qdisc offloadablePetr Machata1-0/+15
Invoke ndo_setup_tc() as appropriate to signal init / replacement, destroying and dumping of pFIFO / bFIFO Qdisc. A lot of the FIFO logic is used for pFIFO_head_drop as well, but that's a semantically very different Qdisc that isn't really in the same boat as pFIFO / bFIFO. Split some of the functions to keep the Qdisc intact. Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-04pie: realign commentLeslie Monis1-9/+9
Realign a comment after the change introduced by the previous patch. Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-03-04pie: remove pie_vars->accu_prob_overflowsLeslie Monis1-4/+1
The variable pie_vars->accu_prob is used as an accumulator for probability values. Since probabilty values are scaled using the MAX_PROB macro denoting (2^64 - 1), pie_vars->accu_prob is likely to overflow as it is of type u64. The variable pie_vars->accu_prob_overflows counts the number of times the variable pie_vars->accu_prob overflows. The MAX_PROB macro needs to be equal to at least (2^39 - 1) in order to do precise calculations without any underflow. Thus MAX_PROB can be reduced to (2^56 - 1) without affecting the precision in calculations drastically. Doing so will eliminate the need for the variable pie_vars->accu_prob_overflows as the variable pie_vars->accu_prob will never overflow. Removing the variable pie_vars->accu_prob_overflows also reduces the size of the structure pie_vars to exactly 64 bytes. Signed-off-by: Mohit P. Tahiliani <[email protected]> Signed-off-by: Gautam Ramakrishnan <[email protected]> Signed-off-by: Leslie Monis <[email protected]> Signed-off-by: David S. Miller <[email protected]>