aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2020-01-17net: avoid updating qdisc_xmit_lock_key in netdev_update_lockdep_key()Cong Wang1-12/+0
syzbot reported some bogus lockdep warnings, for example bad unlock balance in sch_direct_xmit(). They are due to a race condition between slow path and fast path, that is qdisc_xmit_lock_key gets re-registered in netdev_update_lockdep_key() on slow path, while we could still acquire the queue->_xmit_lock on fast path in this small window: CPU A CPU B __netif_tx_lock(); lockdep_unregister_key(qdisc_xmit_lock_key); __netif_tx_unlock(); lockdep_register_key(qdisc_xmit_lock_key); In fact, unlike the addr_list_lock which has to be reordered when the master/slave device relationship changes, queue->_xmit_lock is only acquired on fast path and only when NETIF_F_LLTX is not set, so there is likely no nested locking for it. Therefore, we can just get rid of re-registration of qdisc_xmit_lock_key. Reported-by: [email protected] Fixes: ab92d68fc22f ("net: core: add generic lockdep keys") Cc: Taehee Yoo <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-17net/sched: act_ife: initalize ife->metalist earlierEric Dumazet1-4/+3
It seems better to init ife->metalist earlier in tcf_ife_init() to avoid the following crash : kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 10483 Comm: syz-executor216 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:_tcf_ife_cleanup net/sched/act_ife.c:412 [inline] RIP: 0010:tcf_ife_cleanup+0x6e/0x400 net/sched/act_ife.c:431 Code: 48 c1 ea 03 80 3c 02 00 0f 85 94 03 00 00 49 8b bd f8 00 00 00 48 b8 00 00 00 00 00 fc ff df 4c 8d 67 e8 48 89 fa 48 c1 ea 03 <80> 3c 02 00 0f 85 5c 03 00 00 48 bb 00 00 00 00 00 fc ff df 48 8b RSP: 0018:ffffc90001dc6d00 EFLAGS: 00010246 RAX: dffffc0000000000 RBX: ffffffff864619c0 RCX: ffffffff815bfa09 RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000000 RBP: ffffc90001dc6d50 R08: 0000000000000004 R09: fffff520003b8d8e R10: fffff520003b8d8d R11: 0000000000000003 R12: ffffffffffffffe8 R13: ffff8880a79fc000 R14: ffff88809aba0e00 R15: 0000000000000000 FS: 0000000001b51880(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000563f52cce140 CR3: 0000000093541000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tcf_action_cleanup+0x62/0x1b0 net/sched/act_api.c:119 __tcf_action_put+0xfa/0x130 net/sched/act_api.c:135 __tcf_idr_release net/sched/act_api.c:165 [inline] __tcf_idr_release+0x59/0xf0 net/sched/act_api.c:145 tcf_idr_release include/net/act_api.h:171 [inline] tcf_ife_init+0x97c/0x1870 net/sched/act_ife.c:616 tcf_action_init_1+0x6b6/0xa40 net/sched/act_api.c:944 tcf_action_init+0x21a/0x330 net/sched/act_api.c:1000 tcf_action_add+0xf5/0x3b0 net/sched/act_api.c:1410 tc_ctl_action+0x390/0x488 net/sched/act_api.c:1465 rtnetlink_rcv_msg+0x45e/0xaf0 net/core/rtnetlink.c:5424 netlink_rcv_skb+0x177/0x450 net/netlink/af_netlink.c:2477 rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0x58c/0x7d0 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x91c/0xea0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg+0xd7/0x130 net/socket.c:659 ____sys_sendmsg+0x753/0x880 net/socket.c:2330 ___sys_sendmsg+0x100/0x170 net/socket.c:2384 __sys_sendmsg+0x105/0x1d0 net/socket.c:2417 __do_sys_sendmsg net/socket.c:2426 [inline] __se_sys_sendmsg net/socket.c:2424 [inline] __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2424 do_syscall_64+0xfa/0x790 arch/x86/entry/common.c:294 entry_SYSCALL_64_after_hwframe+0x49/0xbe Fixes: 11a94d7fd80f ("net/sched: act_ife: validate the control action inside init()") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Cc: Davide Caratti <[email protected]> Reviewed-by: Davide Caratti <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfDavid S. Miller5-24/+54
Pablo Neira Ayuso says: ==================== Netfilter updates for net The following patchset contains Netfilter fixes for net: 1) Fix use-after-free in ipset bitmap destroy path, from Cong Wang. 2) Missing init netns in entry cleanup path of arp_tables, from Florian Westphal. 3) Fix WARN_ON in set destroy path due to missing cleanup on transaction error. 4) Incorrect netlink sanity check in tunnel, from Florian Westphal. 5) Missing sanity check for erspan version netlink attribute, also from Florian. 6) Remove WARN in nft_request_module() that can be triggered from userspace, from Florian Westphal. 7) Memleak in NFTA_HOOK_DEVS netlink parser, from Dan Carpenter. 8) List poison from commit path for flowtables that are added and deleted in the same batch, from Florian Westphal. 9) Fix NAT ICMP packet corruption, from Eyal Birger. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-16xdp: Use bulking for non-map XDP_REDIRECT and consolidate code pathsToke Høiland-Jørgensen1-71/+19
Since the bulk queue used by XDP_REDIRECT now lives in struct net_device, we can re-use the bulking for the non-map version of the bpf_redirect() helper. This is a simple matter of having xdp_do_redirect_slow() queue the frame on the bulk queue instead of sending it out with __bpf_tx_xdp(). Unfortunately we can't make the bpf_redirect() helper return an error if the ifindex doesn't exit (as bpf_redirect_map() does), because we don't have a reference to the network namespace of the ingress device at the time the helper is called. So we have to leave it as-is and keep the device lookup in xdp_do_redirect_slow(). Since this leaves less reason to have the non-map redirect code in a separate function, so we get rid of the xdp_do_redirect_slow() function entirely. This does lose us the tracepoint disambiguation, but fortunately the xdp_redirect and xdp_redirect_map tracepoints use the same tracepoint entry structures. This means both can contain a map index, so we can just amend the tracepoint definitions so we always emit the xdp_redirect(_err) tracepoints, but with the map ID only populated if a map is present. This means we retire the xdp_redirect_map(_err) tracepoints entirely, but keep the definitions around in case someone is still listening for them. With this change, the performance of the xdp_redirect sample program goes from 5Mpps to 8.4Mpps (a 68% increase). Since the flush functions are no longer map-specific, rename the flush() functions to drop _map from their names. One of the renamed functions is the xdp_do_flush_map() callback used in all the xdp-enabled drivers. To keep from having to update all drivers, use a #define to keep the old name working, and only update the virtual drivers in this patch. Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-01-16xdp: Move devmap bulk queue into struct net_deviceToke Høiland-Jørgensen1-0/+2
Commit 96360004b862 ("xdp: Make devmap flush_list common for all map instances"), changed devmap flushing to be a global operation instead of a per-map operation. However, the queue structure used for bulking was still allocated as part of the containing map. This patch moves the devmap bulk queue into struct net_device. The motivation for this is reusing it for the non-map variant of XDP_REDIRECT, which will be changed in a subsequent commit. To avoid other fields of struct net_device moving to different cache lines, we also move a couple of other members around. We defer the actual allocation of the bulk queue structure until the NETDEV_REGISTER notification devmap.c. This makes it possible to check for ndo_xdp_xmit support before allocating the structure, which is not possible at the time struct net_device is allocated. However, we keep the freeing in free_netdev() to avoid adding another RCU callback on NETDEV_UNREGISTER. Because of this change, we lose the reference back to the map that originated the redirect, so change the tracepoint to always return 0 as the map ID and index. Otherwise no functional change is intended with this patch. After this patch, the relevant part of struct net_device looks like this, according to pahole: /* --- cacheline 14 boundary (896 bytes) --- */ struct netdev_queue * _tx __attribute__((__aligned__(64))); /* 896 8 */ unsigned int num_tx_queues; /* 904 4 */ unsigned int real_num_tx_queues; /* 908 4 */ struct Qdisc * qdisc; /* 912 8 */ unsigned int tx_queue_len; /* 920 4 */ spinlock_t tx_global_lock; /* 924 4 */ struct xdp_dev_bulk_queue * xdp_bulkq; /* 928 8 */ struct xps_dev_maps * xps_cpus_map; /* 936 8 */ struct xps_dev_maps * xps_rxqs_map; /* 944 8 */ struct mini_Qdisc * miniq_egress; /* 952 8 */ /* --- cacheline 15 boundary (960 bytes) --- */ struct hlist_head qdisc_hash[16]; /* 960 128 */ /* --- cacheline 17 boundary (1088 bytes) --- */ struct timer_list watchdog_timer; /* 1088 40 */ /* XXX last struct has 4 bytes of padding */ int watchdog_timeo; /* 1128 4 */ /* XXX 4 bytes hole, try to pack */ struct list_head todo_list; /* 1136 16 */ /* --- cacheline 18 boundary (1152 bytes) --- */ Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Björn Töpel <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-01-16netfilter: bitwise: add support for shifts.Jeremy Sowden1-0/+77
Hitherto nft_bitwise has only supported boolean operations: NOT, AND, OR and XOR. Extend it to do shifts as well. Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: bitwise: add NFTA_BITWISE_DATA attribute.Jeremy Sowden1-0/+5
Add a new bitwise netlink attribute that will be used by shift operations to store the size of the shift. It is not used by boolean operations. Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: bitwise: only offload boolean operations.Jeremy Sowden1-0/+3
Only boolean operations supports offloading, so check the type of the operation and return an error for other types. Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: bitwise: add helper for dumping boolean operations.Jeremy Sowden1-8/+21
Split the code specific to dumping bitwise boolean operations out into a separate function. A similar function will be added later for shift operations. Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: bitwise: add helper for evaluating boolean operations.Jeremy Sowden1-3/+14
Split the code specific to evaluating bitwise boolean operations out into a separate function. Similar functions will be added later for shift operations. Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: bitwise: add helper for initializing boolean operations.Jeremy Sowden1-25/+41
Split the code specific to initializing bitwise boolean operations out into a separate function. A similar function will be added later for shift operations. Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: bitwise: add NFTA_BITWISE_OP netlink attribute.Jeremy Sowden1-0/+16
Add a new bitwise netlink attribute, NFTA_BITWISE_OP, which is set to a value of a new enum, nft_bitwise_ops. It describes the type of operation an expression contains. Currently, it only has one value: NFT_BITWISE_BOOL. More values will be added later to implement shifts. Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: bitwise: replace gotos with returns.Jeremy Sowden1-8/+5
When dumping a bitwise expression, if any of the puts fails, we use goto to jump to a label. However, no clean-up is required and the only statement at the label is a return. Drop the goto's and return immediately instead. Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: bitwise: remove NULL comparisons from attribute checks.Jeremy Sowden1-5/+5
In later patches, we will be adding more checks. In order to be consistent and prevent complaints from checkpatch.pl, replace the existing comparisons with NULL with logical NOT operators. Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: nf_tables: white-space fixes.Jeremy Sowden3-5/+5
Indentation fixes for the parameters of a few nft functions. Signed-off-by: Jeremy Sowden <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: flowtable: add nf_flow_table_offload_cmd()Pablo Neira Ayuso1-12/+28
Split nf_flow_table_offload_setup() in two functions to make it more maintainable. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: flowtable: add nf_flow_offload_tuple() helperPablo Neira Ayuso1-23/+24
Consolidate code to configure the flow_cls_offload structure into one helper function. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: hashlimit: do not use indirect calls during gcFlorian Westphal1-18/+4
no need, just use a simple boolean to indicate we want to reap all entries. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: flowtable: refresh flow if hardware offload failsPablo Neira Ayuso3-10/+21
If nf_flow_offload_add() fails to add the flow to hardware, then the NF_FLOW_HW_REFRESH flag bit is set and the flow remains in the flowtable software path. If flowtable hardware offload is enabled, this patch enqueues a new request to offload this flow to hardware. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: flowtable: add nf_flowtable_hw_offload() helper functionPablo Neira Ayuso2-3/+3
This function checks for the NF_FLOWTABLE_HW_OFFLOAD flag, meaning that the flowtable hardware offload is enabled. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: flowtable: use atomic bitwise operations for flow flagsPablo Neira Ayuso3-24/+24
Originally, all flow flag bits were set on only from the workqueue. With the introduction of the flow teardown state and hardware offload this is no longer true. Let's be safe and use atomic bitwise operation to operation with flow flags. Fixes: 59c466dd68e7 ("netfilter: nf_flow_table: add a new flow state for tearing down offloading") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: flowtable: remove dying bit, use teardown bit insteadPablo Neira Ayuso1-5/+3
The dying bit removes the conntrack entry if the netdev that owns this flow is going down. Instead, use the teardown mechanism to push back the flow to conntrack to let the classic software path decide what to do with it. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: flowtable: add nf_flow_offload_work_alloc()Pablo Neira Ayuso1-16/+22
Add helper function to allocate and initialize flow offload work and use it to consolidate existing code. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: flowtable: restrict flow dissector match on meta ingress devicePablo Neira Ayuso1-1/+7
Set on FLOW_DISSECTOR_KEY_META meta key using flow tuple ingress interface. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: flowtable: fetch stats only if flow is still alivePablo Neira Ayuso2-5/+3
Do not fetch statistics if flow has expired since it might not in hardware anymore. After this update, remove the FLOW_OFFLOAD_HW_DYING check from nf_flow_offload_stats() since this flag is never set on. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Pablo Neira Ayuso <[email protected]> Acked-by: wenxu <[email protected]>
2020-01-16net/rds: Detect need of On-Demand-Paging memory registrationHans Westgaard Ry1-2/+4
Add code to check if memory intended for RDMA is FS-DAX-memory. RDS will fail with error code EOPNOTSUPP if FS-DAX-memory is detected. Signed-off-by: Hans Westgaard Ry <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]>
2020-01-16netfilter: nat: fix ICMP header corruption on ICMP errorsEyal Birger1-0/+13
Commit 8303b7e8f018 ("netfilter: nat: fix spurious connection timeouts") made nf_nat_icmp_reply_translation() use icmp_manip_pkt() as the l4 manipulation function for the outer packet on ICMP errors. However, icmp_manip_pkt() assumes the packet has an 'id' field which is not correct for all types of ICMP messages. This is not correct for ICMP error packets, and leads to bogus bytes being written the ICMP header, which can be wrongfully regarded as 'length' bytes by RFC 4884 compliant receivers. Fix by assigning the 'id' field only for ICMP messages that have this semantic. Reported-by: Shmulik Ladkani <[email protected]> Fixes: 8303b7e8f018 ("netfilter: nat: fix spurious connection timeouts") Signed-off-by: Eyal Birger <[email protected]> Acked-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: nf_tables: fix flowtable list del corruptionFlorian Westphal1-0/+6
syzbot reported following crash: list_del corruption, ffff88808c9bb000->prev is LIST_POISON2 (dead000000000122) [..] Call Trace: __list_del_entry include/linux/list.h:131 [inline] list_del_rcu include/linux/rculist.h:148 [inline] nf_tables_commit+0x1068/0x3b30 net/netfilter/nf_tables_api.c:7183 [..] The commit transaction list has: NFT_MSG_NEWTABLE NFT_MSG_NEWFLOWTABLE NFT_MSG_DELFLOWTABLE NFT_MSG_DELTABLE A missing generation check during DELTABLE processing causes it to queue the DELFLOWTABLE operation a second time, so we corrupt the list here: case NFT_MSG_DELFLOWTABLE: list_del_rcu(&nft_trans_flowtable(trans)->list); nf_tables_flowtable_notify(&trans->ctx, because we have two different DELFLOWTABLE transactions for the same flowtable. We then call list_del_rcu() twice for the same flowtable->list. The object handling seems to suffer from the same bug so add a generation check too and only queue delete transactions for flowtables/objects that are still active in the next generation. Reported-by: [email protected] Fixes: 3b49e2e94e6eb ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: nf_tables: fix memory leak in nf_tables_parse_netdev_hooks()Dan Carpenter1-0/+1
Syzbot detected a leak in nf_tables_parse_netdev_hooks(). If the hook already exists, then the error handling doesn't free the newest "hook". Reported-by: [email protected] Fixes: b75a3e8371bc ("netfilter: nf_tables: allow netdevice to be used only once per flowtable") Signed-off-by: Dan Carpenter <[email protected]> Reviewed-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: nf_tables: remove WARN and add NLA_STRING upper limitsFlorian Westphal1-4/+9
This WARN can trigger because some of the names fed to the module autoload function can be of arbitrary length. Remove the WARN and add limits for all NLA_STRING attributes. Reported-by: [email protected] Fixes: 452238e8d5ffd8 ("netfilter: nf_tables: add and use helper for module autoload") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: nft_tunnel: ERSPAN_VERSION must not be nullFlorian Westphal1-0/+3
Fixes: af308b94a2a4a5 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: nft_tunnel: fix null-attribute checkFlorian Westphal1-1/+1
else we get null deref when one of the attributes is missing, both must be non-null. Reported-by: [email protected] Fixes: aaecfdb5c5dd8ba ("netfilter: nf_tables: match on tunnel metadata") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16netfilter: nf_tables: store transaction list locally while requesting modulePablo Neira Ayuso1-9/+10
This patch fixes a WARN_ON in nft_set_destroy() due to missing set reference count drop from the preparation phase. This is triggered by the module autoload path. Do not exercise the abort path from nft_request_module() while preparation phase cleaning up is still pending. WARNING: CPU: 3 PID: 3456 at net/netfilter/nf_tables_api.c:3740 nft_set_destroy+0x45/0x50 [nf_tables] [...] CPU: 3 PID: 3456 Comm: nft Not tainted 5.4.6-arch3-1 #1 RIP: 0010:nft_set_destroy+0x45/0x50 [nf_tables] Code: e8 30 eb 83 c6 48 8b 85 80 00 00 00 48 8b b8 90 00 00 00 e8 dd 6b d7 c5 48 8b 7d 30 e8 24 dd eb c5 48 89 ef 5d e9 6b c6 e5 c5 <0f> 0b c3 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 8b 7f 10 e9 52 RSP: 0018:ffffac4f43e53700 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff99d63a154d80 RCX: 0000000001f88e03 RDX: 0000000001f88c03 RSI: ffff99d6560ef0c0 RDI: ffff99d63a101200 RBP: ffff99d617721de0 R08: 0000000000000000 R09: 0000000000000318 R10: 00000000f0000000 R11: 0000000000000001 R12: ffffffff880fabf0 R13: dead000000000122 R14: dead000000000100 R15: ffff99d63a154d80 FS: 00007ff3dbd5b740(0000) GS:ffff99d6560c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00001cb5de6a9000 CR3: 000000016eb6a004 CR4: 00000000001606e0 Call Trace: __nf_tables_abort+0x3e3/0x6d0 [nf_tables] nft_request_module+0x6f/0x110 [nf_tables] nft_expr_type_request_module+0x28/0x50 [nf_tables] nf_tables_expr_parse+0x198/0x1f0 [nf_tables] nft_expr_init+0x3b/0xf0 [nf_tables] nft_dynset_init+0x1e2/0x410 [nf_tables] nf_tables_newrule+0x30a/0x930 [nf_tables] nfnetlink_rcv_batch+0x2a0/0x640 [nfnetlink] nfnetlink_rcv+0x125/0x171 [nfnetlink] netlink_unicast+0x179/0x210 netlink_sendmsg+0x208/0x3d0 sock_sendmsg+0x5e/0x60 ____sys_sendmsg+0x21b/0x290 Update comment on the code to describe the new behaviour. Reported-by: Marco Oliverio <[email protected]> Fixes: 452238e8d5ff ("netfilter: nf_tables: add and use helper for module autoload") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-16net: dsa: tag_qca: fix doubled Tx statisticsAlexander Lobakin1-3/+0
DSA subsystem takes care of netdev statistics since commit 4ed70ce9f01c ("net: dsa: Refactor transmit path to eliminate duplication"), so any accounting inside tagger callbacks is redundant and can lead to messing up the stats. This bug is present in Qualcomm tagger since day 0. Fixes: cafdc45c949b ("net-next: dsa: add Qualcomm tag RX/TX handler") Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-16net: dsa: tag_gswip: fix typo in tagger nameAlexander Lobakin1-1/+1
The correct name is GSWIP (Gigabit Switch IP). Typo was introduced in 875138f81d71a ("dsa: Move tagger name into its ops structure") while moving tagger names to their structures. Fixes: 875138f81d71a ("dsa: Move tagger name into its ops structure") Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Acked-by: Hauke Mehrtens <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller7-30/+63
Daniel Borkmann says: ==================== pull-request: bpf 2020-01-15 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 9 day(s) which contain a total of 13 files changed, 95 insertions(+), 43 deletions(-). The main changes are: 1) Fix refcount leak for TCP time wait and request sockets for socket lookup related BPF helpers, from Lorenz Bauer. 2) Fix wrong verification of ARSH instruction under ALU32, from Daniel Borkmann. 3) Batch of several sockmap and related TLS fixes found while operating more complex BPF programs with Cilium and OpenSSL, from John Fastabend. 4) Fix sockmap to read psock's ingress_msg queue before regular sk_receive_queue() to avoid purging data upon teardown, from Lingpeng Chen. 5) Fix printing incorrect pointer in bpftool's btf_dump_ptr() in order to properly dump a BPF map's value with BTF, from Martin KaFai Lau. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-16Bluetooth: Increment management interface revisionMarcel Holtmann1-1/+1
Increment the mgmt revision due to the recently added commands. Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Johan Hedberg <[email protected]>
2020-01-15bpf: Sockmap/tls, fix pop data with SK_DROP return codeJohn Fastabend2-8/+2
When user returns SK_DROP we need to reset the number of copied bytes to indicate to the user the bytes were dropped and not sent. If we don't reset the copied arg sendmsg will return as if those bytes were copied giving the user a positive return value. This works as expected today except in the case where the user also pops bytes. In the pop case the sg.size is reduced but we don't correctly account for this when copied bytes is reset. The popped bytes are not accounted for and we return a small positive value potentially confusing the user. The reason this happens is due to a typo where we do the wrong comparison when accounting for pop bytes. In this fix notice the if/else is not needed and that we have a similar problem if we push data except its not visible to the user because if delta is larger the sg.size we return a negative value so it appears as an error regardless. Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Jonathan Lemon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-01-15bpf: Sockmap/tls, skmsg can have wrapped skmsg that needs extra chainingJohn Fastabend1-0/+6
Its possible through a set of push, pop, apply helper calls to construct a skmsg, which is just a ring of scatterlist elements, with the start value larger than the end value. For example, end start |_0_|_1_| ... |_n_|_n+1_| Where end points at 1 and start points and n so that valid elements is the set {n, n+1, 0, 1}. Currently, because we don't build the correct chain only {n, n+1} will be sent. This adds a check and sg_chain call to correctly submit the above to the crypto and tls send path. Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Jonathan Lemon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-01-15bpf: Sockmap/tls, tls_sw can create a plaintext buf > encrypt bufJohn Fastabend1-0/+20
It is possible to build a plaintext buffer using push helper that is larger than the allocated encrypt buffer. When this record is pushed to crypto layers this can result in a NULL pointer dereference because the crypto API expects the encrypt buffer is large enough to fit the plaintext buffer. Kernel splat below. To resolve catch the cases this can happen and split the buffer into two records to send individually. Unfortunately, there is still one case to handle where the split creates a zero sized buffer. In this case we merge the buffers and unmark the split. This happens when apply is zero and user pushed data beyond encrypt buffer. This fixes the original case as well because the split allocated an encrypt buffer larger than the plaintext buffer and the merge simply moves the pointers around so we now have a reference to the new (larger) encrypt buffer. Perhaps its not ideal but it seems the best solution for a fixes branch and avoids handling these two cases, (a) apply that needs split and (b) non apply case. The are edge cases anyways so optimizing them seems not necessary unless someone wants later in next branches. [ 306.719107] BUG: kernel NULL pointer dereference, address: 0000000000000008 [...] [ 306.747260] RIP: 0010:scatterwalk_copychunks+0x12f/0x1b0 [...] [ 306.770350] Call Trace: [ 306.770956] scatterwalk_map_and_copy+0x6c/0x80 [ 306.772026] gcm_enc_copy_hash+0x4b/0x50 [ 306.772925] gcm_hash_crypt_remain_continue+0xef/0x110 [ 306.774138] gcm_hash_crypt_continue+0xa1/0xb0 [ 306.775103] ? gcm_hash_crypt_continue+0xa1/0xb0 [ 306.776103] gcm_hash_assoc_remain_continue+0x94/0xa0 [ 306.777170] gcm_hash_assoc_continue+0x9d/0xb0 [ 306.778239] gcm_hash_init_continue+0x8f/0xa0 [ 306.779121] gcm_hash+0x73/0x80 [ 306.779762] gcm_encrypt_continue+0x6d/0x80 [ 306.780582] crypto_gcm_encrypt+0xcb/0xe0 [ 306.781474] crypto_aead_encrypt+0x1f/0x30 [ 306.782353] tls_push_record+0x3b9/0xb20 [tls] [ 306.783314] ? sk_psock_msg_verdict+0x199/0x300 [ 306.784287] bpf_exec_tx_verdict+0x3f2/0x680 [tls] [ 306.785357] tls_sw_sendmsg+0x4a3/0x6a0 [tls] test_sockmap test signature to trigger bug, [TEST]: (1, 1, 1, sendmsg, pass,redir,start 1,end 2,pop (1,2),ktls,): Fixes: d3b18ad31f93d ("tls: add bpf support to sk_msg handling") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Jonathan Lemon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-01-15bpf: Sockmap/tls, msg_push_data may leave end mark in placeJohn Fastabend1-0/+1
Leaving an incorrect end mark in place when passing to crypto layer will cause crypto layer to stop processing data before all data is encrypted. To fix clear the end mark on push data instead of expecting users of the helper to clear the mark value after the fact. This happens when we push data into the middle of a skmsg and have room for it so we don't do a set of copies that already clear the end flag. Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-01-15bpf: Sockmap, skmsg helper overestimates push, pull, and pop boundsJohn Fastabend1-5/+5
In the push, pull, and pop helpers operating on skmsg objects to make data writable or insert/remove data we use this bounds check to ensure specified data is valid, /* Bounds checks: start and pop must be inside message */ if (start >= offset + l || last >= msg->sg.size) return -EINVAL; The problem here is offset has already included the length of the current element the 'l' above. So start could be past the end of the scatterlist element in the case where start also points into an offset on the last skmsg element. To fix do the accounting slightly different by adding the length of the previous entry to offset at the start of the iteration. And ensure its initialized to zero so that the first iteration does nothing. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Fixes: 6fff607e2f14b ("bpf: sk_msg program helper bpf_msg_push_data") Fixes: 7246d8ed4dcce ("bpf: helper to pop data from messages") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-01-15bpf: Sockmap/tls, push write_space updates through ulp updatesJohn Fastabend2-5/+11
When sockmap sock with TLS enabled is removed we cleanup bpf/psock state and call tcp_update_ulp() to push updates to TLS ULP on top. However, we don't push the write_space callback up and instead simply overwrite the op with the psock stored previous op. This may or may not be correct so to ensure we don't overwrite the TLS write space hook pass this field to the ULP and have it fixup the ctx. This completes a previous fix that pushed the ops through to the ULP but at the time missed doing this for write_space, presumably because write_space TLS hook was added around the same time. Fixes: 95fa145479fbc ("bpf: sockmap/tls, close can race with map free") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Acked-by: Jonathan Lemon <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-01-15bpf: Sockmap, ensure sock lock held during tear downJohn Fastabend2-1/+8
The sock_map_free() and sock_hash_free() paths used to delete sockmap and sockhash maps walk the maps and destroy psock and bpf state associated with the socks in the map. When done the socks no longer have BPF programs attached and will function normally. This can happen while the socks in the map are still "live" meaning data may be sent/received during the walk. Currently, though we don't take the sock_lock when the psock and bpf state is removed through this path. Specifically, this means we can be writing into the ops structure pointers such as sendmsg, sendpage, recvmsg, etc. while they are also being called from the networking side. This is not safe, we never used proper READ_ONCE/WRITE_ONCE semantics here if we believed it was safe. Further its not clear to me its even a good idea to try and do this on "live" sockets while networking side might also be using the socket. Instead of trying to reason about using the socks from both sides lets realize that every use case I'm aware of rarely deletes maps, in fact kubernetes/Cilium case builds map at init and never tears it down except on errors. So lets do the simple fix and grab sock lock. This patch wraps sock deletes from maps in sock lock and adds some annotations so we catch any other cases easier. Fixes: 604326b41a6fb ("bpf, sockmap: convert to generic sk_msg interface") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Song Liu <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/bpf/[email protected]
2020-01-15Merge tag 'batadv-next-for-davem-20200114' of ↵David S. Miller61-77/+85
git://git.open-mesh.org/linux-merge Simon Wunderlich says: ==================== This feature/cleanup patchset includes the following patches: - bump version strings, by Simon Wunderlich - fix typo and kerneldocs, by Sven Eckelmann - use WiFi txbitrate for B.A.T.M.A.N. V as fallback, by René Treffer - silence some endian sparse warnings by adding annotations, by Sven Eckelmann - Update copyright years to 2020, by Sven Eckelmann - Disable deprecated sysfs configuration by default, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-15Merge tag 'batadv-net-for-davem-20200114' of git://git.open-mesh.org/linux-mergeDavid S. Miller1-1/+3
Simon Wunderlich says: ==================== Here is a batman-adv bugfix: - Fix DAT candidate selection on little endian systems, by Sven Eckelmann ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-15tcp: fix marked lost packets not being retransmittedPengcheng Yang1-3/+4
When the packet pointed to by retransmit_skb_hint is unlinked by ACK, retransmit_skb_hint will be set to NULL in tcp_clean_rtx_queue(). If packet loss is detected at this time, retransmit_skb_hint will be set to point to the current packet loss in tcp_verify_retransmit_hint(), then the packets that were previously marked lost but not retransmitted due to the restriction of cwnd will be skipped and cannot be retransmitted. To fix this, when retransmit_skb_hint is NULL, retransmit_skb_hint can be reset only after all marked lost packets are retransmitted (retrans_out >= lost_out), otherwise we need to traverse from tcp_rtx_queue_head in tcp_xmit_retransmit_queue(). Packetdrill to demonstrate: // Disable RACK and set max_reordering to keep things simple 0 `sysctl -q net.ipv4.tcp_recovery=0` +0 `sysctl -q net.ipv4.tcp_max_reordering=3` // Establish a connection +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 +0 bind(3, ..., ...) = 0 +0 listen(3, 1) = 0 +.1 < S 0:0(0) win 32792 <mss 1000,sackOK,nop,nop,nop,wscale 7> +0 > S. 0:0(0) ack 1 <...> +.01 < . 1:1(0) ack 1 win 257 +0 accept(3, ..., ...) = 4 // Send 8 data segments +0 write(4, ..., 8000) = 8000 +0 > P. 1:8001(8000) ack 1 // Enter recovery and 1:3001 is marked lost +.01 < . 1:1(0) ack 1 win 257 <sack 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:6001 3001:4001,nop,nop> +0 < . 1:1(0) ack 1 win 257 <sack 5001:7001 3001:4001,nop,nop> // Retransmit 1:1001, now retransmit_skb_hint points to 1001:2001 +0 > . 1:1001(1000) ack 1 // 1001:2001 was ACKed causing retransmit_skb_hint to be set to NULL +.01 < . 1:1(0) ack 2001 win 257 <sack 5001:8001 3001:4001,nop,nop> // Now retransmit_skb_hint points to 4001:5001 which is now marked lost // BUG: 2001:3001 was not retransmitted +0 > . 2001:3001(1000) ack 1 Signed-off-by: Pengcheng Yang <[email protected]> Acked-by: Neal Cardwell <[email protected]> Tested-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-15Bluetooth: Make use of __check_timeout on hci_sched_leLuiz Augusto von Dentz1-8/+3
This reuse __check_timeout on hci_sched_le following the same logic used hci_sched_acl. Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-01-15Bluetooth: monitor: Add support for ISO packetsLuiz Augusto von Dentz1-0/+6
This enables passing ISO packets to the monitor socket. Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-01-15Bluetooth: Implementation of MGMT_OP_SET_BLOCKED_KEYS.Alain Michaud4-8/+188
MGMT command is added to receive the list of blocked keys from user-space. The list is used to: 1) Block keys from being distributed by the device during the ke distribution phase of SMP. 2) Filter out any keys that were previously saved so they are no longer used. Signed-off-by: Alain Michaud <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>