path: root/net
Age | Commit message | Author | Files | Lines
2020-01-08 | Bluetooth: remove redundant assignment to variable icid | Colin Ian King | 1 | -1/+0
Variable icid is being assigned a value that is never read. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-01-08 | pkt_sched: fq: do not accept silly TCA_FQ_QUANTUM | Eric Dumazet | 1 | -2/+4
As diagnosed by Florian: if TCA_FQ_QUANTUM is set to 0x80000000, fq_dequeue() can loop forever in: if (f->credit <= 0) { f->credit += q->quantum; goto begin; } ... because f->credit is either 0 or -2147483648. Let's limit TCA_FQ_QUANTUM to no more than 1 << 20: this max value should limit risks of breaking user setups while fixing this bug. Fixes: afe4fd062416 ("pkt_sched: fq: Fair Queue packet scheduler") Signed-off-by: Eric Dumazet <[email protected]> Diagnosed-by: Florian Westphal <[email protected]> Reported-by: [email protected] Signed-off-by: David S. Miller <[email protected]>
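The endless loop described above can be reproduced in userspace with a few lines of C. This is only a sketch: the helper names (credit_add, credit_recovers, quantum_is_sane) are hypothetical, the lower bound `quantum > 0` is an assumption that the commit message does not state, and the wraparound is done in unsigned arithmetic to avoid undefined signed overflow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Add quantum to a signed 32-bit credit with explicit wraparound.
 * The unsigned-to-signed conversion is implementation-defined in C;
 * common compilers wrap, matching the behaviour the commit describes. */
static int32_t credit_add(int32_t credit, uint32_t quantum)
{
	return (int32_t)((uint32_t)credit + quantum);
}

/* Model of the refill loop: returns true if the credit becomes positive
 * within max_rounds refills, false if it is still <= 0 afterwards. */
static bool credit_recovers(int32_t credit, uint32_t quantum, int max_rounds)
{
	for (int i = 0; i < max_rounds && credit <= 0; i++)
		credit = credit_add(credit, quantum);
	return credit > 0;
}

/* Hypothetical validation helper: the upper bound 1 << 20 comes from the
 * commit message, the "greater than zero" part is an assumption. */
static bool quantum_is_sane(uint32_t quantum)
{
	return quantum > 0 && quantum <= (1u << 20);
}
```

With a quantum of 0x80000000 the credit alternates between -2147483648 and 0, so credit_recovers() never reports success; any quantum accepted by quantum_is_sane() makes the credit positive after finitely many refills.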
2020-01-08 | tipc: remove meaningless assignment in Makefile | Masahiro Yamada | 1 | -2/+0
There is no module named tipc_diag. The assignment to tipc_diag-y has no effect. Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-08 | tipc: do not add socket.o to tipc-y twice | Masahiro Yamada | 1 | -1/+1
net/tipc/Makefile adds socket.o twice.

tipc-y += addr.o bcast.o bearer.o \
          core.o link.o discover.o msg.o \
          name_distr.o subscr.o monitor.o name_table.o net.o \
          netlink.o netlink_compat.o node.o socket.o eth_media.o \
                                            ^^^^^^^^
          topsrv.o socket.o group.o trace.o
                   ^^^^^^^^

Signed-off-by: Masahiro Yamada <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2020-01-07 | net/rose: remove redundant assignment to variable failed | Colin Ian King | 1 | -1/+0
The variable failed is being assigned a value that is never read: the following goto statement jumps to the end of the function and the variable failed is not referenced at all. Remove the redundant assignment. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-07 | vlan: vlan_changelink() should propagate errors | Eric Dumazet | 1 | -3/+7
Both vlan_dev_change_flags() and vlan_dev_set_egress_priority() can return an error. vlan_changelink() should not ignore them. Fixes: 07b5b17e157b ("[VLAN]: Use rtnl_link API") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-07 | vlan: fix memory leak in vlan_dev_set_egress_priority | Eric Dumazet | 3 | -5/+8
There are a few cases where the ndo_uninit() handler might not be called if an error happens while the device is initialized. Since vlan_newlink() calls vlan_changelink() before trying to register the netdevice, we need to make sure vlan_dev_uninit() has been called at least once, or we might leak allocated memory.

BUG: memory leak
unreferenced object 0xffff888122a206c0 (size 32):
  comm "syz-executor511", pid 7124, jiffies 4294950399 (age 32.240s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 61 73 00 00 00 00 00 00 00 00  ......as........
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<000000000eb3bb85>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline]
    [<000000000eb3bb85>] slab_post_alloc_hook mm/slab.h:586 [inline]
    [<000000000eb3bb85>] slab_alloc mm/slab.c:3320 [inline]
    [<000000000eb3bb85>] kmem_cache_alloc_trace+0x145/0x2c0 mm/slab.c:3549
    [<000000007b99f620>] kmalloc include/linux/slab.h:556 [inline]
    [<000000007b99f620>] vlan_dev_set_egress_priority+0xcc/0x150 net/8021q/vlan_dev.c:194
    [<000000007b0cb745>] vlan_changelink+0xd6/0x140 net/8021q/vlan_netlink.c:126
    [<0000000065aba83a>] vlan_newlink+0x135/0x200 net/8021q/vlan_netlink.c:181
    [<00000000fb5dd7a2>] __rtnl_newlink+0x89a/0xb80 net/core/rtnetlink.c:3305
    [<00000000ae4273a1>] rtnl_newlink+0x4e/0x80 net/core/rtnetlink.c:3363
    [<00000000decab39f>] rtnetlink_rcv_msg+0x178/0x4b0 net/core/rtnetlink.c:5424
    [<00000000accba4ee>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477
    [<00000000319fe20f>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442
    [<00000000d51938dc>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
    [<00000000d51938dc>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328
    [<00000000e539ac79>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917
    [<000000006250c27e>] sock_sendmsg_nosec net/socket.c:639 [inline]
    [<000000006250c27e>] sock_sendmsg+0x54/0x70 net/socket.c:659
    [<00000000e2a156d1>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330
    [<000000008c87466e>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384
    [<00000000110e3054>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417
    [<00000000d71077c8>] __do_sys_sendmsg net/socket.c:2426 [inline]
    [<00000000d71077c8>] __se_sys_sendmsg net/socket.c:2424 [inline]
    [<00000000d71077c8>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424

Fixes: 07b5b17e157b ("[VLAN]: Use rtnl_link API")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2020-01-06 | sctp: free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY | Xin Long | 1 | -10/+18
This patch is to fix a memleak caused by no place to free cmd->obj.chunk for the unprocessed SCTP_CMD_REPLY. This issue occurs when failing to process a cmd while there're still SCTP_CMD_REPLY cmds on the cmd seq with an allocated chunk in cmd->obj.chunk. So fix it by freeing cmd->obj.chunk for each SCTP_CMD_REPLY cmd left on the cmd seq when any cmd returns error. While at it, also remove 'nomem' label. Reported-by: [email protected] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-06 | tipc: eliminate KMSAN: uninit-value in __tipc_nl_compat_dumpit error | Ying Xue | 1 | -2/+2
syzbot found the following crash on:

=====================================================
BUG: KMSAN: uninit-value in __nlmsg_parse include/net/netlink.h:661 [inline]
BUG: KMSAN: uninit-value in nlmsg_parse_deprecated include/net/netlink.h:706 [inline]
BUG: KMSAN: uninit-value in __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215
CPU: 0 PID: 12425 Comm: syz-executor062 Not tainted 5.5.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:77 [inline]
 dump_stack+0x1c9/0x220 lib/dump_stack.c:118
 kmsan_report+0x128/0x220 mm/kmsan/kmsan_report.c:108
 __msan_warning+0x57/0xa0 mm/kmsan/kmsan_instr.c:245
 __nlmsg_parse include/net/netlink.h:661 [inline]
 nlmsg_parse_deprecated include/net/netlink.h:706 [inline]
 __tipc_nl_compat_dumpit+0x553/0x11e0 net/tipc/netlink_compat.c:215
 tipc_nl_compat_dumpit+0x761/0x910 net/tipc/netlink_compat.c:308
 tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
 tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
 genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
 genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
 genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
 netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
 netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:639 [inline]
 sock_sendmsg net/socket.c:659 [inline]
 ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
 ___sys_sendmsg net/socket.c:2384 [inline]
 __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
 __do_sys_sendmsg net/socket.c:2426 [inline]
 __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
RIP: 0033:0x444179
Code: 18 89 d0 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 1b d8 fb ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffd2d6409c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000444179
RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003
RBP: 00000000006ce018 R08: 0000000000000000 R09: 00000000004002e0
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000401e20
R13: 0000000000401eb0 R14: 0000000000000000 R15: 0000000000000000

Uninit was created at:
 kmsan_save_stack_with_flags mm/kmsan/kmsan.c:149 [inline]
 kmsan_internal_poison_shadow+0x5c/0x110 mm/kmsan/kmsan.c:132
 kmsan_slab_alloc+0x8a/0xe0 mm/kmsan/kmsan_hooks.c:86
 slab_alloc_node mm/slub.c:2774 [inline]
 __kmalloc_node_track_caller+0xe47/0x11f0 mm/slub.c:4382
 __kmalloc_reserve net/core/skbuff.c:141 [inline]
 __alloc_skb+0x309/0xa50 net/core/skbuff.c:209
 alloc_skb include/linux/skbuff.h:1049 [inline]
 nlmsg_new include/net/netlink.h:888 [inline]
 tipc_nl_compat_dumpit+0x6e4/0x910 net/tipc/netlink_compat.c:301
 tipc_nl_compat_handle net/tipc/netlink_compat.c:1252 [inline]
 tipc_nl_compat_recv+0x12e9/0x2870 net/tipc/netlink_compat.c:1311
 genl_family_rcv_msg_doit net/netlink/genetlink.c:672 [inline]
 genl_family_rcv_msg net/netlink/genetlink.c:717 [inline]
 genl_rcv_msg+0x1dd0/0x23a0 net/netlink/genetlink.c:734
 netlink_rcv_skb+0x431/0x620 net/netlink/af_netlink.c:2477
 genl_rcv+0x63/0x80 net/netlink/genetlink.c:745
 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline]
 netlink_unicast+0xfa0/0x1100 net/netlink/af_netlink.c:1328
 netlink_sendmsg+0x11f0/0x1480 net/netlink/af_netlink.c:1917
 sock_sendmsg_nosec net/socket.c:639 [inline]
 sock_sendmsg net/socket.c:659 [inline]
 ____sys_sendmsg+0x1362/0x13f0 net/socket.c:2330
 ___sys_sendmsg net/socket.c:2384 [inline]
 __sys_sendmsg+0x4f0/0x5e0 net/socket.c:2417
 __do_sys_sendmsg net/socket.c:2426 [inline]
 __se_sys_sendmsg+0x97/0xb0 net/socket.c:2424
 __x64_sys_sendmsg+0x4a/0x70 net/socket.c:2424
 do_syscall_64+0xb6/0x160 arch/x86/entry/common.c:295
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
=====================================================

The complaint above occurred because the memory region pointed to by the attrbuf variable was not initialized. To eliminate this warning, we use kcalloc() rather than kmalloc_array() to allocate memory for attrbuf.

Reported-by: [email protected]
Signed-off-by: Ying Xue <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
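The difference between the two allocators can be illustrated with their closest userspace relatives. This is an analogy under stated assumptions, not the kernel code: calloc() stands in for kcalloc(), and the helper names alloc_attr_table and count_set_attrs are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for the attrbuf allocation: calloc()
 * returns zero-filled memory, like kcalloc() in the kernel, whereas
 * malloc() (like kmalloc_array()) leaves the contents indeterminate. */
static void **alloc_attr_table(size_t nattrs)
{
	return calloc(nattrs, sizeof(void *));
}

/* Count the slots that hold a non-NULL pointer. On a freshly zeroed
 * table this is 0 on platforms where NULL is all-bits-zero, which
 * covers the common ones. */
static size_t count_set_attrs(void *const *tbl, size_t nattrs)
{
	size_t n = 0;

	for (size_t i = 0; i < nattrs; i++)
		if (tbl[i] != NULL)
			n++;
	return n;
}
```

A parser that only fills the slots it actually sees can then rely on every other slot being NULL, which is the property the uninitialized buffer lacked.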
2020-01-06 | netfilter: flowtable: add nf_flowtable_time_stamp | Pablo Neira Ayuso | 3 | -10/+5
This patch adds nf_flowtable_time_stamp and updates the existing code to use it. This patch is also implicitly fixing up hardware statistic fetching via nf_flow_offload_stats() where casting to u32 is missing. Use nf_flow_timeout_delta() to fix this. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Pablo Neira Ayuso <[email protected]> Acked-by: wenxu <[email protected]>
2020-01-05 | net: dsa: Pass pcs_poll flag from driver to PHYLINK | Vladimir Oltean | 1 | -0/+1
The DSA drivers that implement .phylink_mac_link_state should normally register an interrupt for the PCS, from which they should call phylink_mac_change(). However not all switches implement this, and those who don't should set this flag in dsa_switch in the .setup callback, so that PHYLINK will poll for a few ms until the in-band AN link timer expires and the PCS state settles. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-05 | net: dsa: tag_sja1105: Slightly improve the Xmas tree in sja1105_xmit | Vladimir Oltean | 1 | -2/+1
This is a cosmetic patch that makes the dp, tx_vid, queue_mapping and pcp local variable definitions a bit closer in length, so they don't look like an eyesore as much. The 'ds' variable is not used otherwise, except for ds->dp. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-05 | net: dsa: Make deferred_xmit private to sja1105 | Vladimir Oltean | 3 | -39/+15
There are 3 things that are wrong with the DSA deferred xmit mechanism:

1. Its introduction has made the DSA hotpath ever so slightly more inefficient for everybody, since DSA_SKB_CB(skb)->deferred_xmit needs to be initialized to false for every transmitted frame, in order to figure out whether the driver requested deferral or not (a very rare occasion, rare even for the only driver that does use this mechanism: sja1105). That was necessary to avoid kfree_skb from freeing the skb.

2. Because L2 PTP is a link-local protocol like STP, it requires management routes and deferred xmit with this switch. But as opposed to STP, the deferred work mechanism needs to schedule the packet rather quickly for the TX timestamp to be collected in time and sent to user space. But there is no provision for controlling the scheduling priority of this deferred xmit workqueue. Too bad this is a rather specific requirement for a feature that nobody else uses (more below).

3. Perhaps most importantly, it makes the DSA core adhere a bit too much to the NXP company-wide policy "Innovate Where It Doesn't Matter". The sja1105 is probably the only DSA switch that requires some frames sent from the CPU to be routed to the slave port via an out-of-band configuration (register write) rather than in-band (DSA tag). And there are indeed very good reasons to not want to do that: if that out-of-band register is at the other end of a slow bus such as SPI, then you limit that Ethernet flow's throughput to effectively the throughput of the SPI bus. So hardware vendors should definitely not be encouraged to design this way. We do _not_ want more widespread use of this mechanism.

Luckily we have a solution for each of the 3 issues:

For 1, we can just remove that variable in the skb->cb and counteract the effect of kfree_skb with skb_get, much to the same effect. The advantage, of course, being that anybody who doesn't use deferred xmit doesn't need to do any extra operation in the hotpath.

For 2, we can create a kernel thread for each port's deferred xmit work. If the user switch ports are named swp0, swp1, swp2, the kernel threads will be named swp0_xmit, swp1_xmit, swp2_xmit (there appears to be a 15 character length limit on kernel thread names). With this, the user can change the scheduling priority with chrt $(pidof swp2_xmit).

For 3, we can actually move the entire implementation to the sja1105 driver.

So this patch deletes the generic implementation from the DSA core and adds a new one, more adequate to the requirements of PTP TX timestamping, in sja1105_main.c.

Suggested-by: Florian Fainelli <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2020-01-05 | net: qrtr: fix len of skb_put_padto in qrtr_node_enqueue | Carl Huang | 1 | -1/+1
The length passed to skb_put_padto() is wrong: it needs to include the length of hdr.

In qrtr_node_enqueue(), the local variable size_t len is assigned skb->len, and skb_push(skb, sizeof(*hdr)) then increases skb->len by sizeof(*hdr), so len no longer equals skb->len after the push. The purpose of skb_put_padto(skb, ALIGN(len, 4)) is to pad the end of the skb's data if skb->len is not aligned to 4, but it uses len instead of skb->len. At this point skb->len is 32 bytes (sizeof(*hdr)) larger than len. For example, if len is 3 bytes, skb->len is 35 bytes (3 + 32) and ALIGN(len, 4) is 4 bytes, so __skb_put_padto() does nothing because the check size (35) < len (4) fails. The correct value is 36 (sizeof(*hdr) + ALIGN(len, 4) = 32 + 4); __skb_put_padto() then passes the check size (35) < len (36) and adds 1 byte to the end of the skb's data, which is the intended behaviour.

skb_push():

void *skb_push(struct sk_buff *skb, unsigned int len)
{
	skb->data -= len;
	skb->len += len;
	if (unlikely(skb->data < skb->head))
		skb_under_panic(skb, len, __builtin_return_address(0));
	return skb->data;
}

skb_put_padto():

static inline int skb_put_padto(struct sk_buff *skb, unsigned int len)
{
	return __skb_put_padto(skb, len, true);
}

__skb_put_padto():

static inline int __skb_put_padto(struct sk_buff *skb, unsigned int len,
				  bool free_on_error)
{
	unsigned int size = skb->len;

	if (unlikely(size < len)) {
		len -= size;
		if (__skb_pad(skb, len, free_on_error))
			return -ENOMEM;
		__skb_put(skb, len);
	}
	return 0;
}

Signed-off-by: Carl Huang <[email protected]>
Signed-off-by: Wen Gong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
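The arithmetic in the message above can be checked with a small userspace model. This is a sketch only: HDR_SIZE is fixed at 32 as in the commit's example, and the helpers pad_bytes, pad_before_fix and pad_after_fix are hypothetical names that mirror just the size comparison in __skb_put_padto(), not the real skb handling.

```c
#include <stddef.h>

#define HDR_SIZE 32u                             /* sizeof(*hdr) in the commit's example */
#define ALIGN4(x) (((x) + 3u) & ~(size_t)3u)     /* round up to a multiple of 4 */

/* Padding that __skb_put_padto() would append: the gap between the
 * current length and the target, or nothing if the skb is long enough. */
static size_t pad_bytes(size_t cur_len, size_t target)
{
	return cur_len < target ? target - cur_len : 0;
}

/* Before the fix: the target ignores the header that skb_push() added. */
static size_t pad_before_fix(size_t payload_len)
{
	size_t skb_len = payload_len + HDR_SIZE;     /* skb->len after skb_push() */

	return pad_bytes(skb_len, ALIGN4(payload_len));
}

/* After the fix: the target includes the header length. */
static size_t pad_after_fix(size_t payload_len)
{
	size_t skb_len = payload_len + HDR_SIZE;

	return pad_bytes(skb_len, ALIGN4(payload_len) + HDR_SIZE);
}
```

For the 3-byte payload used in the message, pad_before_fix(3) yields 0 bytes of padding while pad_after_fix(3) yields the expected 1 byte.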
2020-01-05 | netfilter: nf_tables: unbind callbacks from flowtable destroy path | Pablo Neira Ayuso | 1 | -2/+6
Callback unbinding needs to be done after nf_flow_table_free(), otherwise entries are not removed from the hardware. Update nft_unregister_flowtable_net_hooks() to call nf_unregister_net_hook() instead since the commit/abort paths do not deal with the callback unbinding anymore. Add a comment to nft_flowtable_event() to clarify that flow_offload_netdev_event() already removes the entries before the callback unbinding. Fixes: 8bb69f3b2918 ("netfilter: nf_tables: add flowtable offload control plane") Fixes: ff4bf2f42a40 ("netfilter: nf_tables: add nft_unregister_flowtable_hook()") Signed-off-by: Pablo Neira Ayuso <[email protected]> Acked-by: wenxu <[email protected]>
2020-01-05 | netfilter: nf_flow_table_offload: fix the nat port mangle. | wenxu | 1 | -8/+16
The shift applied to the 32-bit word to set the port number depends on the flow direction. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Fixes: 7acd9378dc652 ("netfilter: nf_flow_table_offload: Correct memcpy size for flow_overload_mangle()") Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-05 | netfilter: nf_flow_table_offload: check the status of dst_neigh | wenxu | 1 | -2/+14
It is better to get the dst_neigh with neigh->lock held and check that the nud_state is VALID. If no neighbour entry existed previously, the lookup creates one that is not NUD_VALID and has a 00:00:00:00:00:00 MAC address. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-05 | netfilter: nf_flow_table_offload: fix incorrect ethernet dst address | wenxu | 1 | -2/+4
Ethernet destination for original traffic takes the source ethernet address in the reply direction. For reply traffic, this takes the source ethernet address of the original direction. Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-05 | netfilter: nft_flow_offload: fix underflow in flowtable reference counter | wenxu | 1 | -3/+0
The .deactivate and .activate interfaces already deal with the reference counter, so it must not be adjusted a second time; doing so results in spurious "Device is busy" errors. Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression") Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-01-04 | Bluetooth: Auto tune if input MTU is set to 0 | Luiz Augusto von Dentz | 1 | -2/+52
This enables the code to set the input MTU using the underlying link packet types when it is set to 0. Previously this would likely be rejected by the remote peer since it would be below the minimum of 48 for BR/EDR or 23 for LE, so it is safe to use 0 without causing any side effects. This is convenient for the likes of A2DP transport, see: https://habr.com/en/post/456182/ Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-01-04 | Bluetooth: Add support for LE PHY Update Complete event | Luiz Augusto von Dentz | 1 | -0/+27
This handles the LE PHY Update Complete event and stores both tx_phy and rx_phy in hci_conn. Signed-off-by: Luiz Augusto von Dentz <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-01-04 | Bluetooth: Remove usage of BT_ERR_RATELIMITED macro | Marcel Holtmann | 1 | -8/+6
The macro is really not needed and can be replaced with either usage of bt_err_ratelimited or bt_dev_err_ratelimited. Signed-off-by: Marcel Holtmann <[email protected]> Signed-off-by: Johan Hedberg <[email protected]>
2020-01-04 | Bluetooth: Adding a bt_dev_warn_ratelimited macro. | Alain Michaud | 1 | -0/+16
The macro will be used to display rate limited warning messages in the log. Signed-off-by: Alain Michaud <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
2020-01-03 | l2tp: Remove redundant BUG_ON() check in l2tp_pernet | Xu Wang | 1 | -2/+0
Passing NULL to l2tp_pernet causes a crash via BUG_ON. Dereferencing net in net_generic() also has the same effect. This patch removes the redundant BUG_ON check on the same parameter. Signed-off-by: Xu Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-03 | net: Remove redundant BUG_ON() check in phonet_pernet | Xu Wang | 1 | -2/+0
Passing NULL to phonet_pernet causes a crash via BUG_ON. Dereferencing net in net_generic() also has the same effect. This patch removes the redundant BUG_ON check on the same parameter. Signed-off-by: Xu Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-03 | net: remove the check argument from __skb_gro_checksum_convert | Li RongQing | 3 | -3/+3
The argument is always ignored, so remove it. Signed-off-by: Li RongQing <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-03 | ethtool: remove set but not used variable 'lsettings' | YueHaibing | 1 | -2/+0
Fixes gcc '-Wunused-but-set-variable' warning: net/ethtool/linkmodes.c: In function 'ethnl_set_linkmodes': net/ethtool/linkmodes.c:326:32: warning: variable 'lsettings' set but not used [-Wunused-but-set-variable] struct ethtool_link_settings *lsettings; ^ It is never used, so remove it. Reported-by: Hulk Robot <[email protected]> Signed-off-by: YueHaibing <[email protected]> Reviewed-by: Michal Kubecek <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-02 | tcp: use REXMIT_NEW instead of magic number | Mao Wenan | 1 | -1/+1
REXMIT_NEW is a macro for "FRTO-style transmit of unsent/new packets", this patch makes it more readable. Signed-off-by: Mao Wenan <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-02 | sch_cake: avoid possible divide by zero in cake_enqueue() | Wen Yang | 1 | -1/+1
The variable 'window_interval' is u64 and do_div() truncates it to 32 bits, which means it can test non-zero and still be truncated to zero for the division. The unit of window_interval is nanoseconds, so values that do not fit in 32 bits are easily reached. Fix this issue by using div64_u64() instead. Fixes: 7298de9cd725 ("sch_cake: Add ingress mode") Signed-off-by: Wen Yang <[email protected]> Cc: Kevin Darbyshire-Bryant <[email protected]> Cc: Toke Høiland-Jørgensen <[email protected]> Cc: David S. Miller <[email protected]> Cc: Cong Wang <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Acked-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
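The truncation hazard can be shown in plain C without any kernel helpers. This is a sketch under assumptions: the function names are hypothetical, the cast models how a 32-bit divisor parameter sees a 64-bit value, and the zero-divisor guard in div64_full() is added here for safety rather than taken from the patch.

```c
#include <stdint.h>

/* Models the 32-bit divisor that do_div() works with: the upper
 * 32 bits of a 64-bit value are silently discarded. */
static uint32_t divisor_as_do_div_sees_it(uint64_t divisor)
{
	return (uint32_t)divisor;
}

/* Full 64-by-64 division, the role div64_u64() plays in the kernel.
 * The zero check is a hypothetical guard for this sketch. */
static uint64_t div64_full(uint64_t dividend, uint64_t divisor)
{
	return divisor ? dividend / divisor : 0;
}
```

An interval of exactly 2^32 ns (about 4.29 seconds) is non-zero as a u64 but becomes 0 once truncated, which is how the divide by zero could arise; the full 64-bit division handles the same value correctly.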
2020-01-02 | net: Add device index to tcp_md5sig | David Ahern | 2 | -1/+37
Add support for userspace to specify a device index to limit the scope of an entry via the TCP_MD5SIG_EXT setsockopt. The existing __tcpm_pad is renamed to tcpm_ifindex and the new field is only checked if the new TCP_MD5SIG_FLAG_IFINDEX is set in tcpm_flags. For now, the device index must point to an L3 master device (e.g., VRF). The API and error handling are set up to allow the constraint to be relaxed in the future to any device index. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-02 | tcp: Add l3index to tcp_md5sig_key and md5 functions | David Ahern | 2 | -39/+101
Add l3index to tcp_md5sig_key to represent the L3 domain of a key, and add l3index to tcp_md5_do_add and tcp_md5_do_del to fill in the key. With the key now based on an l3index, add the new parameter to the lookup functions and consider the l3index when looking for a match. The l3index comes from the skb when processing ingress packets leveraging the helpers created for socket lookups, tcp_v4_sdif and inet_iif (and the v6 variants). When the sdif index is set it means the packet ingressed a device that is part of an L3 domain and inet_iif points to the VRF device. For egress, the L3 domain is determined from the socket binding and sk_bound_dev_if. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-02 | ipv4/tcp: Pass dif and sdif to tcp_v4_inbound_md5_hash | David Ahern | 1 | -5/+8
The original ingress device index is saved to the cb space of the skb and the cb is moved during tcp processing. Since tcp_v4_inbound_md5_hash can be called before and after the cb move, pass dif and sdif to it so the caller can save both prior to the cb move. Both are used by a later patch. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-02 | ipv6/tcp: Pass dif and sdif to tcp_v6_inbound_md5_hash | David Ahern | 1 | -6/+9
The original ingress device index is saved to the cb space of the skb and the cb is moved during tcp processing. Since tcp_v6_inbound_md5_hash can be called before and after the cb move, pass dif and sdif to it so the caller can save both prior to the cb move. Both are used by a later patch. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-02 | ipv4/tcp: Use local variable for tcp_md5_addr | David Ahern | 1 | -17/+26
Extract the typecast to (union tcp_md5_addr *) to a local variable rather than the current long, inline declaration with function calls. No functional change intended. Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-02 | tcp: fix "old stuff" D-SACK causing SACK to be treated as D-SACK | Pengcheng Yang | 1 | -1/+4
When we receive a D-SACK, where the sequence number satisfies: undo_marker <= start_seq < end_seq <= prior_snd_una we consider this is a valid D-SACK and tcp_is_sackblock_valid() returns true, then this D-SACK is discarded as "old stuff", but the variable first_sack_index is not marked as negative in tcp_sacktag_write_queue(). If this D-SACK also carries a SACK that needs to be processed (for example, the previous SACK segment was lost), this SACK will be treated as a D-SACK in the following processing of tcp_sacktag_write_queue(), which will eventually lead to incorrect updates of undo_retrans and reordering. Fixes: fd6dad616d4f ("[TCP]: Earlier SACK block verification & simplify access to them") Signed-off-by: Pengcheng Yang <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
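The range condition quoted above, undo_marker <= start_seq < end_seq <= prior_snd_una, has to be evaluated with wraparound-safe sequence comparisons. The sketch below shows one way to express it in userspace; seq_before() follows the usual signed-difference idiom, while is_old_dsack() and first_sack_index_after_drop() are hypothetical helpers written for illustration and are not the kernel's functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a comes before b" for 32-bit sequence numbers,
 * using the sign of the difference. */
static bool seq_before(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) < 0;
}

/* True when a block satisfies
 *   undo_marker <= start_seq < end_seq <= prior_snd_una,
 * i.e. a D-SACK that is valid but entirely "old stuff". */
static bool is_old_dsack(uint32_t undo_marker, uint32_t start_seq,
			 uint32_t end_seq, uint32_t prior_snd_una)
{
	return !seq_before(start_seq, undo_marker) &&
	       seq_before(start_seq, end_seq) &&
	       !seq_before(prior_snd_una, end_seq);
}

/* Hypothetical bookkeeping helper: if the discarded block was the first
 * one, mark the index negative so later blocks are not mistaken for a
 * D-SACK, as the commit message describes. */
static int first_sack_index_after_drop(int first_sack_index, int dropped)
{
	return dropped == first_sack_index ? -1 : first_sack_index;
}
```

The second helper captures the bookkeeping the message says was missing: once the first block is dropped as old, nothing that follows should be interpreted as the D-SACK.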
2020-01-02 | page_pool: help compiler remove code in case CONFIG_NUMA=n | Jesper Dangaard Brouer | 1 | -0/+9
When the kernel is compiled without NUMA support, the page_pool NUMA config setting (pool->p.nid) doesn't make any practical sense. The compiler cannot see that it can remove the code paths.

This patch avoids reading the pool->p.nid setting in case of !CONFIG_NUMA, in allocation and numa check code, which helps the compiler to see the optimisation potential. It leaves update code intact to keep the API the same.

$ ./scripts/bloat-o-meter net/core/page_pool.o-numa-enabled \
                          net/core/page_pool.o-numa-disabled
add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-113 (-113)
Function                                     old     new   delta
page_pool_create                             401     398      -3
__page_pool_alloc_pages_slow                 439     426     -13
page_pool_refill_alloc_cache                 425     328     -97
Total: Before=3611, After=3498, chg -3.13%

Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
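The idea of not reading the nid setting when NUMA support is compiled out can be sketched with a preprocessor guard. Everything here is an illustrative assumption: struct pool_params, current_cpu_node() (a stand-in for numa_mem_id()) and preferred_nid() are hypothetical names, and the sketch is compiled without CONFIG_NUMA defined unless the build passes it explicitly.

```c
#define NUMA_NO_NODE (-1)

/* Hypothetical, minimal stand-in for the pool parameters. */
struct pool_params {
	int nid;
};

/* Stand-in for numa_mem_id(); a single-node system always reports 0. */
static int current_cpu_node(void)
{
	return 0;
}

/* Pick the node to allocate from. Without CONFIG_NUMA the configured
 * nid is never read, so the compiler can drop the comparison entirely. */
static int preferred_nid(const struct pool_params *p)
{
#ifdef CONFIG_NUMA
	return p->nid == NUMA_NO_NODE ? current_cpu_node() : p->nid;
#else
	(void)p;	/* setting is ignored when NUMA is compiled out */
	return current_cpu_node();
#endif
}
```

In the non-NUMA build the function reduces to a constant, which is the kind of simplification the bloat-o-meter figures above reflect.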
2020-01-02 | page_pool: handle page recycle for NUMA_NO_NODE condition | Jesper Dangaard Brouer | 1 | -19/+61
The check in pool_page_reusable (page_to_nid(page) == pool->p.nid) is not valid if page_pool was configured with pool->p.nid = NUMA_NO_NODE.

The goal of the NUMA changes in commit d5394610b1ba ("page_pool: Don't recycle non-reusable pages") was to have RX-pages that belong to the same NUMA node as the CPU processing the RX-packet during softirq/NAPI, as illustrated by the performance measurements.

This patch moves the NAPI checks out of the fast path, and at the same time solves the NUMA_NO_NODE issue.

First realize that alloc_pages_node() with pool->p.nid = NUMA_NO_NODE will look up the current CPU nid (NUMA ID) via numa_mem_id(), which is used as the preferred nid. It is only in rare situations, e.g. when a NUMA zone runs dry, that the page doesn't get allocated from the preferred nid. The page_pool API allows drivers to control the nid themselves via controlling pool->p.nid.

This patch moves the NAPI check to when the alloc cache is refilled, via dequeuing/consuming pages from the ptr_ring. Thus, we can allow placing pages from a remote NUMA node into the ptr_ring, as the dequeue/consume step will check the NUMA node. All current drivers using page_pool will alloc/refill the RX-ring from the same CPU running the softirq/NAPI process.

Drivers that control the nid explicitly also use page_pool_update_nid when changing the nid at runtime. To speed up the transition to the new nid, the alloc cache is now flushed on nid changes. This forces pages to come from the ptr_ring, which does the appropriate nid check.

For the NUMA_NO_NODE case, when a NIC IRQ is moved to another NUMA node, we accept that transitioning the alloc cache doesn't happen immediately. The preferred nid changes at runtime via consulting numa_mem_id() based on the CPU processing RX-packets.

Notice, to avoid stressing the page buddy allocator and avoid doing too much work under softirq with preempt disabled, the NUMA check at ptr_ring dequeue will break the refill cycle when detecting a NUMA mismatch. This will cause a slower transition, but it's done on purpose.

Fixes: d5394610b1ba ("page_pool: Don't recycle non-reusable pages")
Reported-by: Li RongQing <[email protected]>
Reported-by: Yunsheng Lin <[email protected]>
Signed-off-by: Jesper Dangaard Brouer <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2020-01-02 | mac80211: mesh: restrict airtime metric to peered established plinks | Markus Theil | 1 | -0/+3
The following warning is triggered every time an unestablished mesh peer gets dumped. Check whether a peer link is established before retrieving the airtime link metric.

[ 9563.022567] WARNING: CPU: 0 PID: 6287 at net/mac80211/mesh_hwmp.c:345 airtime_link_metric_get+0xa2/0xb0 [mac80211]
[ 9563.022697] Hardware name: PC Engines apu2/apu2, BIOS v4.10.0.3
[ 9563.022756] RIP: 0010:airtime_link_metric_get+0xa2/0xb0 [mac80211]
[ 9563.022838] Call Trace:
[ 9563.022897] sta_set_sinfo+0x936/0xa10 [mac80211]
[ 9563.022964] ieee80211_dump_station+0x6d/0x90 [mac80211]
[ 9563.023062] nl80211_dump_station+0x154/0x2a0 [cfg80211]
[ 9563.023120] netlink_dump+0x17b/0x370
[ 9563.023130] netlink_recvmsg+0x2a4/0x480
[ 9563.023140] ____sys_recvmsg+0xa6/0x160
[ 9563.023154] ___sys_recvmsg+0x93/0xe0
[ 9563.023169] __sys_recvmsg+0x7e/0xd0
[ 9563.023210] do_syscall_64+0x4e/0x140
[ 9563.023217] entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Markus Theil <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[rewrite commit message]
Signed-off-by: Johannes Berg <[email protected]>
2020-01-01 | batman-adv: Disable CONFIG_BATMAN_ADV_SYSFS by default | Sven Eckelmann | 1 | -1/+0
The sysfs support in batman-adv has been deprecated for a while and will be removed completely next year. All tools known to the batman-adv development team have supported the batman-adv netlink interface for a while. Thus disabling CONFIG_BATMAN_ADV_SYSFS by default should not cause problems on most systems. It is still possible to enable it in case it is still required in a specific setup. Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-01-01batman-adv: Update copyright years for 2020Sven Eckelmann61-61/+61
Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2019-12-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller41-295/+330
Simple overlapping changes in bpf land wrt. bpf_helper_defs.h handling. Signed-off-by: David S. Miller <[email protected]>
2019-12-30hsr: fix slab-out-of-bounds Read in hsr_debugfs_rename()Taehee Yoo1-1/+2
hsr slave interfaces don't have debugfs directory. So, hsr_debugfs_rename() shouldn't be called when hsr slave interface name is changed. Test commands: ip link add dummy0 type dummy ip link add dummy1 type dummy ip link add hsr0 type hsr slave1 dummy0 slave2 dummy1 ip link set dummy0 name ap Splat looks like: [21071.899367][T22666] ap: renamed from dummy0 [21071.914005][T22666] ================================================================== [21071.919008][T22666] BUG: KASAN: slab-out-of-bounds in hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.923640][T22666] Read of size 8 at addr ffff88805febcd98 by task ip/22666 [21071.926941][T22666] [21071.927750][T22666] CPU: 0 PID: 22666 Comm: ip Not tainted 5.5.0-rc2+ #240 [21071.929919][T22666] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [21071.935094][T22666] Call Trace: [21071.935867][T22666] dump_stack+0x96/0xdb [21071.936687][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.937774][T22666] print_address_description.constprop.5+0x1be/0x360 [21071.939019][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.940081][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.940949][T22666] __kasan_report+0x12a/0x16f [21071.941758][T22666] ? hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.942674][T22666] kasan_report+0xe/0x20 [21071.943325][T22666] hsr_debugfs_rename+0xaa/0xb0 [hsr] [21071.944187][T22666] hsr_netdev_notify+0x1fe/0x9b0 [hsr] [21071.945052][T22666] ? __module_text_address+0x13/0x140 [21071.945897][T22666] notifier_call_chain+0x90/0x160 [21071.946743][T22666] dev_change_name+0x419/0x840 [21071.947496][T22666] ? __read_once_size_nocheck.constprop.6+0x10/0x10 [21071.948600][T22666] ? netdev_adjacent_rename_links+0x280/0x280 [21071.949577][T22666] ? __read_once_size_nocheck.constprop.6+0x10/0x10 [21071.950672][T22666] ? lock_downgrade+0x6e0/0x6e0 [21071.951345][T22666] ? do_setlink+0x811/0x2ef0 [21071.951991][T22666] do_setlink+0x811/0x2ef0 [21071.952613][T22666] ? 
is_bpf_text_address+0x81/0xe0 [ ... ] Reported-by: [email protected] Fixes: 4c2d5e33dcd3 ("hsr: rename debugfs file when interface name is changed") Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-12-30net/sched: add delete_empty() to filters and use it in cls_flowerDavide Caratti3-51/+17
Revert "net/sched: cls_u32: fix refcount leak in the error path of u32_change()", and fix the u32 refcount leak in a more generic way that preserves the semantics of rule dumping. On tc filters that don't support lockless insertion/removal, there is no need to guard against concurrent insertion when a removal is in progress. Therefore, for most of them we can avoid a full walk() when deleting, and just decrease the refcount, as was done on older Linux kernels. This fixes situations where walk() was wrongly detecting a non-empty filter, as happened with cls_u32 in the error path of change(), thus leading to failures in the following tdc selftests: 6aa7: (filter, u32) Add/Replace u32 with source match and invalid indev 6658: (filter, u32) Add/Replace u32 with custom hash table and invalid handle 74c2: (filter, u32) Add/Replace u32 filter with invalid hash table id On cls_flower, and on (future) lockless filters, this check is necessary: move all the check_empty() logic into a callback so that each filter can have its own implementation. For cls_flower, it's sufficient to check if no IDRs have been allocated. This reverts commit 275c44aa194b7159d1191817b20e076f55f0e620.
Changes since v1: - document the need for delete_empty() when TCF_PROTO_OPS_DOIT_UNLOCKED is used, thanks to Vlad Buslov - implement delete_empty() without doing fl_walk(), thanks to Vlad Buslov - squash revert and new fix in a single patch, to be nice with bisect tests that run tdc on u32 filter, thanks to Dave Miller Fixes: 275c44aa194b ("net/sched: cls_u32: fix refcount leak in the error path of u32_change()") Fixes: 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty") Suggested-by: Jamal Hadi Salim <[email protected]> Suggested-by: Vlad Buslov <[email protected]> Signed-off-by: Davide Caratti <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Tested-by: Jamal Hadi Salim <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Signed-off-by: David S. Miller <[email protected]>
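The optional-callback idea described in the commit above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct filter`, `struct filter_ops`, `filter_check_delete()` and `nr_rules` are hypothetical stand-ins, not the kernel's tcf types, and `nr_rules` merely models "no IDRs have been allocated". The caller is assumed to invoke the check only after the filter's own delete handler reported that the last rule is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct filter;

/* Hypothetical ops table: delete_empty is optional. */
struct filter_ops {
    /* Only filters with lockless insert/remove need to provide this. */
    bool (*delete_empty)(struct filter *f);
};

struct filter {
    const struct filter_ops *ops;
    bool deleting;          /* set once the filter is being torn down */
    unsigned int nr_rules;  /* stand-in for "IDRs allocated" */
};

/* Lockless-style filter: it decides by itself whether it is empty. */
static bool lockless_delete_empty(struct filter *f)
{
    f->deleting = (f->nr_rules == 0);
    return f->deleting;
}

/* Generic helper: use the callback if present, else no walk is needed. */
static bool filter_check_delete(struct filter *f)
{
    if (f->ops->delete_empty)
        return f->ops->delete_empty(f);

    /* Locked filters: no concurrent insertion is possible here. */
    f->deleting = true;
    return true;
}

static const struct filter_ops locked_ops = { .delete_empty = NULL };
static const struct filter_ops lockless_ops = {
    .delete_empty = lockless_delete_empty,
};
```

A filter using `locked_ops` is always reported as deletable, while one using `lockless_ops` is deletable only once its own emptiness test passes.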
2019-12-30net/ncsi: Fix gma flag setting after responseVijay Khemka2-3/+6
gma_flag was set at the time of the GMA command request, but it should only be set after getting a successful response. Move this flag setting into the GMA response handler. This flag is used mainly to avoid repeating the GMA command once the MAC address has been received. Signed-off-by: Vijay Khemka <[email protected]> Reviewed-by: Samuel Mendoza-Jonas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
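The ordering issue can be shown with a small userspace sketch. All names here (`struct dev_state`, `send_gma_request()`, `handle_gma_response()`) are hypothetical and do not mirror the NCSI driver; the only point is that the "done" flag is set in the response path on success, never in the request path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-device state, not the real NCSI structures. */
struct dev_state {
    bool gma_done;      /* true once a MAC address was actually received */
    int requests_sent;  /* number of GMA requests issued so far */
};

/* The command is repeated until a successful response has been seen. */
static bool should_send_gma(const struct dev_state *s)
{
    return !s->gma_done;
}

/* Request path: deliberately does NOT touch gma_done. */
static void send_gma_request(struct dev_state *s)
{
    s->requests_sent++;
}

/* Response path: status 0 is assumed to mean success in this sketch. */
static void handle_gma_response(struct dev_state *s, int status)
{
    if (status == 0)
        s->gma_done = true;
}
```

With this split, a failed or lost response leaves `should_send_gma()` returning true, so the request is retried instead of being silently skipped.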
2019-12-30sctp: add enabled check for path tracepoint loop.Kevin Kou1-2/+3
sctp_outq_sack is the main function that handles SACK, and it is called very frequently. The commit "move trace_sctp_probe_path into sctp_outq_sack" added the code below to this function. The sctp tracepoint is disabled most of the time, but the loop over the transport list always runs even when the tracepoint is disabled, which is unnecessary. + /* SCTP path tracepoint for congestion control debugging. */ + list_for_each_entry(transport, transport_list, transports) { + trace_sctp_probe_path(transport, asoc); + } This patch adds a tracepoint-enabled check outside the loop over the transport list, so the list is not traversed when tracing is disabled. It is a small optimization. Signed-off-by: Kevin Kou <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
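In the kernel, every tracepoint also gets a `trace_<name>_enabled()` helper, which is what makes a guard like this cheap. The userspace sketch below only models the pattern: `probe_path_enabled`, `trace_probe_path()` and `struct transport` are hypothetical stand-ins, not the SCTP code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a transport list entry. */
struct transport {
    struct transport *next;
    int id;
};

static bool probe_path_enabled;    /* models the tracepoint on/off state */
static unsigned long probe_calls;  /* counts how often the probe fired */

/* Models the tracepoint call itself. */
static void trace_probe_path(const struct transport *t)
{
    (void)t;
    probe_calls++;
}

/*
 * Check the enabled state once, outside the loop, so that the list is
 * not walked at all while tracing is off. Returns the nodes visited.
 */
static unsigned long probe_all_paths(const struct transport *head)
{
    unsigned long visited = 0;

    if (!probe_path_enabled)
        return 0;

    for (const struct transport *t = head; t != NULL; t = t->next) {
        trace_probe_path(t);
        visited++;
    }
    return visited;
}
```

The saving grows with the number of transports per association, since the disabled case becomes a single branch instead of a full list walk.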
2019-12-30tcp: Fix highest_sack and highest_sack_seqCambda Zhu1-0/+3
From commit 50895b9de1d3 ("tcp: highest_sack fix"), the logic about setting tp->highest_sack to the head of the send queue was removed. Of course the logic is error prone, but it is logical. Before we remove the pointer to the highest sack skb and use the seq instead, we need to set tp->highest_sack to NULL when there is no skb after the last sack, and then replace NULL with the real skb when a new skb is inserted into the rtx queue, because NULL means the highest sack seq is tp->snd_nxt. If tp->highest_sack is NULL and new data is sent, the next ACK with a sack option will increase tp->reordering unexpectedly. This patch sets tp->highest_sack to the tail of the rtx queue if it's NULL and new data is sent. The patch keeps the rule that highest_sack can only be maintained by sack processing, except for this one case. Fixes: 50895b9de1d3 ("tcp: highest_sack fix") Signed-off-by: Cambda Zhu <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-12-30tcp_cubic: refactor code to perform a divide only when neededEric Dumazet1-23/+28
Neal Cardwell suggested to not change ca->delay_min and apply the ack delay cushion only when Hystart ACK train is still under consideration. This should avoid a 64bit divide unless needed. Tested: 40Gbit(mlx4) testbed (with sch_fq as packet scheduler) $ echo -n 'file tcp_cubic.c +p' >/sys/kernel/debug/dynamic_debug/control $ nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpaa24 -l -4000000; done;nstat|egrep "Hystart" 14815 16280 15293 15563 11574 15145 14789 18548 16972 12520 TcpExtTCPHystartTrainDetect 10 0.0 TcpExtTCPHystartTrainCwnd 1396 0.0 $ dmesg | tail -10 [ 4873.951350] hystart_ack_train (116 > 93) delay_min 24 (+ ack_delay 69) cwnd 80 [ 4875.155379] hystart_ack_train (55 > 50) delay_min 21 (+ ack_delay 29) cwnd 160 [ 4876.333921] hystart_ack_train (69 > 62) delay_min 23 (+ ack_delay 39) cwnd 130 [ 4877.519037] hystart_ack_train (69 > 60) delay_min 22 (+ ack_delay 38) cwnd 130 [ 4878.701559] hystart_ack_train (87 > 63) delay_min 24 (+ ack_delay 39) cwnd 160 [ 4879.844597] hystart_ack_train (93 > 50) delay_min 21 (+ ack_delay 29) cwnd 216 [ 4880.956650] hystart_ack_train (74 > 67) delay_min 20 (+ ack_delay 47) cwnd 108 [ 4882.098500] hystart_ack_train (61 > 57) delay_min 23 (+ ack_delay 34) cwnd 130 [ 4883.262056] hystart_ack_train (72 > 67) delay_min 21 (+ ack_delay 46) cwnd 130 [ 4884.418760] hystart_ack_train (74 > 67) delay_min 29 (+ ack_delay 38) cwnd 152 10Gbit(bnx2x) testbed (with sch_fq as packet scheduler) $ echo -n 'file tcp_cubic.c +p' >/sys/kernel/debug/dynamic_debug/control $ nstat -n;for f in {1..10}; do ./super_netperf 1 -H lpk52 -l -4000000; done;nstat|egrep "Hystart" 7050 7065 7100 6900 7202 7263 7189 6869 7463 7034 TcpExtTCPHystartTrainDetect 10 0.0 TcpExtTCPHystartTrainCwnd 3199 0.0 $ dmesg | tail -10 [ 176.920012] hystart_ack_train (161 > 141) delay_min 83 (+ ack_delay 58) cwnd 264 [ 179.144645] hystart_ack_train (164 > 159) delay_min 120 (+ ack_delay 39) cwnd 444 [ 181.354527] hystart_ack_train (214 > 168) delay_min 125 (+ ack_delay 
43) cwnd 436 [ 183.539565] hystart_ack_train (170 > 147) delay_min 96 (+ ack_delay 51) cwnd 326 [ 185.727309] hystart_ack_train (177 > 160) delay_min 61 (+ ack_delay 99) cwnd 128 [ 187.947142] hystart_ack_train (184 > 167) delay_min 123 (+ ack_delay 44) cwnd 367 [ 190.166680] hystart_ack_train (230 > 153) delay_min 116 (+ ack_delay 37) cwnd 444 [ 192.327285] hystart_ack_train (210 > 206) delay_min 86 (+ ack_delay 120) cwnd 152 [ 194.511392] hystart_ack_train (173 > 151) delay_min 94 (+ ack_delay 57) cwnd 239 [ 196.736023] hystart_ack_train (149 > 146) delay_min 105 (+ ack_delay 41) cwnd 399 Fixes: 42f3a8aaae66 ("tcp_cubic: tweak Hystart detection for short RTT flows") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Neal Cardwell <[email protected]> Link: https://www.spinics.net/lists/netdev/msg621886.html Link: https://www.spinics.net/lists/netdev/msg621797.html Acked-by: Neal Cardwell <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-12-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller5-144/+352
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Remove #ifdef pollution around nf_ingress(), from Lukas Wunner. 2) Document ingress hook in netdevice, also from Lukas. 3) Remove htons() in tunnel metadata port netlink attributes, from Xin Long. 4) Missing erspan netlink attribute validation also from Xin Long. 5) Missing erspan version in tunnel, from Xin Long. 6) Missing attribute nest in NFTA_TUNNEL_KEY_OPTS_{VXLAN,ERSPAN} Patch from Xin Long. 7) Missing nla_nest_cancel() in tunnel netlink dump path, from Xin Long. 8) Remove two exported conntrack symbols with no clients, from Florian Westphal. 9) Add nft_meta_get_eval_time() helper to nft_meta, from Florian. 10) Add nft_meta_pkttype helper for loopback, also from Florian. 11) Add nft_meta_socket uid helper, from Florian Westphal. 12) Add nft_meta_cgroup helper, from Florian. 13) Add nft_meta_ifkind helper, from Florian. 14) Group all interface related meta selector, from Florian. 15) Add nft_prandom_u32() helper, from Florian. 16) Add nft_meta_rtclassid helper, from Florian. 17) Add support for matching on the slave device index, from Florian. This batch, among other things, contains updates for the netfilter tunnel netlink interface: This extension is still incomplete and lacking proper userspace support which is actually my fault, I did not find the time to go back and finish this. This update is breaking tunnel UAPI in some aspects to fix it but do it better sooner than never. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-12-30netfilter: arp_tables: init netns pointer in xt_tgchk_param structFlorian Westphal1-11/+16
We get a crash when the target's checkentry function tries to make use of the network namespace pointer for arptables. When the net pointer got added back in 2010, only ip/ip6/ebtables were changed to initialize it, so arptables has this set to NULL. This isn't a problem for normal arptables because no existing arptables target has a checkentry function that makes use of par->net. However, direct users of the setsockopt interface can provide any target they want as long as it's registered for ARP or UNSPEC protocols. syzkaller managed to send a semi-valid arptables rule for RATEEST target which is enough to trigger NULL deref: kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN RIP: xt_rateest_tg_checkentry+0x11d/0xb40 net/netfilter/xt_RATEEST.c:109 [..] xt_check_target+0x283/0x690 net/netfilter/x_tables.c:1019 check_target net/ipv4/netfilter/arp_tables.c:399 [inline] find_check_entry net/ipv4/netfilter/arp_tables.c:422 [inline] translate_table+0x1005/0x1d70 net/ipv4/netfilter/arp_tables.c:572 do_replace net/ipv4/netfilter/arp_tables.c:977 [inline] do_arpt_set_ctl+0x310/0x640 net/ipv4/netfilter/arp_tables.c:1456 Fixes: add67461240c1d ("netfilter: add struct net * to target parameters") Reported-by: [email protected] Signed-off-by: Florian Westphal <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-12-28net: dsa: Deny PTP on master if switch supports itVladimir Oltean1-0/+30
It is possible to kill PTP on a DSA switch completely and absolutely, until a reboot, with a simple command: tcpdump -i eth2 -j adapter_unsynced where eth2 is the switch's DSA master. Why? Well, in short, the PTP API in place today is a bit rudimentary and relies on applications to retrieve the TX timestamps by polling the error queue and looking at the cmsg structure. But there is no timestamp identification of any sort (except whether it's HW or SW), you don't know how many more timestamps are still to come, which one is this one, from whom it is, etc. In other words, the SO_TIMESTAMPING API is fundamentally limited in that you can get a single HW timestamp from the stack. And the "-j adapter_unsynced" flag of tcpdump enables hardware timestamping. So let's imagine what happens when the DSA master decides it wants to deliver TX timestamps to the skb's socket too: - The timestamp that the user space sees is taken by the DSA master. Whereas the RX timestamp will eventually be overwritten by the DSA switch. So the RX and TX timestamps will be in different time bases (aka garbage). - The user space applications have no way to deal with the second (real) TX timestamp finally delivered by the DSA switch, or even to know to wait for it. Take ptp4l from the linuxptp project, for example.
This is its behavior after running tcpdump, before the patch: ptp4l[172]: [6469.594] Unexpected data on socket err queue: ptp4l[172]: [6469.693] rms 8 max 16 freq -21257 +/- 11 delay 748 +/- 0 ptp4l[172]: [6469.711] Unexpected data on socket err queue: ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 05 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.721] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b1 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.838] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 06 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.848] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 13 02 ptp4l[172]: 0010 00 36 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 04 1a 45 05 7f ptp4l[172]: 0030 00 00 5e 05 41 32 27 c2 1a 68 00 04 9f ff fe 05 ptp4l[172]: 0040 de 06 00 01 ptp4l[172]: [6469.855] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 01 c6 b2 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: [6469.974] Unexpected data on socket err queue: ptp4l[172]: 0000 01 80 c2 00 00 0e 00 1f 7b 63 02 48 88 f7 10 02 ptp4l[172]: 0010 00 2c 00 00 02 00 00 00 00 00 00 00 00 00 00 00 ptp4l[172]: 0020 00 00 00 1f 7b ff fe 63 02 48 00 03 aa 07 00 fd ptp4l[172]: 0030 00 00 00 00 00 00 00 00 00 00 The ptp4l program itself is heavily patched to show this (more details here 
[0]). Otherwise, by default it just hangs. On the other hand, with the DSA patch to disallow HW timestamping applied: tcpdump -i eth2 -j adapter_unsynced tcpdump: SIOCSHWTSTAMP failed: Device or resource busy So it is a fact of life that PTP timestamping on the DSA master is incompatible with timestamping on the switch MAC, at least with the current API. And if the switch supports PTP, taking the timestamps from the switch MAC is highly preferable anyway, due to the fact that those don't contain the queuing latencies of the switch. So just disallow PTP on the DSA master if there is any PTP-capable switch attached. [0]: https://sourceforge.net/p/linuxptp/mailman/message/36880648/ Fixes: 0336369d3a4d ("net: dsa: forward hardware timestamping ioctls to switch driver") Signed-off-by: Vladimir Oltean <[email protected]> Acked-by: Richard Cochran <[email protected]> Signed-off-by: David S. Miller <[email protected]>