aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2019-03-03net: fixup address-space warnings in compat_mc_{get,set}sockopt()Ben Dooks1-4/+4
Add __user attributes in some of the casts in this function to avoid the following sparse warnings: net/compat.c:592:57: warning: cast removes address space of expression net/compat.c:592:57: warning: incorrect type in initializer (different address spaces) net/compat.c:592:57: expected struct compat_group_req [noderef] <asn:1>*gr32 net/compat.c:592:57: got void *<noident> net/compat.c:613:65: warning: cast removes address space of expression net/compat.c:613:65: warning: incorrect type in initializer (different address spaces) net/compat.c:613:65: expected struct compat_group_source_req [noderef] <asn:1>*gsr32 net/compat.c:613:65: got void *<noident> net/compat.c:634:60: warning: cast removes address space of expression net/compat.c:634:60: warning: incorrect type in initializer (different address spaces) net/compat.c:634:60: expected struct compat_group_filter [noderef] <asn:1>*gf32 net/compat.c:634:60: got void *<noident> net/compat.c:672:52: warning: cast removes address space of expression net/compat.c:672:52: warning: incorrect type in initializer (different address spaces) net/compat.c:672:52: expected struct compat_group_filter [noderef] <asn:1>*gf32 net/compat.c:672:52: got void *<noident> Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: dsa: Use prepare/commit phase in dsa_slave_vlan_rx_add_vid()Florian Fainelli1-3/+6
We were skipping the prepare phase which causes some problems with at least a couple of drivers: - mv88e6xxx chooses to skip programming VID = 0 with -EOPNOTSUPP in the prepare phase, but we would still try to force this VID since we would only call the commit phase and so we would get the driver to return -EINVAL instead - qca8k does not currently have a port_vlan_add() callback implemented, yet we would try to call that unconditionally leading to a NPD Fix both issues by conforming to the current model doing a prepare/commit phase, this makes us consistent throughout the code and assumptions. Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Reported-by: Michal Vokáč <michal.vokac@ysoft.com> Fixes: 061f6a505ac3 ("net: dsa: Add ndo_vlan_rx_{add, kill}_vid implementation") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03sch_cake: Simplify logic in cake_select_tin()Toke Høiland-Jørgensen1-38/+23
With more modes added the logic in cake_select_tin() was getting a bit hairy, and it turns out we can actually simplify it quite a bit. This also allows us to get rid of one of the two diffserv parsing functions, which has the added benefit that already-zeroed DSCP fields won't get re-written. Suggested-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03sch_cake: Permit use of connmarks as tin classifiersKevin Darbyshire-Bryant1-7/+27
Add flag 'FWMARK' to enable use of firewall connmarks as tin selector. The connmark (skbuff->mark) needs to be in the range 1->tin_cnt ie. for diffserv3 the mark needs to be 1->3. Background Typically CAKE uses DSCP as the basis for tin selection. DSCP values are relatively easily changed as part of the egress path, usually with iptables & the mangle table, ingress is more challenging. CAKE is often used on the WAN interface of a residential gateway where passthrough of DSCP from the ISP is either missing or set to unhelpful values thus use of ingress DSCP values for tin selection isn't helpful in that environment. An approach to solving the ingress tin selection problem is to use CAKE's understanding of tc filters. Naive tc filters could match on source/destination port numbers and force tin selection that way, but multiple filters don't scale particularly well as each filter must be traversed whether it matches or not. e.g. a simple example to map 3 firewall marks to tins: MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' ) tc filter add dev $DEV parent $MAJOR protocol all handle 0x01 fw action skbedit priority ${MAJOR}1 tc filter add dev $DEV parent $MAJOR protocol all handle 0x02 fw action skbedit priority ${MAJOR}2 tc filter add dev $DEV parent $MAJOR protocol all handle 0x03 fw action skbedit priority ${MAJOR}3 Another option is to use eBPF cls_act with tc filters e.g. MAJOR=$( tc qdisc show dev $DEV | head -1 | awk '{print $3}' ) tc filter add dev $DEV parent $MAJOR bpf da obj my-bpf-fwmark-to-class.o This has the disadvantages of a) needing someone to write & maintain the bpf program, b) a bpf toolchain to compile it and c) needing to hardcode the major number in the bpf program so it matches the cake instance (or forcing the cake instance to a particular major number) since the major number cannot be passed to the bpf program via tc command line. As already hinted at by the previous examples, it would be helpful to associate tins with something that survives the Internet path and ideally allows tin selection on both egress and ingress. Netfilter's conntrack permits setting an identifying mark on a connection which can also be restored to an ingress packet with tc action connmark e.g. tc filter add dev eth0 parent ffff: protocol all prio 10 u32 \ match u32 0 0 flowid 1:1 action connmark action mirred egress redirect dev ifb1 Since tc's connmark action has restored any connmark into skb->mark, any of the previous solutions are based upon it and in one form or another copy that mark to the skb->priority field where again CAKE picks this up. This change cuts out at least one of the (less intuitive & non-scalable) middlemen and permit direct access to skb->mark. Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03sch_cake: Make the dual modes fairerGeorge Amanakis1-29/+63
CAKE host fairness does not work well with TCP flows in dual-srchost and dual-dsthost setup. The reason is that ACKs generated by TCP flows are classified as sparse flows, and affect flow isolation from other hosts. Fix this by calculating host_load based only on the bulk flows a host generates. In a hash collision the host_bulk_flow_count values must be decremented on the old hosts and incremented on the new ones *if* the queue is in the bulk set. Reported-by: Pete Heist <peteheist@gmail.com> Signed-off-by: George Amanakis <gamanakis@gmail.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03net: dsa: add KSZ9893 switch tagging supportTristram Ha3-0/+37
KSZ9893 switch is similar to KSZ9477 switch except the ingress tail tag has 1 byte instead of 2 bytes. The size of the portmap is smaller and so the override and lookup bits are also moved. Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03appletalk: Fix use-after-free in atalk_proc_exitYueHaibing3-8/+36
KASAN report this: BUG: KASAN: use-after-free in pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71 Read of size 8 at addr ffff8881f41fe5b0 by task syz-executor.0/2806 CPU: 0 PID: 2806 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xfa/0x1ce lib/dump_stack.c:113 print_address_description+0x65/0x270 mm/kasan/report.c:187 kasan_report+0x149/0x18d mm/kasan/report.c:317 pde_subdir_find+0x12d/0x150 fs/proc/generic.c:71 remove_proc_entry+0xe8/0x420 fs/proc/generic.c:667 atalk_proc_exit+0x18/0x820 [appletalk] atalk_exit+0xf/0x5a [appletalk] __do_sys_delete_module kernel/module.c:1018 [inline] __se_sys_delete_module kernel/module.c:961 [inline] __x64_sys_delete_module+0x3dc/0x5e0 kernel/module.c:961 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fb2de6b9c58 EFLAGS: 00000246 ORIG_RAX: 00000000000000b0 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000200001c0 RBP: 0000000000000002 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007fb2de6ba6bc R13: 00000000004bccaa R14: 00000000006f6bc8 R15: 00000000ffffffff Allocated by task 2806: set_track mm/kasan/common.c:85 [inline] __kasan_kmalloc.constprop.3+0xa0/0xd0 mm/kasan/common.c:496 slab_post_alloc_hook mm/slab.h:444 [inline] slab_alloc_node mm/slub.c:2739 [inline] slab_alloc mm/slub.c:2747 [inline] kmem_cache_alloc+0xcf/0x250 mm/slub.c:2752 kmem_cache_zalloc include/linux/slab.h:730 [inline] __proc_create+0x30f/0xa20 fs/proc/generic.c:408 proc_mkdir_data+0x47/0x190 fs/proc/generic.c:469 0xffffffffc10c01bb 0xffffffffc10c0166 do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 2806: set_track mm/kasan/common.c:85 [inline] __kasan_slab_free+0x130/0x180 mm/kasan/common.c:458 slab_free_hook mm/slub.c:1409 [inline] slab_free_freelist_hook mm/slub.c:1436 [inline] slab_free mm/slub.c:2986 [inline] kmem_cache_free+0xa6/0x2a0 mm/slub.c:3002 pde_put+0x6e/0x80 fs/proc/generic.c:647 remove_proc_entry+0x1d3/0x420 fs/proc/generic.c:684 0xffffffffc10c031c 0xffffffffc10c0166 do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The buggy address belongs to the object at ffff8881f41fe500 which belongs to the cache proc_dir_entry of size 256 The buggy address is located 176 bytes inside of 256-byte region [ffff8881f41fe500, ffff8881f41fe600) The buggy address belongs to the page: page:ffffea0007d07f80 count:1 mapcount:0 mapping:ffff8881f6e69a00 index:0x0 flags: 0x2fffc0000000200(slab) raw: 02fffc0000000200 dead000000000100 dead000000000200 ffff8881f6e69a00 raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff8881f41fe480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc ffff8881f41fe500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb >ffff8881f41fe580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ^ ffff8881f41fe600: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb ffff8881f41fe680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb It should check the return value of atalk_proc_init fails, otherwise atalk_exit will trgger use-after-free in pde_subdir_find while unload the module.This patch fix error cleanup path of atalk_init Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-03appletalk: use remove_proc_subtree to simplify procfs codeYueHaibing1-39/+17
Use remove_proc_subtree to remove the whole subtree on cleanup.Also do some cleanup. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02net: sched: put back q.qlen into a single locationEric Dumazet2-9/+6
In the series fc8b81a5981f ("Merge branch 'lockless-qdisc-series'") John made the assumption that the data path had no need to read the qdisc qlen (number of packets in the qdisc). It is true when pfifo_fast is used as the root qdisc, or as direct MQ/MQPRIO children. But pfifo_fast can be used as leaf in class full qdiscs, and existing logic needs to access the child qlen in an efficient way. HTB breaks badly, since it uses cl->leaf.q->q.qlen in : htb_activate() -> WARN_ON() htb_dequeue_tree() to decide if a class can be htb_deactivated when it has no more packets. HFSC, DRR, CBQ, QFQ have similar issues, and some calls to qdisc_tree_reduce_backlog() also read q.qlen directly. Using qdisc_qlen_sum() (which iterates over all possible cpus) in the data path is a non starter. It seems we have to put back qlen in a central location, at least for stable kernels. For all qdisc but pfifo_fast, qlen is guarded by the qdisc lock, so the existing q.qlen{++|--} are correct. For 'lockless' qdisc (pfifo_fast so far), we need to use atomic_{inc|dec}() because the spinlock might be not held (for example from pfifo_fast_enqueue() and pfifo_fast_dequeue()) This patch adds atomic_qlen (in the same location than qlen) and renames the following helpers, since we want to express they can be used without qdisc lock, and that qlen is no longer percpu. - qdisc_qstats_cpu_qlen_dec -> qdisc_qstats_atomic_qlen_dec() - qdisc_qstats_cpu_qlen_inc -> qdisc_qstats_atomic_qlen_inc() Later (net-next) we might revert this patch by tracking all these qlen uses and replace them by a more efficient method (not having to access a precise qlen, but an empty/non_empty status that might be less expensive to maintain/track). Another possibility is to have a legacy pfifo_fast version that would be used when used a a child qdisc, since the parent qdisc needs a spinlock anyway. But then, future lockless qdiscs would also have the same problem. Fixes: 7e66016f2c65 ("net: sched: helpers to sum qlen and qlen for per cpu logic") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller47-2267/+1695
Pablo Neira Ayuso says: ==================== Netfilter/IPVS updates for net-next The following patchset contains Netfilter/IPVS updates for net-next: 1) Add .release_ops to properly unroll .select_ops, use it from nft_compat. After this change, we can remove list of extensions too to simplify this codebase. 2) Update amanda conntrack helper to support v3.4, from Florian Tham. 3) Get rid of the obsolete BUGPRINT macro in ebtables, from Florian Westphal. 4) Merge IPv4 and IPv6 masquerading infrastructure into one single module. From Florian Westphal. 5) Patchset to remove nf_nat_l3proto structure to get rid of indirections, from Florian Westphal. 6) Skip unnecessary conntrack timeout updates in case the value is still the same, also from Florian Westphal. 7) Remove unnecessary 'fall through' comments in empty switch cases, from Li RongQing. 8) Fix lookup to fixed size hashtable sets on big endian with 32-bit keys. 9) Incorrect logic to deactivate path of fixed size hashtable sets, element was being tested to self. 10) Remove nft_hash_key(), the bitmap set is always selected for 16-bit keys. 11) Use boolean whenever possible in IPVS codebase, from Andrea Claudi. 12) Enter close state in conntrack if RST matches exact sequence number, from Florian Westphal. 13) Initialize dst_cache in tunnel extension, from wenxu. 14) Pass protocol as u16 to xt_check_match and xt_check_target, from Li RongQing. 15) SCTP header is granted to be in a linear area from IPVS NAT handler, from Xin Long. 16) Don't steal packets coming from slave VRF device from the ip_sabotage_in() path, from David Ahern. 17) Fix unsafe update of basechain stats, from Li RongQing. 18) Make sure CONNTRACK_LOCKS is power of 2 to let compiler optimize modulo operation as bitwise AND, from Li RongQing. 19) Use device_attribute instead of internal definition in the IDLETIMER target, from Sami Tolvanen. 20) Merge redir, masq and IPv4/IPv6 NAT chain types, from Florian Westphal. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02Merge branch 'for-upstream' of ↵David S. Miller2-7/+50
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2019-03-02 Here's one more bluetooth-next pull request for the 5.1 kernel: - Added support for MediaTek MT7663U and MT7668U UART devices - Cleanups & fixes to the hci_qca driver - Fixed wakeup pin behavior for QCA6174A controller Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller22-42/+142
2019-03-02bpf: add bpf helper bpf_skb_ecn_set_cebrakmo1-0/+28
This patch adds a new bpf helper BPF_FUNC_skb_ecn_set_ce "int bpf_skb_ecn_set_ce(struct sk_buff *skb)". It is added to BPF_PROG_TYPE_CGROUP_SKB typed bpf_prog which currently can be attached to the ingress and egress path. The helper is needed because his type of bpf_prog cannot modify the skb directly. This helper is used to set the ECN field of ECN capable IP packets to ce (congestion encountered) in the IPv6 or IPv4 header of the skb. It can be used by a bpf_prog to manage egress or ingress network bandwdith limit per cgroupv2 by inducing an ECN response in the TCP sender. This works best when using DCTCP. Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Song Liu <songliubraving@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2019-03-02net: sit: fix memory leak in sit_init_net()Mao Wenan1-0/+1
If register_netdev() is failed to register sitn->fb_tunnel_dev, it will go to err_reg_dev and forget to free netdev(sitn->fb_tunnel_dev). BUG: memory leak unreferenced object 0xffff888378daad00 (size 512): comm "syz-executor.1", pid 4006, jiffies 4295121142 (age 16.115s) hex dump (first 32 bytes): 00 e6 ed c0 83 88 ff ff 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000d6dcb63e>] kvmalloc include/linux/mm.h:577 [inline] [<00000000d6dcb63e>] kvzalloc include/linux/mm.h:585 [inline] [<00000000d6dcb63e>] netif_alloc_netdev_queues net/core/dev.c:8380 [inline] [<00000000d6dcb63e>] alloc_netdev_mqs+0x600/0xcc0 net/core/dev.c:8970 [<00000000867e172f>] sit_init_net+0x295/0xa40 net/ipv6/sit.c:1848 [<00000000871019fa>] ops_init+0xad/0x3e0 net/core/net_namespace.c:129 [<00000000319507f6>] setup_net+0x2ba/0x690 net/core/net_namespace.c:314 [<0000000087db4f96>] copy_net_ns+0x1dc/0x330 net/core/net_namespace.c:437 [<0000000057efc651>] create_new_namespaces+0x382/0x730 kernel/nsproxy.c:107 [<00000000676f83de>] copy_namespaces+0x2ed/0x3d0 kernel/nsproxy.c:165 [<0000000030b74bac>] copy_process.part.27+0x231e/0x6db0 kernel/fork.c:1919 [<00000000fff78746>] copy_process kernel/fork.c:1713 [inline] [<00000000fff78746>] _do_fork+0x1bc/0xe90 kernel/fork.c:2224 [<000000001c2e0d1c>] do_syscall_64+0xc8/0x580 arch/x86/entry/common.c:290 [<00000000ec48bd44>] entry_SYSCALL_64_after_hwframe+0x49/0xbe [<0000000039acff8a>] 0xffffffffffffffff Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-02net: ipv4: Fix NULL pointer dereference in route lookupIdo Schimmel1-1/+1
When calculating the multipath hash for input routes the flow info is not available and therefore should not be used. Fixes: 24ba14406c5c ("route: Add multipath_hash in flowi_common to make user-define hash") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Cc: wenxu <wenxu@ucloud.cn> Acked-by: wenxu <wenxu@ucloud.cn> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: support 64bit rates for getsockopt(SO_MAX_PACING_RATE)Eric Dumazet1-2/+8
For legacy applications using 32bit variable, SO_MAX_PACING_RATE has to cap the returned value to 0xFFFFFFFF, meaning that rates above 34.35 Gbit are capped. This patch allows applications to read socket pacing rate at full resolution, if they provide a 64bit variable to store it, and the kernel is 64bit. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01net: support 64bit values for setsockopt(SO_MAX_PACING_RATE)Eric Dumazet1-5/+13
64bit kernels now support 64bit pacing rates. This commit changes setsockopt() to accept 64bit values provided by applications. Old applications providing 32bit value are still supported, but limited to the old 34Gbit limitation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01switchdev: Remove unused transaction item queueFlorian Fainelli1-98/+2
There are no more in tree users of the switchdev_trans_item_{dequeue,enqueue} or switchdev_trans_item structure in the kernel since commit 00fc0c51e35b ("rocker: Change world_ops API and implementation to be switchdev independant"). Remove this unused code and update the documentation accordingly since. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01devlink: fix kdocJakub Kicinski1-7/+5
devlink suffers from a few kdoc warnings: net/core/devlink.c:5292: warning: Function parameter or member 'dev' not described in 'devlink_register' net/core/devlink.c:5351: warning: Function parameter or member 'port_index' not described in 'devlink_port_register' net/core/devlink.c:5753: warning: Function parameter or member 'parent_resource_id' not described in 'devlink_resource_register' net/core/devlink.c:5753: warning: Function parameter or member 'size_params' not described in 'devlink_resource_register' net/core/devlink.c:5753: warning: Excess function parameter 'top_hierarchy' description in 'devlink_resource_register' net/core/devlink.c:5753: warning: Excess function parameter 'reload_required' description in 'devlink_resource_register' net/core/devlink.c:5753: warning: Excess function parameter 'parent_reosurce_id' description in 'devlink_resource_register' net/core/devlink.c:6451: warning: Function parameter or member 'region' not described in 'devlink_region_snapshot_create' net/core/devlink.c:6451: warning: Excess function parameter 'devlink_region' description in 'devlink_region_snapshot_create' Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01ipv4: Add ICMPv6 support when parse route ipprotoHangbin Liu3-6/+16
For ip rules, we need to use 'ipproto ipv6-icmp' to match ICMPv6 headers. But for ip -6 route, currently we only support tcp, udp and icmp. Add ICMPv6 support so we can match ipv6-icmp rules for route lookup. v2: As David Ahern and Sabrina Dubroca suggested, Add an argument to rtm_getroute_parse_ip_proto() to handle ICMP/ICMPv6 with different family. Reported-by: Jianlin Shi <jishi@redhat.com> Fixes: eacb9384a3fe ("ipv6: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-03-01netfilter: nf_tables: merge ipv4 and ipv6 nat chain typesFlorian Westphal9-194/+111
Merge the ipv4 and ipv6 nat chain type. This is the last missing piece which allows to provide inet family support for nat in a follow patch. The kconfig knobs for ipv4/ipv6 nat chain are removed, the nat chain type will be built unconditionally if NFT_NAT expression is enabled. Before: text data bss dec hex filename 1576 896 0 2472 9a8 nft_chain_nat_ipv4.ko 1697 896 0 2593 a21 nft_chain_nat_ipv6.ko After: text data bss dec hex filename 1832 896 0 2728 aa8 nft_chain_nat.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_tables: nat: merge nft_masq protocol specific modulesFlorian Westphal8-214/+168
The family specific masq modules are way too small to warrant an extra module, just place all of them in nft_masq. before: text data bss dec hex filename 1001 832 0 1833 729 nft_masq.ko 766 896 0 1662 67e nft_masq_ipv4.ko 764 896 0 1660 67c nft_masq_ipv6.ko after: 2010 960 0 2970 b9a nft_masq.ko Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_tables: nat: merge nft_redir protocol specific modulesFlorian Westphal8-195/+143
before: text data bss dec hex filename 990 832 0 1822 71e nft_redir.ko 697 896 0 1593 639 nft_redir_ipv4.ko 713 896 0 1609 649 nft_redir_ipv6.ko after: text data bss dec hex filename 1910 960 0 2870 b36 nft_redir.ko size is reduced, all helpers from nft_redir.ko can be made static. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: xt_IDLETIMER: fix sysfs callback function typeSami Tolvanen1-10/+4
Use struct device_attribute instead of struct idletimer_tg_attr, and the correct callback function type to avoid indirect call mismatches with Control Flow Integrity checking. Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_conntrack: ensure that CONNTRACK_LOCKS is power of 2Li RongQing1-0/+1
CONNTRACK_LOCKS is divisor when computer array index, if it is power of 2, compiler will optimize modulo operation as bitwise AND, or else modulo will lower performance. Suggested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nf_tables: check the result of dereferencing base_chain->statsLi RongQing1-6/+8
Check the result of dereferencing base_chain->stats, instead of result of this_cpu_ptr with NULL. base_chain->stats maybe be changed to NULL when a chain is updated and a new NULL counter can be attached. And we do not need to check returning of this_cpu_ptr since base_chain->stats is from percpu allocator if it is non-NULL, this_cpu_ptr returns a valid value. And fix two sparse error by replacing rcu_access_pointer and rcu_dereference with READ_ONCE under rcu_read_lock. Thanks for Eric's help to finish this patch. Fixes: 009240940e84c1 ("netfilter: nf_tables: don't assume chain stats are set when jumplabel is set") Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: bridge: Don't sabotage nf_hook calls for an l3mdev slaveDavid Ahern1-1/+2
Followup to a173f066c7cf ("netfilter: bridge: Don't sabotage nf_hook calls from an l3mdev"). Some packets (e.g., ndisc) do not have the skb device flipped to the l3mdev (e.g., VRF) device. Update ip_sabotage_in to not drop packets for slave devices too. Currently, neighbor solicitation packets for 'dev -> bridge (addr) -> vrf' setups are getting dropped. This patch enables IPv6 communications for bridges with an address that are enslaved to a VRF. Fixes: 73e20b761acf ("net: vrf: Add support for PREROUTING rules on vrf device") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01ipvs: get sctphdr by sctphoff in sctp_csum_checkXin Long1-5/+2
sctp_csum_check() is called by sctp_s/dnat_handler() where it calls skb_make_writable() to ensure sctphdr to be linearized. So there's no need to get sctphdr by calling skb_header_pointer() in sctp_csum_check(). Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: convert the proto argument from u8 to u16Li RongQing2-5/+5
The proto in struct xt_match and struct xt_target is u16, when calling xt_check_target/match, their proto argument is u8, and will cause truncation, it is harmless to ip packet, since ip proto is u8 if a etable's match/target has proto that is u16, will cause the check failure. and convert be16 to short in bridge/netfilter/ebtables.c Signed-off-by: Zhang Yu <zhangyu31@baidu.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: nft_tunnel: Add dst_cache supportwenxu1-0/+7
The metadata_dst does not initialize the dst_cache field, this causes problems to ip_md_tunnel_xmit() since it cannot use this cache, hence, Triggering a route lookup for every packet. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01netfilter: conntrack: tcp: only close if RST matches exact sequenceFlorian Westphal1-10/+40
TCP resets cause instant transition from established to closed state provided the reset is in-window. Endpoints that implement RFC 5961 require resets to match the next expected sequence number. RST segments that are in-window (but that do not match RCV.NXT) are ignored, and a "challenge ACK" is sent back. Main problem for conntrack is that its a middlebox, i.e. whereas an end host might have ACK'd SEQ (and would thus accept an RST with this sequence number), conntrack might not have seen this ACK (yet). Therefore we can't simply flag RSTs with non-exact match as invalid. This updates RST processing as follows: 1. If the connection is in a state other than ESTABLISHED, nothing is changed, RST is subject to normal in-window check. 2. If the RSTs sequence number either matches exactly RCV.NXT, connection state moves to CLOSE. 3. The same applies if the RST sequence number aligns with a previous packet in the same direction. In all other cases, the connection remains in ESTABLISHED state. If the normal-in-window check passes, the timeout will be lowered to that of CLOSE. If the peer sends a challenge ack, connection timeout will be reset. If the challenge ACK triggers another RST (RST was valid after all), this 2nd RST will match expected sequence and conntrack state changes to CLOSE. If no challenge ACK is received, the connection will time out after CLOSE seconds (10 seconds by default), just like without this patch. Packetdrill test case: 0.000 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 0.000 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 0.000 bind(3, ..., ...) = 0 0.000 listen(3, 1) = 0 0.100 < S 0:0(0) win 32792 <mss 1460,sackOK,nop,nop,nop,wscale 7> 0.100 > S. 0:0(0) ack 1 win 64240 <mss 1460,nop,nop,sackOK,nop,wscale 7> 0.200 < . 1:1(0) ack 1 win 257 0.200 accept(3, ..., ...) = 4 // Receive a segment. 0.210 < P. 1:1001(1000) ack 1 win 46 0.210 > . 1:1(0) ack 1001 // Application writes 1000 bytes. 0.250 write(4, ..., 1000) = 1000 0.250 > P. 1:1001(1000) ack 1001 // First reset, old sequence. Conntrack (correctly) considers this // invalid due to failed window validation (regardless of this patch). 0.260 < R 2:2(0) ack 1001 win 260 // 2nd reset, but too far ahead sequence. Same: correctly handled // as invalid. 0.270 < R 99990001:99990001(0) ack 1001 win 260 // in-window, but not exact sequence. // Current Linux kernels might reply with a challenge ack, and do not // remove connection. // Without this patch, conntrack state moves to CLOSE. // With patch, timeout is lowered like CLOSE, but connection stays // in ESTABLISHED state. 0.280 < R 1010:1010(0) ack 1001 win 260 // Expect challenge ACK 0.281 > . 1001:1001(0) ack 1001 win 501 // With or without this patch, RST will cause connection // to move to CLOSE (sequence number matches) // 0.282 < R 1001:1001(0) ack 1001 win 260 // ACK 0.300 < . 1001:1001(0) ack 1001 win 257 // more data could be exchanged here, connection // is still established // Client closes the connection. 0.610 < F. 1001:1001(0) ack 1001 win 260 0.650 > . 1001:1001(0) ack 1002 // Close the connection without reading outstanding data 0.700 close(4) = 0 // so one more reset. Will be deemed acceptable with patch as well: // connection is already closing. 0.701 > R. 1001:1001(0) ack 1002 win 501 // End packetdrill test case. With patch, this generates following conntrack events: [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UNREPLIED] [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 120 FIN_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 60 CLOSE_WAIT src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5437 dport=80 [ASSURED] Without patch, first RST moves connection to close, whereas socket state does not change until FIN is received. [NEW] 120 SYN_SENT src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UNREPLIED] [UPDATE] 60 SYN_RECV src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [UPDATE] 432000 ESTABLISHED src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED] [UPDATE] 10 CLOSE src=10.0.2.1 dst=10.0.0.1 sport=5141 dport=80 [ASSURED] Cc: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-03-01ipvs: change some data types from int to boolAndrea Claudi5-18/+18
Change the data type of the following variables from int to bool across ipvs code: - found - loop - need_full_dest - need_full_svc - payload_csum Also change the following functions to use bool full_entry param instead of int: - ip_vs_genl_parse_dest() - ip_vs_genl_parse_service() This patch does not change any functionality but makes the source code slightly easier to read. Signed-off-by: Andrea Claudi <aclaudi@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-28net/smc: allow pnetid-less configurationUrsula Braun1-1/+41
Without hardware pnetid support there must currently be a pnet table configured to determine the IB device port to be used for SMC RDMA traffic. This patch enables a setup without pnet table, if the used handshake interface belongs already to a RoCE port. Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: sched: pie: avoid slow division in drop probability decayLeslie Monis1-1/+2
As per RFC 8033, it is sufficient for the drop probability decay factor to have a value of (1 - 1/64) instead of 98%. This avoids the need to do slow division. Suggested-by: David Laight <David.Laight@aculab.com> Signed-off-by: Leslie Monis <lesliemonis@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28sctp: chunk.c: correct format string for size_t in printkMatthias Maennich1-1/+1
According to Documentation/core-api/printk-formats.rst, size_t should be printed with %zu, rather than %Zu. In addition, using %Zu triggers a warning on clang (-Wformat-extra-args): net/sctp/chunk.c:196:25: warning: data argument not used by format string [-Wformat-extra-args] __func__, asoc, max_data); ~~~~~~~~~~~~~~~~^~~~~~~~~ ./include/linux/printk.h:440:49: note: expanded from macro 'pr_warn_ratelimited' printk_ratelimited(KERN_WARNING pr_fmt(fmt), ##__VA_ARGS__) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~ ./include/linux/printk.h:424:17: note: expanded from macro 'printk_ratelimited' printk(fmt, ##__VA_ARGS__); \ ~~~ ^ Fixes: 5b5e0928f742 ("lib/vsprintf.c: remove %Z support") Link: https://github.com/ClangBuiltLinux/linux/issues/378 Signed-off-by: Matthias Maennich <maennich@google.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-28net: netem: fix skb length BUG_ON in __skb_to_sgvecSheng Lan1-3/+7
It can be reproduced by following steps: 1. virtio_net NIC is configured with gso/tso on 2. configure nginx as http server with an index file bigger than 1M bytes 3. use tc netem to produce duplicate packets and delay: tc qdisc add dev eth0 root netem delay 100ms 10ms 30% duplicate 90% 4. continually curl the nginx http server to get index file on client 5. BUG_ON is seen quickly [10258690.371129] kernel BUG at net/core/skbuff.c:4028! [10258690.371748] invalid opcode: 0000 [#1] SMP PTI [10258690.372094] CPU: 5 PID: 0 Comm: swapper/5 Tainted: G W 5.0.0-rc6 #2 [10258690.372094] RSP: 0018:ffffa05797b43da0 EFLAGS: 00010202 [10258690.372094] RBP: 00000000000005ea R08: 0000000000000000 R09: 00000000000005ea [10258690.372094] R10: ffffa0579334d800 R11: 00000000000002c0 R12: 0000000000000002 [10258690.372094] R13: 0000000000000000 R14: ffffa05793122900 R15: ffffa0578f7cb028 [10258690.372094] FS: 0000000000000000(0000) GS:ffffa05797b40000(0000) knlGS:0000000000000000 [10258690.372094] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [10258690.372094] CR2: 00007f1a6dc00868 CR3: 000000001000e000 CR4: 00000000000006e0 [10258690.372094] Call Trace: [10258690.372094] <IRQ> [10258690.372094] skb_to_sgvec+0x11/0x40 [10258690.372094] start_xmit+0x38c/0x520 [virtio_net] [10258690.372094] dev_hard_start_xmit+0x9b/0x200 [10258690.372094] sch_direct_xmit+0xff/0x260 [10258690.372094] __qdisc_run+0x15e/0x4e0 [10258690.372094] net_tx_action+0x137/0x210 [10258690.372094] __do_softirq+0xd6/0x2a9 [10258690.372094] irq_exit+0xde/0xf0 [10258690.372094] smp_apic_timer_interrupt+0x74/0x140 [10258690.372094] apic_timer_interrupt+0xf/0x20 [10258690.372094] </IRQ> In __skb_to_sgvec(), the skb->len is not equal to the sum of the skb's linear data size and nonlinear data size, thus BUG_ON triggered. Because the skb is cloned and a part of nonlinear data is split off. Duplicate packet is cloned in netem_enqueue() and may be delayed some time in qdisc. When qdisc len reached the limit and returns NET_XMIT_DROP, the skb will be retransmit later in write queue. the skb will be fragmented by tso_fragment(), the limit size that depends on cwnd and mss decrease, the skb's nonlinear data will be split off. The length of the skb cloned by netem will not be updated. When we use virtio_net NIC and invoke skb_to_sgvec(), the BUG_ON trigger. To fix it, netem returns NET_XMIT_SUCCESS to upper stack when it clones a duplicate packet. Fixes: 35d889d1 ("sch_netem: fix skb leak in netem_enqueue()") Signed-off-by: Sheng Lan <lansheng@huawei.com> Reported-by: Qin Ji <jiqin.ji@huawei.com> Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: sched: act_csum: Fix csum calc for tagged packetsEli Britstein1-2/+29
The csum calculation is different for IPv4/6. For VLAN packets, tc_skb_protocol returns the VLAN protocol rather than the packet's one (e.g. IPv4/6), so csum is not calculated. Furthermore, VLAN may not be stripped so csum is not calculated in this case too. Calculate the csum for those cases. Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan path") Signed-off-by: Eli Britstein <elibr@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27netlabel: fix out-of-bounds memory accessesPaul Moore2-2/+4
There are two array out-of-bounds memory accesses, one in cipso_v4_map_lvl_valid(), the other in netlbl_bitmap_walk(). Both errors are embarassingly simple, and the fixes are straightforward. As a FYI for anyone backporting this patch to kernels prior to v4.8, you'll want to apply the netlbl_bitmap_walk() patch to cipso_v4_bitmap_walk() as netlbl_bitmap_walk() doesn't exist before Linux v4.8. Reported-by: Jann Horn <jannh@google.com> Fixes: 446fda4f2682 ("[NetLabel]: CIPSOv4 engine") Fixes: 3faa8f982f95 ("netlabel: Move bitmap manipulation functions to the NetLabel core.") Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27ipv4: Pass original device to ip_rcv_finish_coreDavid Ahern1-4/+5
ip_route_input_rcu expects the original ingress device (e.g., for proper multicast handling). The skb->dev can be changed by l3mdev_ip_rcv, so dev needs to be saved prior to calling it. This was the behavior prior to the listify changes. Fixes: 5fa12739a53d0 ("net: ipv4: listify ip_rcv_finish") Cc: Edward Cree <ecree@solarflare.com> Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: sched: act_tunnel_key: fix metadata handlingVlad Buslov1-9/+9
Tunnel key action params->tcft_enc_metadata is only set when action is TCA_TUNNEL_KEY_ACT_SET. However, metadata pointer is incorrectly dereferenced during tunnel key init and release without verifying that action is if correct type, which causes NULL pointer dereference. Metadata tunnel dst_cache is also leaked on action overwrite. Fix metadata handling: - Verify that metadata pointer is not NULL before dereferencing it in tunnel_key_init error handling code. - Move dst_cache destroy code into tunnel_key_release_params() function that is called in both action overwrite and release cases (fixes resource leak) and verifies that actions has correct type before dereferencing metadata pointer (fixes NULL pointer dereference). Oops with KASAN enabled during tdc tests execution: [ 261.080482] ================================================================== [ 261.088049] BUG: KASAN: null-ptr-deref in dst_cache_destroy+0x21/0xa0 [ 261.094613] Read of size 8 at addr 00000000000000b0 by task tc/2976 [ 261.102524] CPU: 14 PID: 2976 Comm: tc Not tainted 5.0.0-rc7+ #157 [ 261.108844] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 261.116726] Call Trace: [ 261.119234] dump_stack+0x9a/0xeb [ 261.122625] ? dst_cache_destroy+0x21/0xa0 [ 261.126818] ? dst_cache_destroy+0x21/0xa0 [ 261.131004] kasan_report+0x176/0x192 [ 261.134752] ? idr_get_next+0xd0/0x120 [ 261.138578] ? dst_cache_destroy+0x21/0xa0 [ 261.142768] dst_cache_destroy+0x21/0xa0 [ 261.146799] tunnel_key_release+0x3a/0x50 [act_tunnel_key] [ 261.152392] tcf_action_cleanup+0x2c/0xc0 [ 261.156490] tcf_generic_walker+0x4c2/0x5c0 [ 261.160794] ? tcf_action_dump_1+0x390/0x390 [ 261.165163] ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key] [ 261.170865] ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key] [ 261.176641] tca_action_gd+0x600/0xa40 [ 261.180482] ? tca_get_fill.constprop.17+0x200/0x200 [ 261.185548] ? __lock_acquire+0x588/0x1d20 [ 261.189741] ? __lock_acquire+0x588/0x1d20 [ 261.193922] ? mark_held_locks+0x90/0x90 [ 261.197944] ? mark_held_locks+0x90/0x90 [ 261.202018] ? __nla_parse+0xfe/0x190 [ 261.205774] tc_ctl_action+0x218/0x230 [ 261.209614] ? tcf_action_add+0x230/0x230 [ 261.213726] rtnetlink_rcv_msg+0x3a5/0x600 [ 261.217910] ? lock_downgrade+0x2d0/0x2d0 [ 261.222006] ? validate_linkmsg+0x400/0x400 [ 261.226278] ? find_held_lock+0x6d/0xd0 [ 261.230200] ? match_held_lock+0x1b/0x210 [ 261.234296] ? validate_linkmsg+0x400/0x400 [ 261.238567] netlink_rcv_skb+0xc7/0x1f0 [ 261.242489] ? netlink_ack+0x470/0x470 [ 261.246319] ? netlink_deliver_tap+0x1f3/0x5a0 [ 261.250874] netlink_unicast+0x2ae/0x350 [ 261.254884] ? netlink_attachskb+0x340/0x340 [ 261.261647] ? _copy_from_iter_full+0xdd/0x380 [ 261.268576] ? __virt_addr_valid+0xb6/0xf0 [ 261.275227] ? __check_object_size+0x159/0x240 [ 261.282184] netlink_sendmsg+0x4d3/0x630 [ 261.288572] ? netlink_unicast+0x350/0x350 [ 261.295132] ? netlink_unicast+0x350/0x350 [ 261.301608] sock_sendmsg+0x6d/0x80 [ 261.307467] ___sys_sendmsg+0x48e/0x540 [ 261.313633] ? copy_msghdr_from_user+0x210/0x210 [ 261.320545] ? save_stack+0x89/0xb0 [ 261.326289] ? __lock_acquire+0x588/0x1d20 [ 261.332605] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 261.340063] ? mark_held_locks+0x90/0x90 [ 261.346162] ? do_filp_open+0x138/0x1d0 [ 261.352108] ? may_open_dev+0x50/0x50 [ 261.357897] ? match_held_lock+0x1b/0x210 [ 261.364016] ? __fget_light+0xa6/0xe0 [ 261.369840] ? __sys_sendmsg+0xd2/0x150 [ 261.375814] __sys_sendmsg+0xd2/0x150 [ 261.381610] ? __ia32_sys_shutdown+0x30/0x30 [ 261.388026] ? lock_downgrade+0x2d0/0x2d0 [ 261.394182] ? mark_held_locks+0x1c/0x90 [ 261.400230] ? do_syscall_64+0x1e/0x280 [ 261.406172] do_syscall_64+0x78/0x280 [ 261.411932] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 261.419103] RIP: 0033:0x7f28e91a8b87 [ 261.424791] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48 [ 261.448226] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 261.458183] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87 [ 261.467728] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003 [ 261.477342] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c [ 261.486970] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001 [ 261.496599] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480 [ 261.506281] ================================================================== [ 261.516076] Disabling lock debugging due to kernel taint [ 261.523979] BUG: unable to handle kernel NULL pointer dereference at 00000000000000b0 [ 261.534413] #PF error: [normal kernel read fault] [ 261.541730] PGD 8000000317400067 P4D 8000000317400067 PUD 316878067 PMD 0 [ 261.551294] Oops: 0000 [#1] SMP KASAN PTI [ 261.557985] CPU: 14 PID: 2976 Comm: tc Tainted: G B 5.0.0-rc7+ #157 [ 261.568306] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017 [ 261.578874] RIP: 0010:dst_cache_destroy+0x21/0xa0 [ 261.586413] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40 [ 261.611247] RSP: 0018:ffff888316447160 EFLAGS: 00010282 [ 261.619564] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071 [ 261.629862] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297 [ 261.640149] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89 [ 261.650467] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0 [ 261.660785] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001 [ 261.671110] FS: 00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000 [ 261.682447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 261.691491] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0 [ 261.701283] Call Trace: [ 261.706374] tunnel_key_release+0x3a/0x50 [act_tunnel_key] [ 261.714522] tcf_action_cleanup+0x2c/0xc0 [ 261.721208] tcf_generic_walker+0x4c2/0x5c0 [ 261.728074] ? tcf_action_dump_1+0x390/0x390 [ 261.734996] ? tunnel_key_walker+0x5/0x1a0 [act_tunnel_key] [ 261.743247] ? tunnel_key_walker+0xe9/0x1a0 [act_tunnel_key] [ 261.751557] tca_action_gd+0x600/0xa40 [ 261.757991] ? tca_get_fill.constprop.17+0x200/0x200 [ 261.765644] ? __lock_acquire+0x588/0x1d20 [ 261.772461] ? __lock_acquire+0x588/0x1d20 [ 261.779266] ? mark_held_locks+0x90/0x90 [ 261.785880] ? mark_held_locks+0x90/0x90 [ 261.792470] ? __nla_parse+0xfe/0x190 [ 261.798738] tc_ctl_action+0x218/0x230 [ 261.805145] ? tcf_action_add+0x230/0x230 [ 261.811760] rtnetlink_rcv_msg+0x3a5/0x600 [ 261.818564] ? lock_downgrade+0x2d0/0x2d0 [ 261.825433] ? validate_linkmsg+0x400/0x400 [ 261.832256] ? find_held_lock+0x6d/0xd0 [ 261.838624] ? match_held_lock+0x1b/0x210 [ 261.845142] ? validate_linkmsg+0x400/0x400 [ 261.851729] netlink_rcv_skb+0xc7/0x1f0 [ 261.857976] ? netlink_ack+0x470/0x470 [ 261.864132] ? netlink_deliver_tap+0x1f3/0x5a0 [ 261.870969] netlink_unicast+0x2ae/0x350 [ 261.877294] ? netlink_attachskb+0x340/0x340 [ 261.883962] ? _copy_from_iter_full+0xdd/0x380 [ 261.890750] ? __virt_addr_valid+0xb6/0xf0 [ 261.897188] ? __check_object_size+0x159/0x240 [ 261.903928] netlink_sendmsg+0x4d3/0x630 [ 261.910112] ? netlink_unicast+0x350/0x350 [ 261.916410] ? netlink_unicast+0x350/0x350 [ 261.922656] sock_sendmsg+0x6d/0x80 [ 261.928257] ___sys_sendmsg+0x48e/0x540 [ 261.934183] ? copy_msghdr_from_user+0x210/0x210 [ 261.940865] ? save_stack+0x89/0xb0 [ 261.946355] ? __lock_acquire+0x588/0x1d20 [ 261.952358] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 261.959468] ? mark_held_locks+0x90/0x90 [ 261.965248] ? do_filp_open+0x138/0x1d0 [ 261.970910] ? may_open_dev+0x50/0x50 [ 261.976386] ? match_held_lock+0x1b/0x210 [ 261.982210] ? __fget_light+0xa6/0xe0 [ 261.987648] ? __sys_sendmsg+0xd2/0x150 [ 261.993263] __sys_sendmsg+0xd2/0x150 [ 261.998613] ? __ia32_sys_shutdown+0x30/0x30 [ 262.004555] ? lock_downgrade+0x2d0/0x2d0 [ 262.010236] ? mark_held_locks+0x1c/0x90 [ 262.015758] ? do_syscall_64+0x1e/0x280 [ 262.021234] do_syscall_64+0x78/0x280 [ 262.026500] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 262.033207] RIP: 0033:0x7f28e91a8b87 [ 262.038421] Code: 64 89 02 48 c7 c0 ff ff ff ff eb b9 0f 1f 80 00 00 00 00 8b 05 6a 2b 2c 00 48 63 d2 48 63 ff 85 c0 75 18 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 59 f3 c3 0f 1f 80 00 00 00 00 53 48 89 f3 48 [ 262.060708] RSP: 002b:00007ffdc5c4e2d8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e [ 262.070112] RAX: ffffffffffffffda RBX: 000000005c73c202 RCX: 00007f28e91a8b87 [ 262.079087] RDX: 0000000000000000 RSI: 00007ffdc5c4e340 RDI: 0000000000000003 [ 262.088122] RBP: 0000000000000000 R08: 0000000000000001 R09: 000000000000000c [ 262.097157] R10: 000000000000000c R11: 0000000000000246 R12: 0000000000000001 [ 262.106207] R13: 000000000067b4e0 R14: 00007ffdc5c5248c R15: 00007ffdc5c52480 [ 262.115271] Modules linked in: act_tunnel_key act_skbmod act_simple act_connmark nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 act_csum libcrc32c act_meta_skbtcindex act_meta_skbprio act_meta_mark act_ife ife act_police act_sample psample act_gact veth nfsv3 nfs_acl nfs lockd grace fscache bridge stp llc intel_rapl sb_edac mlx5_ib x86_pkg_temp_thermal sunrpc intel_powerclamp coretemp ib_uverbs kvm_intel ib_core kvm irqbypass mlx5_core crct10dif_pclmul crc32_pclmul crc32c_intel igb ghash_clmulni_intel intel_cstate mlxfw iTCO_wdt devlink intel_uncore iTCO_vendor_support ipmi_ssif ptp mei_me intel_rapl_perf ioatdma joydev pps_core ses mei i2c_i801 pcspkr enclosure lpc_ich dca wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad acpi_power_meter pcc_cpufreq ast i2c_algo_bit drm_kms_helper ttm drm mpt3sas raid_class scsi_transport_sas [ 262.204393] CR2: 00000000000000b0 [ 262.210390] ---[ end trace 2e41d786f2c7901a ]--- [ 262.226790] RIP: 0010:dst_cache_destroy+0x21/0xa0 [ 262.234083] Code: f4 ff ff ff eb f6 0f 1f 00 0f 1f 44 00 00 41 56 41 55 49 c7 c6 60 fe 35 af 41 54 55 49 89 fc 53 bd ff ff ff ff e8 ef 98 73 ff <49> 83 3c 24 00 75 35 eb 6c 4c 63 ed e8 de 98 73 ff 4a 8d 3c ed 40 [ 262.258311] RSP: 0018:ffff888316447160 EFLAGS: 00010282 [ 262.266304] RAX: 0000000000000000 RBX: ffff88835b3e2f00 RCX: ffffffffad1c5071 [ 262.276251] RDX: 0000000000000003 RSI: dffffc0000000000 RDI: 0000000000000297 [ 262.286208] RBP: 00000000ffffffff R08: fffffbfff5dd4e89 R09: fffffbfff5dd4e89 [ 262.296183] R10: 0000000000000001 R11: fffffbfff5dd4e88 R12: 00000000000000b0 [ 262.306157] R13: ffff8883267a10c0 R14: ffffffffaf35fe60 R15: 0000000000000001 [ 262.316139] FS: 00007f28ea3e6400(0000) GS:ffff888364200000(0000) knlGS:0000000000000000 [ 262.327146] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 262.335815] CR2: 00000000000000b0 CR3: 00000003178ae004 CR4: 00000000001606e0 Fixes: 41411e2fd6b8 ("net/sched: act_tunnel_key: Add dst_cache support") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27route: Add multipath_hash in flowi_common to make user-define hashwenxu3-4/+8
Current fib_multipath_hash_policy can make hash based on the L3 or L4. But it only work on the outer IP. So a specific tunnel always has the same hash value. But a specific tunnel may contain so many inner connections. This patch provide a generic multipath_hash in floi_common. It can make a user-define hash which can mix with L3 or L4 hash. Signed-off-by: wenxu <wenxu@ucloud.cn> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: nfc: Fix NULL dereference on nfc_llcp_build_tlv failsYueHaibing2-4/+40
KASAN report this: BUG: KASAN: null-ptr-deref in nfc_llcp_build_gb+0x37f/0x540 [nfc] Read of size 3 at addr 0000000000000000 by task syz-executor.0/5401 CPU: 0 PID: 5401 Comm: syz-executor.0 Not tainted 5.0.0-rc7+ #45 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0xfa/0x1ce lib/dump_stack.c:113 kasan_report+0x171/0x18d mm/kasan/report.c:321 memcpy+0x1f/0x50 mm/kasan/common.c:130 nfc_llcp_build_gb+0x37f/0x540 [nfc] nfc_llcp_register_device+0x6eb/0xb50 [nfc] nfc_register_device+0x50/0x1d0 [nfc] nfcsim_device_new+0x394/0x67d [nfcsim] ? 0xffffffffc1080000 nfcsim_init+0x6b/0x1000 [nfcsim] do_one_initcall+0xfa/0x5ca init/main.c:887 do_init_module+0x204/0x5f6 kernel/module.c:3460 load_module+0x66b2/0x8570 kernel/module.c:3808 __do_sys_finit_module+0x238/0x2a0 kernel/module.c:3902 do_syscall_64+0x147/0x600 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x462e99 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f9cb79dcc58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000462e99 RDX: 0000000000000000 RSI: 0000000020000280 RDI: 0000000000000003 RBP: 00007f9cb79dcc70 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9cb79dd6bc R13: 00000000004bcefb R14: 00000000006f7030 R15: 0000000000000004 nfc_llcp_build_tlv will return NULL on fails, caller should check it, otherwise will trigger a NULL dereference. Reported-by: Hulk Robot <hulkci@huawei.com> Fixes: eda21f16a5ed ("NFC: Set MIU and RW values from CONNECT and CC LLCP frames") Fixes: d646960f7986 ("NFC: Initial LLCP support") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: Remove switchdev_opsFlorian Fainelli1-5/+0
Now that we have converted all possible callers to using a switchdev notifier for attributes we do not have a need for implementing switchdev_ops anymore, and this can be removed from all drivers the net_device structure. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: switchdev: Replace port attr set SDO with a notificationFlorian Fainelli2-30/+31
Drop switchdev_ops.switchdev_port_attr_set. Drop the uses of this field from all clients, which were migrated to use switchdev notification in the previous patches. Add a new function switchdev_port_attr_notify() that sends the switchdev notifications SWITCHDEV_PORT_ATTR_SET and calls the blocking (process) notifier chain. We have one odd case within net/bridge/br_switchdev.c with the SWITCHDEV_ATTR_ID_PORT_PRE_BRIDGE_FLAGS attribute identifier that requires executing from atomic context, we deal with that one specifically. Drop __switchdev_port_attr_set() and update switchdev_port_attr_set() likewise. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27net: dsa: Handle SWITCHDEV_PORT_ATTR_SETFlorian Fainelli1-0/+18
Following patches will change the way we communicate setting a port's attribute and use notifiers towards that goal. Prepare DSA to support receiving notifier events targeting SWITCHDEV_PORT_ATTR_SET from both atomic and process context and use a small helper to translate the event notifier into something that dsa_slave_port_attr_set() can process. Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27switchdev: Add SWITCHDEV_PORT_ATTR_SETFlorian Fainelli1-0/+51
In preparation for allowing switchdev enabled drivers to veto specific attribute settings from within the context of the caller, introduce a new switchdev notifier type for port attributes. Suggested-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27Revert "net: sched: fw: don't set arg->stop in fw_walk() when empty"Vlad Buslov1-1/+4
This reverts commit 31a998487641 ("net: sched: fw: don't set arg->stop in fw_walk() when empty") Cls API function tcf_proto_is_empty() was changed in commit 6676d5e416ee ("net: sched: set dedicated tcf_walker flag when tp is empty") to no longer depend on arg->stop to determine that classifier instance is empty. Instead, it adds dedicated arg->nonempty field, which makes the fix in fw classifier no longer necessary. Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27ethtool: Use explicit designated initializers for .cmdLi RongQing1-2/+2
Initialize the .cmd member by using a designated struct initializer. This fixes warning of missing field initializers, and makes code a little easier to read. Signed-off-by: Li RongQing <lirongqing@baidu.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>
2019-02-27netfilter: nft_set_hash: remove nft_hash_key()Pablo Neira Ayuso1-11/+2
hashtable is never used for 2-byte keys, remove nft_hash_key(). Fixes: e240cd0df481 ("netfilter: nf_tables: place all set backends in one single module") Reported-by: Florian Westphal <fw@strlen.de> Tested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2019-02-27netfilter: nft_set_hash: bogus element self comparison from deactivation pathPablo Neira Ayuso1-1/+1
Use the element from the loop iteration, not the same element we want to deactivate otherwise this branch always evaluates true. Fixes: 6c03ae210ce3 ("netfilter: nft_set_hash: add non-resizable hashtable implementation") Reported-by: Florian Westphal <fw@strlen.de> Tested-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>