aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-08-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller207-835/+1699
Resolved kernel/bpf/btf.c using instructions from merge commit 69138b34a7248d2396ab85c8652e20c0c39beaba Signed-off-by: David S. Miller <[email protected]>
2020-08-01selftests/bpf: Fix spurious test failures in core_retro selftestAndrii Nakryiko2-2/+19
core_retro selftest uses BPF program that's triggered on sys_enter system-wide, but has no protection from some unrelated process doing syscall while selftest is running. This leads to occasional test failures with unexpected PIDs being returned. Fix that by filtering out all processes that are not test_progs process. Fixes: fcda189a5133 ("selftests/bpf: Add test relying only on CO-RE and no recent kernel features") Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-08-01Merge branch 'link_detach'Alexei Starovoitov20-34/+208
Andrii Nakryiko says: ==================== This patch set adds new BPF link operation, LINK_DETACH, allowing processes with BPF link FD to force-detach it from respective BPF hook, similarly how BPF link is auto-detached when such BPF hook (e.g., cgroup, net_device, netns, etc) is removed. This facility allows admin to forcefully undo BPF link attachment, while process that created BPF link in the first place is left intact. Once force-detached, BPF link stays valid in the kernel as long as there is at least one FD open against it. It goes into defunct state, just like auto-detached BPF link. bpftool also got `link detach` command to allow triggering this in non-programmatic fashion. v1->v2: - improve error reporting in `bpftool link detach` (Song). ==================== Signed-off-by: Alexei Starovoitov <[email protected]>
2020-08-01tools/bpftool: Add documentation and bash-completion for `link detach`Andrii Nakryiko2-2/+10
Add info on link detach sub-command to man page. Add detach to bash-completion as well. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: John Fastabend <[email protected]. Link: https://lore.kernel.org/bpf/[email protected]
2020-08-01tools/bpftool: Add `link detach` subcommandAndrii Nakryiko1-1/+36
Add ability to force-detach BPF link. Also add missing error message, if specified link ID is wrong. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-08-01selftests/bpf: Add link detach tests for cgroup, netns, and xdp bpf_linksAndrii Nakryiko5-29/+73
Add bpf_link__detach() testing to selftests for cgroup, netns, and xdp bpf_links. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-08-01libbpf: Add bpf_link detach APIsAndrii Nakryiko6-0/+25
Add low-level bpf_link_detach() API. Also add higher-level bpf_link__detach() one. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-08-01bpf: Add support for forced LINK_DETACH commandAndrii Nakryiko6-2/+64
Add LINK_DETACH command to force-detach bpf_link without destroying it. It has the same behavior as auto-detaching of bpf_link due to cgroup dying for bpf_cgroup_link or net_device being destroyed for bpf_xdp_link. In such case, bpf_link is still a valid kernel object, but is defuncts and doesn't hold BPF program attached to corresponding BPF hook. This functionality allows users with enough access rights to manually force-detach attached bpf_link without killing respective owner process. This patch implements LINK_DETACH for cgroup, xdp, and netns links, mostly re-using existing link release handling code. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Song Liu <[email protected]> Acked-by: John Fastabend <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-08-01bpf, selftests: Use single cgroup helpers for both test_sockmap/progsJohn Fastabend15-132/+43
Nearly every user of cgroup helpers does the same sequence of API calls. So push these into a single helper cgroup_setup_and_join. The cases that do a bit of extra logic are test_progs which currently uses an env variable to decide if it needs to setup the cgroup environment or can use an existingi environment. And then tests that are doing cgroup tests themselves. We skip these cases for now. Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Link: https://lore.kernel.org/bpf/159623335418.30208.15807461815525100199.stgit@john-XPS-13-9370
2020-08-02netfilter: nf_tables: extended netlink error reporting for expressionsPablo Neira Ayuso1-1/+6
This patch extends 36dd1bcc07e5 ("netfilter: nf_tables: initial support for extended ACK reporting") to include netlink extended error reporting for expressions. This allows userspace to identify what rule expression is triggering the error. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2020-08-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds81-451/+782
Pull networking fixes from David Miller: 1) Encap offset calculation is incorrect in esp6, from Sabrina Dubroca. 2) Better parameter validation in pfkey_dump(), from Mark Salyzyn. 3) Fix several clang issues on powerpc in selftests, from Tanner Love. 4) cmsghdr_from_user_compat_to_kern() uses the wrong length, from Al Viro. 5) Out of bounds access in mlx5e driver, from Raed Salem. 6) Fix transfer buffer memleak in lan78xx, from Johan Havold. 7) RCU fixups in rhashtable, from Herbert Xu. 8) Fix ipv6 nexthop refcnt leak, from Xiyu Yang. 9) vxlan FDB dump must be done under RCU, from Ido Schimmel. 10) Fix use after free in mlxsw, from Ido Schimmel. 11) Fix map leak in HASH_OF_MAPS bpf code, from Andrii Nakryiko. 12) Fix bug in mac80211 Tx ack status reporting, from Vasanthakumar Thiagarajan. 13) Fix memory leaks in IPV6_ADDRFORM code, from Cong Wang. 14) Fix bpf program reference count leaks in mlx5 during mlx5e_alloc_rq(), from Xin Xiong. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (86 commits) vxlan: fix memleak of fdb rds: Prevent kernel-infoleak in rds_notify_queue_get() net/sched: The error lable position is corrected in ct_init_module net/mlx5e: fix bpf_prog reference count leaks in mlx5e_alloc_rq net/mlx5e: E-Switch, Specify flow_source for rule with no in_port net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring net/mlx5e: CT: Support restore ipv6 tunnel net: gemini: Fix missing clk_disable_unprepare() in error path of gemini_ethernet_port_probe() ionic: unlock queue mutex in error path atm: fix atm_dev refcnt leaks in atmtcp_remove_persistent net: ethernet: mtk_eth_soc: fix MTU warnings net: nixge: fix potential memory leak in nixge_probe() devlink: ignore -EOPNOTSUPP errors on dumpit rxrpc: Fix race between recvmsg and sendmsg on immediate call failure MAINTAINERS: Replace Thor Thayer as Altera Triple Speed Ethernet maintainer selftests/bpf: fix netdevsim trap_flow_action_cookie read ipv6: fix memory leaks on IPV6_ADDRFORM path net/bpfilter: Initialize pos in __bpfilter_process_sockopt igb: reinit_locked() should be called with rtnl_lock e1000e: continue to init PHY even when failed to disable ULP ...
2020-08-01Merge tag 'for-linus-2020-08-01' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread fix from Christian Brauner: "A simple spelling fix for dequeue_synchronous_signal()" * tag 'for-linus-2020-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: signal: fix typo in dequeue_synchronous_signal()
2020-08-01Merge tag 'perf-tools-fixes-2020-08-01' of ↵Linus Torvalds4-6/+8
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tooling fixes from Arnaldo Carvalho de Melo: - Fix libtraceevent build with binutils 2.35 - Fix memory leak in process_dynamic_array_len in libtraceevent - Fix 'perf test 68' zstd compression for s390 - Fix record failure when mixed with ARM SPE event * tag 'perf-tools-fixes-2020-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: libtraceevent: Fix build with binutils 2.35 perf tools: Fix record failure when mixed with ARM SPE event perf tests: Fix test 68 zstd compression for s390 tools lib traceevent: Fix memory leak in process_dynamic_array_len
2020-08-01mptcp: fix syncookie build error on UPFlorian Westphal1-3/+1
kernel test robot says: net/mptcp/syncookies.c: In function 'mptcp_join_cookie_init': include/linux/kernel.h:47:38: warning: division by zero [-Wdiv-by-zero] #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr)) I forgot that spinock_t size is 0 on UP, so ARRAY_SIZE cannot be used. Fixes: 9466a1ccebbe54 ("mptcp: enable JOIN requests even if cookies are in use") Reported-by: kernel test robot <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-01vxlan: fix memleak of fdbTaehee Yoo1-2/+4
When vxlan interface is deleted, all fdbs are deleted by vxlan_flush(). vxlan_flush() flushes fdbs but it doesn't delete fdb, which contains all-zeros-mac because it is deleted by vxlan_uninit(). But vxlan_uninit() deletes only the fdb, which contains both all-zeros-mac and default vni. So, the fdb, which contains both all-zeros-mac and non-default vni will not be deleted. Test commands: ip link add vxlan0 type vxlan dstport 4789 external ip link set vxlan0 up bridge fdb add to 00:00:00:00:00:00 dst 172.0.0.1 dev vxlan0 via lo \ src_vni 10000 self permanent ip link del vxlan0 kmemleak reports as follows: unreferenced object 0xffff9486b25ced88 (size 96): comm "bridge", pid 2151, jiffies 4294701712 (age 35506.901s) hex dump (first 32 bytes): 02 00 00 00 ac 00 00 01 40 00 09 b1 86 94 ff ff ........@....... 46 02 00 00 00 00 00 00 a7 03 00 00 12 b5 6a 6b F.............jk backtrace: [<00000000c10cf651>] vxlan_fdb_append.part.51+0x3c/0xf0 [vxlan] [<000000006b31a8d9>] vxlan_fdb_create+0x184/0x1a0 [vxlan] [<0000000049399045>] vxlan_fdb_update+0x12f/0x220 [vxlan] [<0000000090b1ef00>] vxlan_fdb_add+0x12a/0x1b0 [vxlan] [<0000000056633c2c>] rtnl_fdb_add+0x187/0x270 [<00000000dd5dfb6b>] rtnetlink_rcv_msg+0x264/0x490 [<00000000fc44dd54>] netlink_rcv_skb+0x4a/0x110 [<00000000dff433e7>] netlink_unicast+0x18e/0x250 [<00000000b87fb421>] netlink_sendmsg+0x2e9/0x400 [<000000002ed55153>] ____sys_sendmsg+0x237/0x260 [<00000000faa51c66>] ___sys_sendmsg+0x88/0xd0 [<000000006c3982f1>] __sys_sendmsg+0x4e/0x80 [<00000000a8f875d2>] do_syscall_64+0x56/0xe0 [<000000003610eefa>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 unreferenced object 0xffff9486b1c40080 (size 128): comm "bridge", pid 2157, jiffies 4294701754 (age 35506.866s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 f8 dc 42 b2 86 94 ff ff ..........B..... 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b kkkkkkkkkkkkkkkk backtrace: [<00000000a2981b60>] vxlan_fdb_create+0x67/0x1a0 [vxlan] [<0000000049399045>] vxlan_fdb_update+0x12f/0x220 [vxlan] [<0000000090b1ef00>] vxlan_fdb_add+0x12a/0x1b0 [vxlan] [<0000000056633c2c>] rtnl_fdb_add+0x187/0x270 [<00000000dd5dfb6b>] rtnetlink_rcv_msg+0x264/0x490 [<00000000fc44dd54>] netlink_rcv_skb+0x4a/0x110 [<00000000dff433e7>] netlink_unicast+0x18e/0x250 [<00000000b87fb421>] netlink_sendmsg+0x2e9/0x400 [<000000002ed55153>] ____sys_sendmsg+0x237/0x260 [<00000000faa51c66>] ___sys_sendmsg+0x88/0xd0 [<000000006c3982f1>] __sys_sendmsg+0x4e/0x80 [<00000000a8f875d2>] do_syscall_64+0x56/0xe0 [<000000003610eefa>] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 3ad7a4b141eb ("vxlan: support fdb and learning in COLLECT_METADATA mode") Signed-off-by: Taehee Yoo <[email protected]> Acked-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-01fib: fix another fib_rules_ops indirect call wrapper problemBrian Vazquez1-0/+6
It turns out that on commit 41d707b7332f ("fib: fix fib_rules_ops indirect calls wrappers") I forgot to include the case when CONFIG_IP_MULTIPLE_TABLES is not set. Fixes: 41d707b7332f ("fib: fix fib_rules_ops indirect calls wrappers") Reported-by: Randy Dunlap <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Brian Vazquez <[email protected]> Acked-by: Randy Dunlap <[email protected]> # build-tested Signed-off-by: David S. Miller <[email protected]>
2020-08-01tcp: fix build fong CONFIG_MPTCP=nEric Dumazet1-2/+3
Fixes these errors: net/ipv4/syncookies.c: In function 'tcp_get_cookie_sock': net/ipv4/syncookies.c:216:19: error: 'struct tcp_request_sock' has no member named 'drop_req' 216 | if (tcp_rsk(req)->drop_req) { | ^~ net/ipv4/syncookies.c: In function 'cookie_tcp_reqsk_alloc': net/ipv4/syncookies.c:289:27: warning: unused variable 'treq' [-Wunused-variable] 289 | struct tcp_request_sock *treq; | ^~~~ make[3]: *** [scripts/Makefile.build:280: net/ipv4/syncookies.o] Error 1 make[3]: *** Waiting for unfinished jobs.... Fixes: 9466a1ccebbe ("mptcp: enable JOIN requests even if cookies are in use") Signed-off-by: Eric Dumazet <[email protected]> Cc: Florian Westphal <[email protected]> Acked-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-08-01Merge tag 'pinctrl-v5.8-4' of ↵Linus Torvalds4-2/+79
git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl Pull pin control fix from Linus Walleij: "A single last minute pin control fix to the Qualcomm driver fixing missing dual edge PCH interrupts" * tag 'pinctrl-v5.8-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl: pinctrl: qcom: Handle broken/missing PDC dual edge IRQs on sc7180
2020-08-01ice: Misc minor fixesTony Nguyen9-18/+14
This is a collection of minor fixes including typos, white space, and style. No functional changes. Signed-off-by: Tony Nguyen <[email protected]> Tested-by: Andrew Bowers <[email protected]>
2020-08-01ice: adjust profile ID map locksVictor Raj1-45/+45
The profile ID map lock should be held till the caller completes all references of that profile entries. The current code releases the lock right after the match search. This caused a driver issue when the profile map entries were referenced after it was freed in other thread after the lock was released earlier. Signed-off-by: Victor Raj <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: update PTYPE lookup tableTony Nguyen1-0/+314
Update the PTYPE lookup table to reflect values that can be set by the hardware. Signed-off-by: Tony Nguyen <[email protected]> Tested-by: Andrew Bowers <[email protected]>
2020-08-01ice: Disable VLAN pruning in promiscuous modeNick Nunley2-0/+10
Disable VLAN pruning when entering promiscuous mode, and re-enable it when exiting. Without this VLAN-over-bridge topologies created on the device won't be functional unless rx-vlan-filter is explicitly disabled with ethtool. Signed-off-by: Nick Nunley <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: Graceful error handling in HW table calloc failureSurabhi Boob1-1/+3
In the ice_init_hw_tbls, if the devm_kcalloc for es->written fails, catch that error and bail out gracefully, instead of continuing with a NULL pointer. Fixes: 32d63fa1e9f3 ("ice: Initialize DDP package structures") Signed-off-by: Surabhi Boob <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: port fix for chk_linearlizeKiran Patil1-3/+23
This is a port of commit 248de22e638f ("i40e/i40evf: Account for frags split over multiple descriptors in check linearize") As part of testing workloads (read/write) using larger IO size (128K) tx_timeout is observed and whenever it happens, it was due to tx_linearize. Signed-off-by: Kiran Patil <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: Allow 2 queue pairs per VF on SR-IOV initializationBrett Creeley2-0/+3
Currently VFs are only allowed to get 16, 4, and 1 queue pair by default, which require 17, 5, and 2 MSI-X vectors respectively. This is because each VF needs a MSI-X per data queue and a MSI-X for its other interrupt. The calculation is based on the number of VFs created, MSI-X available, and queue pairs available at the time of VF creation. Unfortunately the values above exclude 2 queue pairs when only 3 MSI-X are available to each VF based on resource constraints. The current calculation would default to 2 MSI-X and 1 queue pair. This is a waste of resources, so fix this by allowing 2 queue pairs per VF when there are between 2 and 5 MSI-X available per VF. Signed-off-by: Brett Creeley <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: Clear and free XLT entries on resetVignesh Sridhar1-1/+3
This fix has been added to address memory leak issues resulting from triggering a sudden driver reset which does not allow us to follow our normal removal flows for SW XLT entries for advanced features. - Adding call to destroy flow profile locks when clearing SW XLT tables. - Extraction sequence entries were not correctly cleared previously which could cause ownership conflicts for repeated reset-replay calls. Fixes: 31ad4e4ee1e4 ("ice: Allocate flow profile") Signed-off-by: Vignesh Sridhar <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: add useful statisticsJesse Brandeburg5-2/+15
Display and count some useful hot-path statistics. The usefulness is as follows: - tx_restart: use to determine if the transmit ring size is too small or if the transmit interrupt rate is too low. - rx_gro_dropped: use to count drops from GRO layer, which previously were completely uncounted when occurring. - tx_busy: use to determine when the driver is miscounting number of descriptors needed for an skb. - tx_timeout: as our other drivers, count the number of times we've reset due to timeout because the kernel only prints a warning once per netdev. Several of these were already counted but not displayed. Signed-off-by: Jesse Brandeburg <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: remove page_reuse statisticJesse Brandeburg2-3/+0
The page reuse statistic wasn't even being displayed to the user, even though the driver counted it. Don't waste the struct space and hot-path cycles since the driver doesn't display it. Signed-off-by: Jesse Brandeburg <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: Fix RSS profile locksVignesh Sridhar1-6/+7
Replacing flow profile locks with RSS profile locks in the function to remove all RSS rules for a given VSI. This is to align the locks used for RSS rule addition to VSI and removal during VSI teardown to avoid a race condition owing to several iterations of the above operations. In function to get RSS rules for given VSI and protocol header replacing the pointer reference of the RSS entry with a copy of hash value to ensure thread safety. Signed-off-by: Vignesh Sridhar <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: fix the vsi_id mask to be 10 bit for set_rss_lutKiran Patil1-1/+1
set_rss_lut can fail due to incorrect vsi_id mask. vsi_id is 10 bit but mask was 0x1FF whereas it should be 0x3FF. For vsi_num >= 512, FW set_rss_lut can fail with return code EACCESS (VSI ownership issue) because software was providing incorrect vsi_num (dropping 10th bit due to incorrect mask) for set_rss_lut admin command Signed-off-by: Kiran Patil <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: rename misleading grst_delay variableNick Nunley1-5/+5
The grst_delay variable in ice_check_reset contains the maximum time (in 100 msec units) that the driver will wait for a reset event to transition to the Device Active state. The value is the sum of three separate components: 1) The maximum time it may take for the firmware to process its outstanding command before handling the reset request. 2) The value in RSTCTL.GRSTDEL (the delay firmware inserts between first seeing the driver reset request and the actual hardware assertion). 3) The maximum expected reset processing time in hardware. Referring to this total time as "grst_delay" is misleading and potentially confusing to someone checking the code and cross-referencing the hardware specification. Fix this by renaming the variable to "grst_timeout", which is more descriptive of its actual use. Signed-off-by: Nick Nunley <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-08-01ice: mark PM functions as __maybe_unusedWei Yongjun1-2/+2
In certain configurations without power management support, the following warnings happen: drivers/net/ethernet/intel/ice/ice_main.c:4214:12: warning: 'ice_resume' defined but not used [-Wunused-function] 4214 | static int ice_resume(struct device *dev) | ^~~~~~~~~~ drivers/net/ethernet/intel/ice/ice_main.c:4150:12: warning: 'ice_suspend' defined but not used [-Wunused-function] 4150 | static int ice_suspend(struct device *dev) | ^~~~~~~~~~~ Mark these functions as __maybe_unused to make it clear to the compiler that this is going to happen based on the configuration, which is the standard for these types of functions. Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Wei Yongjun <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2020-07-31Merge tag 'mac80211-next-for-davem-2020-07-31' of ↵David S. Miller42-235/+511
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next Johannes Berg says: ==================== We have a number of changes * code cleanups and fixups as usual * AQL & internal TXQ improvements from Felix * some mesh 802.1X support bits * some injection improvements from Mathy of KRACK fame, so we'll see what this results in ;-) * some more initial S1G supports bits, this time (some of?) the userspace APIs ==================== Signed-off-by: David S. Miller <[email protected]>
2020-07-31rtnetlink: add support for protodown reasonRoopa Prabhu4-5/+147
netdev protodown is a mechanism that allows protocols to hold an interface down. It was initially introduced in the kernel to hold links down by a multihoming protocol. There was also an attempt to introduce protodown reason at the time but was rejected. protodown and protodown reason is supported by almost every switching and routing platform. It was ok for a while to live without a protodown reason. But, its become more critical now given more than one protocol may need to keep a link down on a system at the same time. eg: vrrp peer node, port security, multihoming protocol. Its common for Network operators and protocol developers to look for such a reason on a networking box (Its also known as errDisable by most networking operators) This patch adds support for link protodown reason attribute. There are two ways to maintain protodown reasons. (a) enumerate every possible reason code in kernel - A protocol developer has to make a request and have that appear in a certain kernel version (b) provide the bits in the kernel, and allow user-space (sysadmin or NOS distributions) to manage the bit-to-reasonname map. - This makes extending reason codes easier (kind of like the iproute2 table to vrf-name map /etc/iproute2/rt_tables.d/) This patch takes approach (b). a few things about the patch: - It treats the protodown reason bits as counter to indicate active protodown users - Since protodown attribute is already an exposed UAPI, the reason is not enforced on a protodown set. Its a no-op if not used. the patch follows the below algorithm: - presence of reason bits set indicates protodown is in use - user can set protodown and protodown reason in a single or multiple setlink operations - setlink operation to clear protodown, will return -EBUSY if there are active protodown reason bits - reason is not included in link dumps if not used example with patched iproute2: $cat /etc/iproute2/protodown_reasons.d/r.conf 0 mlag 1 evpn 2 vrrp 3 psecurity $ip link set dev vxlan0 protodown on protodown_reason vrrp on $ip link set dev vxlan0 protodown_reason mlag on $ip link show 14: vxlan0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000 link/ether f6:06:be:17:91:e7 brd ff:ff:ff:ff:ff:ff protodown on <mlag,vrrp> $ip link set dev vxlan0 protodown_reason mlag off $ip link set dev vxlan0 protodown off protodown_reason vrrp off Signed-off-by: Roopa Prabhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller5-18/+126
Daniel Borkmann says: ==================== pull-request: bpf 2020-07-31 The following pull-request contains BPF updates for your *net* tree. We've added 5 non-merge commits during the last 21 day(s) which contain a total of 5 files changed, 126 insertions(+), 18 deletions(-). The main changes are: 1) Fix a map element leak in HASH_OF_MAPS map type, from Andrii Nakryiko. 2) Fix a NULL pointer dereference in __btf_resolve_helper_id() when no btf_vmlinux is available, from Peilin Ye. 3) Init pos variable in __bpfilter_process_sockopt(), from Christoph Hellwig. 4) Fix a cgroup sockopt verifier test by specifying expected attach type, from Jean-Philippe Brucker. Note that when net gets merged into net-next later on, there is a small merge conflict in kernel/bpf/btf.c between commit 5b801dfb7feb ("bpf: Fix NULL pointer dereference in __btf_resolve_helper_id()") from the bpf tree and commit 138b9a0511c7 ("bpf: Remove btf_id helpers resolving") from the net-next tree. Resolve as follows: remove the old hunk with the __btf_resolve_helper_id() function. Change the btf_resolve_helper_id() so it actually tests for a NULL btf_vmlinux and bails out: int btf_resolve_helper_id(struct bpf_verifier_log *log, const struct bpf_func_proto *fn, int arg) { int id; if (fn->arg_type[arg] != ARG_PTR_TO_BTF_ID || !btf_vmlinux) return -EINVAL; id = fn->btf_id[arg]; if (!id || id > btf_vmlinux->nr_types) return -EINVAL; return id; } Let me know if you run into any others issues (CC'ing Jiri Olsa so he's in the loop with regards to merge conflict resolution). ==================== Signed-off-by: David S. Miller <[email protected]>
2020-07-31tun: add missing rcu annotation in tun_set_ebpf()Jason Wang1-1/+1
We expecte prog_p to be protected by rcu, so adding the rcu annotation to fix the following sparse warning: drivers/net/tun.c:3003:36: warning: incorrect type in argument 2 (different address spaces) drivers/net/tun.c:3003:36: expected struct tun_prog [noderef] __rcu **prog_p drivers/net/tun.c:3003:36: got struct tun_prog **prog_p drivers/net/tun.c:3292:42: warning: incorrect type in argument 2 (different address spaces) drivers/net/tun.c:3292:42: expected struct tun_prog **prog_p drivers/net/tun.c:3292:42: got struct tun_prog [noderef] __rcu ** drivers/net/tun.c:3296:42: warning: incorrect type in argument 2 (different address spaces) drivers/net/tun.c:3296:42: expected struct tun_prog **prog_p drivers/net/tun.c:3296:42: got struct tun_prog [noderef] __rcu ** Reported-by: Michael S. Tsirkin <[email protected]> Signed-off-by: Jason Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31Merge branch 'master' of ↵David S. Miller6-58/+104
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2020-07-31 1) Fix policy matching with mark and mask on userspace interfaces. From Xin Long. 2) Several fixes for the new ESP in TCP encapsulation. From Sabrina Dubroca. 3) Fix crash when the hold queue is used. The assumption that xdst->path and dst->child are not a NULL pointer only if dst->xfrm is not a NULL pointer is true with the exception of using the hold queue. Fix this by checking for hold queue usage before dereferencing xdst->path or dst->child. 4) Validate pfkey_dump parameter before sending them. From Mark Salyzyn. 5) Fix the location of the transport header with ESP in UDPv6 encapsulation. From Sabrina Dubroca. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-07-31Merge tag 'mlx5-fixes-2020-07-30' of ↵David S. Miller7-18/+42
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Mellanox, mlx5 fixes 2020-07-30 This small patchset introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. For -stable v4.18: ('net/mlx5e: fix bpf_prog reference count leaks in mlx5e_alloc_rq') For -stable v5.7: ('net/mlx5e: E-Switch, Add misc bit when misc fields changed for mirroring') ==================== Signed-off-by: David S. Miller <[email protected]>
2020-07-31tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATSYousuk Seung4-3/+9
This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS that reports the earliest departure time(EDT) of the timestamped skb. By tracking EDT values of the skb from different timestamps, we can observe when and how much the value changed. This allows to measure the precise delay injected on the sender host e.g. by a bpf-base throttler. Signed-off-by: Yousuk Seung <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Neal Cardwell <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31Merge branch '1GbE' of ↵David S. Miller17-178/+77
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Tony Nguyen says: ==================== 1GbE Intel Wired LAN Driver Updates 2020-07-30 This series contains updates to e100, e1000, e1000e, igb, igbvf, ixgbe, ixgbevf, iavf, and driver documentation. Vaibhav Gupta converts legacy .suspend() and .resume() to generic PM callbacks for e100, igbvf, ixgbe, ixgbevf, and iavf. Suraj Upadhyay replaces 1 byte memsets with assignments for e1000, e1000e, igb, and ixgbe. Alexander Klimov replaces http links with https. Miaohe Lin replaces uses of memset to clear MAC addresses with eth_zero_addr(). ==================== Signed-off-by: David S. Miller <[email protected]>
2020-07-31Merge branch 'mptcp-syncookies'David S. Miller16-43/+453
Florian Westphal says: ==================== mptcp: add syncookie support Changes in v2: - first patch renames req->ts_cookie to req->syncookie instead of removing ts_cookie member. - patch to add 'want_cookie' arg to init_req() functions has been dropped. All users of that arg were changed to check 'req->syncookie' instead. v1 cover letter: When syn-cookies are used the SYN?ACK never contains a MPTCP option, because the code path that creates a request socket based on a valid cookie ACK lacks the needed changes to construct MPTCP request sockets. After this series, if SYN carries MP_CAPABLE option, the option is not cleared anymore and request socket will be reconstructed using the MP_CAPABLE option data that is re-sent with the ACK. This means that no additional state gets encoded into the syn cookie or the TCP timestamp. There are two caveats for SYN-Cookies with MPTCP: 1. When syn-cookies are used, the server-generated key is not stored. The drawback is that the next connection request that comes in before the cookie-ACK has a small chance that it will generate the same local_key. If this happens, the cookie ACK that comes in second will (re)compute the token hash and then detects that this is already in use. Unlike normal case, where the server will pick a new key value and then re-tries, we can't do that because we already committed to the key value (it was sent to peer already). Im this case, MPTCP cannot be used and late TCP fallback happens. 2). SYN packets with a MP_JOIN requests cannot be handled without storing state. This is because the SYN contains a nonce value that is needed to verify the HMAC of the MP_JOIN ACK that completes the three-way handshake. Also, a local nonce is generated and used in the cookie SYN/ACK. There are only 2 ways to solve this: a) Do not support JOINs when cookies are in effect. b) Store the nonces somewhere. The approach chosen here is b). Patch 8 adds a fixed-size (1024 entries) state table to store the information required to validate the MP_JOIN ACK and re-build the request socket. State gets stored when syn-cookies are active and the token in the JOIN request referred to an established MPTCP connection that can also accept a new subflow. State is restored if the ACK cookie is valid, an MP_JOIN option is present and the state slot contains valid data from a previous SYN. After the request socket has been re-build, normal HMAC check is done just as without syn cookies. Largely identical to last RFC, except patch #8 which follows Paolos suggestion to use a private table storage area rather than keeping request sockets around. This also means I dropped the patch to remove const qualifier from sk_listener pointers. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-07-31selftests: mptcp: add test cases for mptcp join tests with syn cookiesFlorian Westphal1-2/+64
Also add test cases with MP_JOIN when tcp_syncookies sysctl is 2 (i.e., syncookies are always-on). While at it, also print the test number and add the test number to the pcap files that can be generated optionally. This makes it easier to match the pcap to the test case. Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31selftests: mptcp: make 2nd net namespace use tcp syn cookies unconditionallyFlorian Westphal1-0/+47
check we can establish connections also when syn cookies are in use. Check that MPTcpExtMPCapableSYNRX and MPTcpExtMPCapableACKRX increase for each MPTCP test. Check TcpExtSyncookiesSent and TcpExtSyncookiesRecv increase in netns2. Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31mptcp: enable JOIN requests even if cookies are in useFlorian Westphal6-0/+174
JOIN requests do not work in syncookie mode -- for HMAC validation, the peers nonce and the mptcp token (to obtain the desired connection socket the join is for) are required, but this information is only present in the initial syn. So either we need to drop all JOIN requests once a listening socket enters syncookie mode, or we need to store enough state to reconstruct the request socket later. This adds a state table (1024 entries) to store the data present in the MP_JOIN syn request and the random nonce used for the cookie syn/ack. When a MP_JOIN ACK passed cookie validation, the table is consulted to rebuild the request socket from it. An alternate approach would be to "cancel" syn-cookie mode and force MP_JOIN to always use a syn queue entry. However, doing so brings the backlog over the configured queue limit. v2: use req->syncookie, not (removed) want_cookie arg Suggested-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31tcp: syncookies: create mptcp request socket for ACK cookies with MPTCP optionFlorian Westphal4-11/+37
If SYN packet contains MP_CAPABLE option, keep it enabled. Syncokie validation and cookie-based socket creation is changed to instantiate an mptcp request sockets if the ACK contains an MPTCP connection request. Rather than extend both cookie_v4/6_check, add a common helper to create the (mp)tcp request socket. Suggested-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31mptcp: subflow: add mptcp_subflow_init_cookie_req helperFlorian Westphal4-1/+86
Will be used to initialize the mptcp request socket when a MP_CAPABLE request was handled in syncookie mode, i.e. when a TCP ACK containing a MP_CAPABLE option is a valid syncookie value. Normally (non-cookie case), MPTCP will generate a unique 32 bit connection ID and stores it in the MPTCP token storage to be able to retrieve the mptcp socket for subflow joining. In syncookie case, we do not want to store any state, so just generate the unique ID and use it in the reply. This means there is a small window where another connection could generate the same token. When Cookie ACK comes back, we check that the token has not been registered in the mean time. If it was, the connection needs to fall back to TCP. Changes in v2: - use req->syncookie instead of passing 'want_cookie' arg to ->init_req() (Eric Dumazet) Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31mptcp: rename and export mptcp_subflow_request_sock_opsFlorian Westphal2-5/+7
syncookie code path needs to create an mptcp request sock. Prepare for this and add mptcp prefix plus needed export of ops struct. Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31mptcp: subflow: split subflow_init_reqFlorian Westphal1-10/+22
When syncookie support is added, we will need to add a variant of subflow_init_req() helper. It will do almost same thing except that it will not compute/add a token to the mptcp token tree. To avoid excess copy&paste, this commit splits away part of the code into a new helper, __subflow_init_req, that can then be re-used from the 'no insert' function added in a followup change. Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31mptcp: token: move retry to callerFlorian Westphal2-9/+12
Once syncookie support is added, no state will be stored anymore when the syn/ack is generated in syncookie mode. When the ACK comes back, the generated key will be taken from the TCP ACK, the token is re-generated and inserted into the token tree. This means we can't retry with a new key when the token is already taken in the syncookie case. Therefore, move the retry logic to the caller to prepare for syncookie support in mptcp. Signed-off-by: Florian Westphal <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-07-31tcp: rename request_sock cookie_ts bit to syncookieFlorian Westphal4-5/+4
Nowadays output function has a 'synack_type' argument that tells us when the syn/ack is emitted via syncookies. The request already tells us when timestamps are supported, so check both to detect special timestamp for tcp option encoding is needed. We could remove cookie_ts altogether, but a followup patch would otherwise need to adjust function signatures to pass 'want_cookie' to mptcp core. This way, the 'existing' bit can be used. Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>