aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-11-04selftests/net: Fix reuseport_bpf_numa by skipping unavailable nodesKleber Sacilotto de Souza1-0/+4
In some platforms the numa node numbers are not necessarily consecutive, meaning that not all nodes from 0 to the value returned by numa_max_node() are available on the system. Using node numbers which are not available results on errors from libnuma such as: ---- IPv4 UDP ---- send node 0, receive socket 0 libnuma: Warning: Cannot read node cpumask from sysfs ./reuseport_bpf_numa: failed to pin to node: No such file or directory Fix it by checking if the node number bit is set on numa_nodes_ptr, which is defined on libnuma as "Set with all nodes the kernel has exposed to userspace". Signed-off-by: Kleber Sacilotto de Souza <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-11-03selftests/bpf: Verifier test on refill from a smaller spillMartin KaFai Lau1-0/+17
This patch adds a verifier test to ensure the verifier can read 8 bytes from the stack after two 32bit write at fp-4 and fp-8. The test is similar to the reported case from bcc [0]. [0] https://github.com/iovisor/bcc/pull/3683 Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-11-03bpf: Do not reject when the stack read size is different from the tracked ↵Martin KaFai Lau1-12/+6
scalar size Below is a simplified case from a report in bcc [0]: r4 = 20 *(u32 *)(r10 -4) = r4 *(u32 *)(r10 -8) = r4 /* r4 state is tracked */ r4 = *(u64 *)(r10 -8) /* Read more than the tracked 32bit scalar. * verifier rejects as 'corrupted spill memory'. */ After commit 354e8f1970f8 ("bpf: Support <8-byte scalar spill and refill"), the 8-byte aligned 32bit spill is also tracked by the verifier and the register state is stored. However, if 8 bytes are read from the stack instead of the tracked 4 byte scalar, then verifier currently rejects the program as "corrupted spill memory". This patch fixes this case by allowing it to read but marks the register as unknown. Also note that, if the prog is trying to corrupt/leak an earlier spilled pointer by spilling another <8 bytes register on top, this has already been rejected in the check_stack_write_fixed_off(). [0] https://github.com/iovisor/bcc/pull/3683 Fixes: 354e8f1970f8 ("bpf: Support <8-byte scalar spill and refill") Reported-by: Hengqi Chen <[email protected]> Reported-by: Yonghong Song <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Hengqi Chen <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-11-03selftests/bpf: Make netcnt selftests serial to avoid spurious failuresAndrii Nakryiko1-1/+1
When running `./test_progs -j` test_netcnt fails with a very high probability, undercounting number of packets received (9999 vs expected 10000). It seems to be conflicting with other cgroup/skb selftests. So make it serial for now to make parallel mode more robust. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-11-03selftests/bpf: Test RENAME_EXCHANGE and RENAME_NOREPLACE on bpffsLorenz Bauer1-1/+64
Add tests to exercise the behaviour of RENAME_EXCHANGE and RENAME_NOREPLACE on bpffs. The former checks that after an exchange the inode of two directories has changed. The latter checks that the source still exists after a failed rename. Generally, having support for renameat2(RENAME_EXCHANGE) in bpffs fixes atomic upgrades of our sk_lookup control plane. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-11-03selftests/bpf: Convert test_bpffs to ASSERT macrosLorenz Bauer1-11/+11
Remove usage of deprecated CHECK macros. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-11-03libfs: Support RENAME_EXCHANGE in simple_rename()Lorenz Bauer1-1/+4
Allow atomic exchange via RENAME_EXCHANGE when using simple_rename. This affects binderfs, ramfs, hubetlbfs and bpffs. Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Christian Brauner <[email protected]> Cc: Al Viro <[email protected]> Cc: Miklos Szeredi <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-11-03libfs: Move shmem_exchange to simple_rename_exchangeLorenz Bauer3-23/+27
Move shmem_exchange and make it available to other callers. Suggested-by: Miklos Szeredi <[email protected]> Signed-off-by: Lorenz Bauer <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Miklos Szeredi <[email protected]> Cc: Al Viro <[email protected]> Cc: Christian Brauner <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-11-03net: dsa: felix: fix broken VLAN-tagged PTP under VLAN-aware bridgeVladimir Oltean3-6/+7
Normally it is expected that the dsa_device_ops :: rcv() method finishes parsing the DSA tag and consumes it, then never looks at it again. But commit c0bcf537667c ("net: dsa: ocelot: add hardware timestamping support for Felix") added support for RX timestamping in a very unconventional way. On this switch, a partial timestamp is available in the DSA header, but the driver got away with not parsing that timestamp right away, but instead delayed that parsing for a little longer: dsa_switch_rcv(): nskb = cpu_dp->rcv(skb, dev); <------------- not here -> ocelot_rcv() ... skb = nskb; skb_push(skb, ETH_HLEN); skb->pkt_type = PACKET_HOST; skb->protocol = eth_type_trans(skb, skb->dev); ... if (dsa_skb_defer_rx_timestamp(p, skb)) <--- but here -> felix_rxtstamp() return 0; When in felix_rxtstamp(), this driver accounted for the fact that eth_type_trans() happened in the meanwhile, so it got a hold of the extraction header again by subtracting (ETH_HLEN + OCELOT_TAG_LEN) bytes from the current skb->data. This worked for quite some time but was quite fragile from the very beginning. Not to mention that having DSA tag parsing split in two different files, under different folders (net/dsa/tag_ocelot.c vs drivers/net/dsa/ocelot/felix.c) made it quite non-obvious for patches to come that they might break this. Finally, the blamed commit does the following: at the end of ocelot_rcv(), it checks whether the skb payload contains a VLAN header. If it does, and this port is under a VLAN-aware bridge, that VLAN ID might not be correct in the sense that the packet might have suffered VLAN rewriting due to TCAM rules (VCAP IS1). So we consume the VLAN ID from the skb payload using __skb_vlan_pop(), and take the classified VLAN ID from the DSA tag, and construct a hwaccel VLAN tag with the classified VLAN, and the skb payload is VLAN-untagged. The big problem is that __skb_vlan_pop() does: memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN); __skb_pull(skb, VLAN_HLEN); aka it moves the Ethernet header 4 bytes to the right, and pulls 4 bytes from the skb headroom (effectively also moving skb->data, by definition). So for felix_rxtstamp()'s fragile logic, all bets are off now. Instead of having the "extraction" pointer point to the DSA header, it actually points to 4 bytes _inside_ the extraction header. Corollary, the last 4 bytes of the "extraction" header are in fact 4 stale bytes of the destination MAC address from the Ethernet header, from prior to the __skb_vlan_pop() movement. So of course, RX timestamps are completely bogus when the system is configured in this way. The fix is actually very simple: just don't structure the code like that. For better or worse, the DSA PTP timestamping API does not offer a straightforward way for drivers to present their RX timestamps, but other drivers (sja1105) have established a simple mechanism to carry their RX timestamp from dsa_device_ops :: rcv() all the way to dsa_switch_ops :: port_rxtstamp() and even later. That mechanism is to simply save the partial timestamp to the skb->cb, and complete it later. Question: why don't we simply populate the skb's struct skb_shared_hwtstamps from ocelot_rcv(), and bother with this complication of propagating the timestamp to felix_rxtstamp()? Answer: dsa_switch_ops :: port_rxtstamp() answers the question whether PTP packets need sleepable context to retrieve the full RX timestamp. Currently felix_rxtstamp() answers "no, thanks" to that question, and calls ocelot_ptp_gettime64() from softirq atomic context. This is understandable, since Felix VSC9959 is a PCIe memory-mapped switch, so hardware access does not require sleeping. But the felix driver is preparing for the introduction of other switches where hardware access is over a slow bus like SPI or MDIO: https://lore.kernel.org/lkml/[email protected]/ So I would like to keep this code structure, so the rework needed when that driver will need PTP support will be minimal (answer "yes, I need deferred context for this skb's RX timestamp", then the partial timestamp will still be found in the skb->cb. Fixes: ea440cd2d9b2 ("net: dsa: tag_ocelot: use VLAN information from tagging header when available") Reported-by: Po Liu <[email protected]> Cc: Yangbo Lu <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03net: dsa: qca8k: make sure PAD0 MAC06 exchange is disabledAnsuel Smith2-0/+9
Some device set MAC06 exchange in the bootloader. This cause some problem as we don't support this strange mode and we just set the port6 as the primary CPU port. With MAC06 exchange, PAD0 reg configure port6 instead of port0. Add an extra check and explicitly disable MAC06 exchange to correctly configure the port PAD config. Signed-off-by: Ansuel Smith <[email protected]> Fixes: 3fcf734aa482 ("net: dsa: qca8k: add support for cpu port 6") Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03net: vlan: fix a UAF in vlan_dev_real_dev()Ziyang Xuan2-3/+3
The real_dev of a vlan net_device may be freed after unregister_vlan_dev(). Access the real_dev continually by vlan_dev_real_dev() will trigger the UAF problem for the real_dev like following: ================================================================== BUG: KASAN: use-after-free in vlan_dev_real_dev+0xf9/0x120 Call Trace: kasan_report.cold+0x83/0xdf vlan_dev_real_dev+0xf9/0x120 is_eth_port_of_netdev_filter.part.0+0xb1/0x2c0 is_eth_port_of_netdev_filter+0x28/0x40 ib_enum_roce_netdev+0x1a3/0x300 ib_enum_all_roce_netdevs+0xc7/0x140 netdevice_event_work_handler+0x9d/0x210 ... Freed by task 9288: kasan_save_stack+0x1b/0x40 kasan_set_track+0x1c/0x30 kasan_set_free_info+0x20/0x30 __kasan_slab_free+0xfc/0x130 slab_free_freelist_hook+0xdd/0x240 kfree+0xe4/0x690 kvfree+0x42/0x50 device_release+0x9f/0x240 kobject_put+0x1c8/0x530 put_device+0x1b/0x30 free_netdev+0x370/0x540 ppp_destroy_interface+0x313/0x3d0 ... Move the put_device(real_dev) to vlan_dev_free(). Ensure real_dev not be freed before vlan_dev unregistered. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: [email protected] Signed-off-by: Ziyang Xuan <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03net: udp6: replace __UDP_INC_STATS() with __UDP6_INC_STATS()Menglong Dong1-3/+3
__UDP_INC_STATS() is used in udpv6_queue_rcv_one_skb() when encap_rcv() fails. __UDP6_INC_STATS() should be used here, so replace it with __UDP6_INC_STATS(). Signed-off-by: Menglong Dong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03ethtool: fix ethtool msg len calculation for pause statsJakub Kicinski3-3/+7
ETHTOOL_A_PAUSE_STAT_MAX is the MAX attribute id, so we need to subtract non-stats and add one to get a count (IOW -2+1 == -1). Otherwise we'll see: ethnl cmd 21: calculated reply length 40, but consumed 52 Fixes: 9a27a33027f2 ("ethtool: add standard pause stats") Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Saeed Mahameed <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03net: avoid double accounting for pure zerocopy skbsTalal Ahmad6-9/+53
Track skbs containing only zerocopy data and avoid charging them to kernel memory to correctly account the memory utilization for msg_zerocopy. All of the data in such skbs is held in user pages which are already accounted to user. Before this change, they are charged again in kernel in __zerocopy_sg_from_iter. The charging in kernel is excessive because data is not being copied into skb frags. This excessive charging can lead to kernel going into memory pressure state which impacts all sockets in the system adversely. Mark pure zerocopy skbs with a SKBFL_PURE_ZEROCOPY flag and remove charge/uncharge for data in such skbs. Initially, an skb is marked pure zerocopy when it is empty and in zerocopy path. skb can then change from a pure zerocopy skb to mixed data skb (zerocopy and copy data) if it is at tail of write queue and there is room available in it and non-zerocopy data is being sent in the next sendmsg call. At this time sk_mem_charge is done for the pure zerocopied data and the pure zerocopy flag is unmarked. We found that this happens very rarely on workloads that pass MSG_ZEROCOPY. A pure zerocopy skb can later be coalesced into normal skb if they are next to each other in queue but this patch prevents coalescing from happening. This avoids complexity of charging when skb downgrades from pure zerocopy to mixed. This is also rare. In sk_wmem_free_skb, if it is a pure zerocopy skb, an sk_mem_uncharge for SKB_TRUESIZE(skb_end_offset(skb)) is done for sk_mem_charge in tcp_skb_entail for an skb without data. Testing with the msg_zerocopy.c benchmark between two hosts(100G nics) with zerocopy showed that before this patch the 'sock' variable in memory.stat for cgroup2 that tracks sum of sk_forward_alloc, sk_rmem_alloc and sk_wmem_queued is around 1822720 and with this change it is 0. This is due to no charge to sk_forward_alloc for zerocopy data and shows memory utilization for kernel is lowered. With this commit we don't see the warning we saw in previous commit which resulted in commit 84882cf72cd774cf16fd338bdbf00f69ac9f9194. Signed-off-by: Talal Ahmad <[email protected]> Acked-by: Arjun Roy <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Willem de Bruijn <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03net:ipv6:Remove unneeded semicolonZhang Mingyu1-1/+1
Eliminate the following coccinelle check warning: net/ipv6/seg6.c:381:2-3 Reported-by: Zeal Robot <[email protected]> Signed-off-by: Zhang Mingyu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03NFC: add necessary privilege flags in netlink layerLin Ma1-0/+15
The CAP_NET_ADMIN checks are needed to prevent attackers faking a device under NCIUARTSETDRIVER and exploit privileged commands. This patch add GENL_ADMIN_PERM flags in genl_ops to fulfill the check. Except for commands like NFC_CMD_GET_DEVICE, NFC_CMD_GET_TARGET, NFC_CMD_LLC_GET_PARAMS, and NFC_CMD_GET_SE, which are mainly information- read operations. Signed-off-by: Lin Ma <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03Merge branch 'sctp-=security-hook-fixes'David S. Miller11-95/+133
Xin Long says: ==================== security: fixups for the security hooks in sctp There are a couple of problems in the currect security hooks in sctp: 1. The hooks incorrectly treat sctp_endpoint in SCTP as request_sock in TCP, while it's in fact no more than an extension of the sock, and represents the local host. It is created when sock is created, not when a conn request comes. sctp_association is actually the correct one to represent the connection, and created when a conn request arrives. 2. security_sctp_assoc_request() hook should also be called in processing COOKIE ECHO, as that's the place where the real assoc is created and used in the future. The problems above may cause accept sk, peeloff sk or client sk having the incorrect security labels. So this patchset is to change some hooks and pass asoc into them and save these secids into asoc, as well as add the missing sctp_assoc_request hook into the COOKIE ECHO processing. v1->v2: - See each patch, and thanks the help from Ondrej, Paul and Richard. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-11-03security: implement sctp_assoc_established hook in selinuxXin Long1-1/+13
Different from selinux_inet_conn_established(), it also gives the secid to asoc->peer_secid in selinux_sctp_assoc_established(), as one UDP-type socket may have more than one asocs. Note that peer_secid in asoc will save the peer secid for this asoc connection, and peer_sid in sksec will just keep the peer secid for the latest connection. So the right use should be do peeloff for UDP-type socket if there will be multiple asocs in one socket, so that the peeloff socket has the right label for its asoc. v1->v2: - call selinux_inet_conn_established() to reduce some code duplication in selinux_sctp_assoc_established(), as Ondrej suggested. - when doing peeloff, it calls sock_create() where it actually gets secid for socket from socket_sockcreate_sid(). So reuse SECSID_WILD to ensure the peeloff socket keeps using that secid after calling selinux_sctp_sk_clone() for client side. Fixes: 72e89f50084c ("security: Add support for SCTP security hooks") Reported-by: Prashanth Prahlad <[email protected]> Reviewed-by: Richard Haines <[email protected]> Tested-by: Richard Haines <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03security: add sctp_assoc_established hookXin Long6-13/+32
security_sctp_assoc_established() is added to replace security_inet_conn_established() called in sctp_sf_do_5_1E_ca(), so that asoc can be accessed in security subsystem and save the peer secid to asoc->peer_secid. v1->v2: - fix the return value of security_sctp_assoc_established() in security.h, found by kernel test robot and Ondrej. Fixes: 72e89f50084c ("security: Add support for SCTP security hooks") Reported-by: Prashanth Prahlad <[email protected]> Reviewed-by: Richard Haines <[email protected]> Tested-by: Richard Haines <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03security: call security_sctp_assoc_request in sctp_sf_do_5_1D_ceXin Long2-6/+14
The asoc created when receives the INIT chunk is a temporary one, it will be deleted after INIT_ACK chunk is replied. So for the real asoc created in sctp_sf_do_5_1D_ce() when the COOKIE_ECHO chunk is received, security_sctp_assoc_request() should also be called. v1->v2: - fix some typo and grammar errors, noticed by Ondrej. Fixes: 72e89f50084c ("security: Add support for SCTP security hooks") Reported-by: Prashanth Prahlad <[email protected]> Reviewed-by: Richard Haines <[email protected]> Tested-by: Richard Haines <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03security: pass asoc to sctp_assoc_request and sctp_sk_cloneXin Long11-77/+76
This patch is to move secid and peer_secid from endpoint to association, and pass asoc to sctp_assoc_request and sctp_sk_clone instead of ep. As ep is the local endpoint and asoc represents a connection, and in SCTP one sk/ep could have multiple asoc/connection, saving secid/peer_secid for new asoc will overwrite the old asoc's. Note that since asoc can be passed as NULL, security_sctp_assoc_request() is moved to the place right after the new_asoc is created in sctp_sf_do_5_1B_init() and sctp_sf_do_unexpected_init(). v1->v2: - fix the description of selinux_netlbl_skbuff_setsid(), as Jakub noticed. - fix the annotation in selinux_sctp_assoc_request(), as Richard Noticed. Fixes: 72e89f50084c ("security: Add support for SCTP security hooks") Reported-by: Prashanth Prahlad <[email protected]> Reviewed-by: Richard Haines <[email protected]> Tested-by: Richard Haines <[email protected]> Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03Merge branch 'kselftests-net-missing'David S. Miller1-2/+7
Hangbin Liu says: ==================== kselftests/net: add missed tests to Makefile When generating the selftest to another folder, some tests are missing as they are not added in Makefile. e.g. make -C tools/testing/selftests/ install \ TARGETS="net" INSTALL_PATH=/tmp/kselftests These pathset add them separately to make the Fixes tags less. It would also make the stable tree or downstream backport easier. If you think there is no need to add the Fixes tag for this minor issue. I can repost a new patch and merge all the fixes together. Thanks v3: no update, just rebase to latest net tree. v2: move toeplitz.sh/toeplitz_client.sh under TEST_PROGS_EXTENDED. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-11-03kselftests/net: add missed toeplitz.sh/toeplitz_client.sh to MakefileHangbin Liu1-0/+1
When generating the selftests to another folder, the toeplitz.sh and toeplitz_client.sh are missing as they are not in Makefile, e.g. make -C tools/testing/selftests/ install \ TARGETS="net" INSTALL_PATH=/tmp/kselftests Making them under TEST_PROGS_EXTENDED as they test NIC hardware features and are not intended to be run from kselftests. Fixes: 5ebfb4cc3048 ("selftests/net: toeplitz test") Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03kselftests/net: add missed vrf_strict_mode_test.sh test to MakefileHangbin Liu1-0/+1
When generating the selftests to another folder, the vrf_strict_mode_test.sh test will miss as it is not in Makefile, e.g. make -C tools/testing/selftests/ install \ TARGETS="net" INSTALL_PATH=/tmp/kselftests Fixes: 8735e6eaa438 ("selftests: add selftest for the VRF strict mode") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03kselftests/net: add missed SRv6 testsHangbin Liu1-0/+3
When generating the selftests to another folder, the SRv6 tests are missing as they are not in Makefile, e.g. make -C tools/testing/selftests/ install \ TARGETS="net" INSTALL_PATH=/tmp/kselftests Fixes: 03a0b567a03d ("selftests: seg6: add selftest for SRv6 End.DT46 Behavior") Fixes: 2195444e09b4 ("selftests: add selftest for the SRv6 End.DT4 behavior") Fixes: 2bc035538e16 ("selftests: add selftest for the SRv6 End.DT6 (VRF) behavior") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03kselftests/net: add missed setup_loopback.sh/setup_veth.sh to MakefileHangbin Liu1-1/+1
When generating the selftests to another folder, the include file setup_loopback.sh/setup_veth.sh for gro.sh/gre_gro.sh are missing as they are not in Makefile, e.g. make -C tools/testing/selftests/ install \ TARGETS="net" INSTALL_PATH=/tmp/kselftests Fixes: 7d1575014a63 ("selftests/net: GRO coalesce test") Fixes: 9af771d2ec04 ("selftests/net: allow GRO coalesce test on veth") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03kselftests/net: add missed icmp.sh test to MakefileHangbin Liu1-1/+1
When generating the selftests to another folder, the icmp.sh test will miss as it is not in Makefile, e.g. make -C tools/testing/selftests/ install \ TARGETS="net" INSTALL_PATH=/tmp/kselftests Fixes: 7e9838b7915e ("selftests/net: Add icmp.sh for testing ICMP dummy address responses") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-03amt: Remove duplicate includeJiapeng Chong1-1/+0
Clean up the following includecheck warning: ./drivers/net/amt.c: net/protocol.h is included more than once. Reported-by: Abaci Robot <[email protected]> Signed-off-by: Jiapeng Chong <[email protected]> Reviewed-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-11-02amt: fix error return code in amt_init()Yang Yingliang1-1/+3
Return error code when alloc_workqueue() fails in amt_init(). Reported-by: Hulk Robot <[email protected]> Signed-off-by: Yang Yingliang <[email protected]> Reviewed-by: Taehee Yoo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-02MAINTAINERS: Update ENA maintainers informationShay Agroskin1-2/+3
The ENA driver is no longer maintained by Netanel and Guy Signed-off-by: Shay Agroskin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-02net: add and use skb_unclone_keeptruesize() helperEric Dumazet3-16/+20
While commit 097b9146c0e2 ("net: fix up truesize of cloned skb in skb_prepare_for_shift()") fixed immediate issues found when KFENCE was enabled/tested, there are still similar issues, when tcp_trim_head() hits KFENCE while the master skb is cloned. This happens under heavy networking TX workloads, when the TX completion might be delayed after incoming ACK. This patch fixes the WARNING in sk_stream_kill_queues when sk->sk_mem_queued/sk->sk_forward_alloc are not zero. Fixes: d3fb45f370d9 ("mm, kfence: insert KFENCE hooks for SLAB") Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Marco Elver <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-02net: marvell: prestera: Add explicit paddingGeert Uytterhoeven1-0/+12
On m68k: In function ‘prestera_hw_build_tests’, inlined from ‘prestera_hw_switch_init’ at drivers/net/ethernet/marvell/prestera/prestera_hw.c:788:2: ././include/linux/compiler_types.h:335:38: error: call to ‘__compiletime_assert_345’ declared with attribute error: BUILD_BUG_ON failed: sizeof(struct prestera_msg_switch_attr_req) != 16 ... The driver assumes structure members are naturally aligned, but does not add explicit padding, thus breaking architectures where integral values are not always naturally aligned (e.g. on m68k, __alignof(int) is 2, not 4). Fixes: bb5dbf2cc64d5cfa ("net: marvell: prestera: add firmware v4.0 support") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-02bnxt_en: avoid newline at end of message in NL_SET_ERR_MSG_MODWan Jiabing1-1/+1
Fix following coccicheck warning: ./drivers/net/ethernet/broadcom/bnxt/bnxt_devlink.c:446:8-56: WARNING avoid newline at end of message in NL_SET_ERR_MSG_MOD. Signed-off-by: Wan Jiabing <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nfJakub Kicinski2-1/+3
Pablo Neira Ayuso says: ==================== Netfilter/IPVS fixes for net 1) Fix mac address UAF reported by KASAN in nfnetlink_queue, from Florian Westphal. 2) Autoload genetlink IPVS on demand, from Thomas Weissschuh. * git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf: ipvs: autoload ipvs on genl access netfilter: nfnetlink_queue: fix OOB when mac header was cleared ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-02net: davinci_emac: Fix interrupt pacing disableMaxim Kiselev1-2/+14
This patch allows to use 0 for `coal->rx_coalesce_usecs` param to disable rx irq coalescing. Previously we could enable rx irq coalescing via ethtool (For ex: `ethtool -C eth0 rx-usecs 2000`) but we couldn't disable it because this part rejects 0 value: if (!coal->rx_coalesce_usecs) return -EINVAL; Fixes: 84da2658a619 ("TI DaVinci EMAC : Implement interrupt pacing functionality.") Signed-off-by: Maxim Kiselev <[email protected]> Reviewed-by: Grygorii Strashko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-02net: phy: microchip_t1: add lan87xx_config_rgmii_delay for lan87xx phyYuiko Oshino1-1/+43
Add a function to initialize phy rgmii delay according to phydev->interface. Signed-off-by: Yuiko Oshino <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-02Merge tag 'x86_core_for_v5.16_rc1' of ↵Linus Torvalds22-130/+284
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Do not #GP on userspace use of CLI/STI but pretend it was a NOP to keep old userspace from breaking. Adjust the corresponding iopl selftest to that. - Improve stack overflow warnings to say which stack got overflowed and raise the exception stack sizes to 2 pages since overflowing the single page of exception stack is very easy to do nowadays with all the tracing machinery enabled. With that, rip out the custom mapping of AMD SEV's too. - A bunch of changes in preparation for FGKASLR like supporting more than 64K section headers in the relocs tool, correct ORC lookup table size to cover the whole kernel .text and other adjustments. * tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/x86/iopl: Adjust to the faked iopl CLI/STI usage vmlinux.lds.h: Have ORC lookup cover entire _etext - _stext x86/boot/compressed: Avoid duplicate malloc() implementations x86/boot: Allow a "silent" kaslr random byte fetch x86/tools/relocs: Support >64K section headers x86/sev: Make the #VC exception stacks part of the default stacks storage x86: Increase exception stack sizes x86/mm/64: Improve stack overflow warnings x86/iopl: Fake iopl(3) CLI/STI usage
2021-11-02Merge tag 'net-next-for-5.16' of ↵Linus Torvalds2296-44214/+209317
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Remove socket skb caches - Add a SO_RESERVE_MEM socket op to forward allocate buffer space and avoid memory accounting overhead on each message sent - Introduce managed neighbor entries - added by control plane and resolved by the kernel for use in acceleration paths (BPF / XDP right now, HW offload users will benefit as well) - Make neighbor eviction on link down controllable by userspace to work around WiFi networks with bad roaming implementations - vrf: Rework interaction with netfilter/conntrack - fq_codel: implement L4S style ce_threshold_ect1 marking - sch: Eliminate unnecessary RCU waits in mini_qdisc_pair_swap() BPF: - Add support for new btf kind BTF_KIND_TAG, arbitrary type tagging as implemented in LLVM14 - Introduce bpf_get_branch_snapshot() to capture Last Branch Records - Implement variadic trace_printk helper - Add a new Bloomfilter map type - Track <8-byte scalar spill and refill - Access hw timestamp through BPF's __sk_buff - Disallow unprivileged BPF by default - Document BPF licensing Netfilter: - Introduce egress hook for looking at raw outgoing packets - Allow matching on and modifying inner headers / payload data - Add NFT_META_IFTYPE to match on the interface type either from ingress or egress Protocols: - Multi-Path TCP: - increase default max additional subflows to 2 - rework forward memory allocation - add getsockopts: MPTCP_INFO, MPTCP_TCPINFO, MPTCP_SUBFLOW_ADDRS - MCTP flow support allowing lower layer drivers to configure msg muxing as needed - Automatic Multicast Tunneling (AMT) driver based on RFC7450 - HSR support the redbox supervision frames (IEC-62439-3:2018) - Support for the ip6ip6 encapsulation of IOAM - Netlink interface for CAN-FD's Transmitter Delay Compensation - Support SMC-Rv2 eliminating the current same-subnet restriction, by exploiting the UDP encapsulation feature of RoCE adapters - TLS: add SM4 GCM/CCM crypto support - Bluetooth: initial support for link quality and audio/codec offload Driver APIs: - Add a batched interface for RX buffer allocation in AF_XDP buffer pool - ethtool: Add ability to control transceiver modules' power mode - phy: Introduce supported interfaces bitmap to express MAC capabilities and simplify PHY code - Drop rtnl_lock from DSA .port_fdb_{add,del} callbacks New drivers: - WiFi driver for Realtek 8852AE 802.11ax devices (rtw89) - Ethernet driver for ASIX AX88796C SPI device (x88796c) Drivers: - Broadcom PHYs - support 72165, 7712 16nm PHYs - support IDDQ-SR for additional power savings - PHY support for QCA8081, QCA9561 PHYs - NXP DPAA2: support for IRQ coalescing - NXP Ethernet (enetc): support for software TCP segmentation - Renesas Ethernet (ravb) - support DMAC and EMAC blocks of Gigabit-capable IP found on RZ/G2L SoC - Intel 100G Ethernet - support for eswitch offload of TC/OvS flow API, including offload of GRE, VxLAN, Geneve tunneling - support application device queues - ability to assign Rx and Tx queues to application threads - PTP and PPS (pulse-per-second) extensions - Broadcom Ethernet (bnxt) - devlink health reporting and device reload extensions - Mellanox Ethernet (mlx5) - offload macvlan interfaces - support HW offload of TC rules involving OVS internal ports - support HW-GRO and header/data split - support application device queues - Marvell OcteonTx2: - add XDP support for PF - add PTP support for VF - Qualcomm Ethernet switch (qca8k): support for QCA8328 - Realtek Ethernet DSA switch (rtl8366rb) - support bridge offload - support STP, fast aging, disabling address learning - support for Realtek RTL8365MB-VC, a 4+1 port 10M/100M/1GE switch - Mellanox Ethernet/IB switch (mlxsw) - multi-level qdisc hierarchy offload (e.g. RED, prio and shaping) - offload root TBF qdisc as port shaper - support multiple routing interface MAC address prefixes - support for IP-in-IP with IPv6 underlay - MediaTek WiFi (mt76) - mt7921 - ASPM, 6GHz, SDIO and testmode support - mt7915 - LED and TWT support - Qualcomm WiFi (ath11k) - include channel rx and tx time in survey dump statistics - support for 80P80 and 160 MHz bandwidths - support channel 2 in 6 GHz band - spectral scan support for QCN9074 - support for rx decapsulation offload (data frames in 802.3 format) - Qualcomm phone SoC WiFi (wcn36xx) - enable Idle Mode Power Save (IMPS) to reduce power consumption during idle - Bluetooth driver support for MediaTek MT7922 and MT7921 - Enable support for AOSP Bluetooth extension in Qualcomm WCN399x and Realtek 8822C/8852A - Microsoft vNIC driver (mana) - support hibernation and kexec - Google vNIC driver (gve) - support for jumbo frames - implement Rx page reuse Refactor: - Make all writes to netdev->dev_addr go thru helpers, so that we can add this address to the address rbtree and handle the updates - Various TCP cleanups and optimizations including improvements to CPU cache use - Simplify the gnet_stats, Qdisc stats' handling and remove qdisc->running sequence counter - Driver changes and API updates to address devlink locking deficiencies" * tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2122 commits) Revert "net: avoid double accounting for pure zerocopy skbs" selftests: net: add arp_ndisc_evict_nocarrier net: ndisc: introduce ndisc_evict_nocarrier sysctl parameter net: arp: introduce arp_evict_nocarrier sysctl parameter libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. net: vmxnet3: remove multiple false checks in vmxnet3_ethtool.c net: avoid double accounting for pure zerocopy skbs tcp: rename sk_wmem_free_skb netdevsim: fix uninit value in nsim_drv_configure_vfs() selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose ...
2021-11-01Revert "net: avoid double accounting for pure zerocopy skbs"Jakub Kicinski6-53/+9
This reverts commit f1a456f8f3fc5828d8abcad941860380ae147b1d. WARNING: CPU: 1 PID: 6819 at net/core/skbuff.c:5429 skb_try_coalesce+0x78b/0x7e0 CPU: 1 PID: 6819 Comm: xxxxxxx Kdump: loaded Tainted: G S 5.15.0-04194-gd852503f7711 #16 RIP: 0010:skb_try_coalesce+0x78b/0x7e0 Code: e8 2a bf 41 ff 44 8b b3 bc 00 00 00 48 8b 7c 24 30 e8 19 c0 41 ff 44 89 f0 48 03 83 c0 00 00 00 48 89 44 24 40 e9 47 fb ff ff <0f> 0b e9 ca fc ff ff 4c 8d 70 ff 48 83 c0 07 48 89 44 24 38 e9 61 RSP: 0018:ffff88881f449688 EFLAGS: 00010282 RAX: 00000000fffffe96 RBX: ffff8881566e4460 RCX: ffffffff82079f7e RDX: 0000000000000003 RSI: dffffc0000000000 RDI: ffff8881566e47b0 RBP: ffff8881566e46e0 R08: ffffed102619235d R09: ffffed102619235d R10: ffff888130c91ae3 R11: ffffed102619235c R12: ffff88881f4498a0 R13: 0000000000000056 R14: 0000000000000009 R15: ffff888130c91ac0 FS: 00007fec2cbb9700(0000) GS:ffff88881f440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fec1b060d80 CR3: 00000003acf94005 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_try_coalesce+0xeb/0x290 ? tcp_parse_options+0x610/0x610 ? mark_held_locks+0x79/0xa0 tcp_queue_rcv+0x69/0x2f0 tcp_rcv_established+0xa49/0xd40 ? tcp_data_queue+0x18a0/0x18a0 tcp_v6_do_rcv+0x1c9/0x880 ? rt6_mtu_change_route+0x100/0x100 tcp_v6_rcv+0x1624/0x1830 Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-01Merge branch 'linus' of ↵Linus Torvalds68-1158/+2130
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Delay boot-up self-test for built-in algorithms Algorithms: - Remove fallback path on arm64 as SIMD now runs with softirq off Drivers: - Add Keem Bay OCS ECC Driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (61 commits) crypto: testmgr - fix wrong key length for pkcs1pad crypto: pcrypt - Delay write to padata->info crypto: ccp - Make use of the helper macro kthread_run() crypto: sa2ul - Use the defined variable to clean code crypto: s5p-sss - Add error handling in s5p_aes_probe() crypto: keembay-ocs-ecc - Add Keem Bay OCS ECC Driver dt-bindings: crypto: Add Keem Bay ECC bindings crypto: ecc - Export additional helper functions crypto: ecc - Move ecc.h to include/crypto/internal crypto: engine - Add KPP Support to Crypto Engine crypto: api - Do not create test larvals if manager is disabled crypto: tcrypt - fix skcipher multi-buffer tests for 1420B blocks hwrng: s390 - replace snprintf in show functions with sysfs_emit crypto: octeontx2 - set assoclen in aead_do_fallback() crypto: ccp - Fix whitespace in sev_cmd_buffer_len() hwrng: mtk - Force runtime pm ops for sleep ops crypto: testmgr - Only disable migration in crypto_disable_simd_for_test() crypto: qat - share adf_enable_pf2vf_comms() from adf_pf2vf_msg.c crypto: qat - extract send and wait from adf_vf2pf_request_version() crypto: qat - add VF and PF wrappers to common send function ...
2021-11-01Merge tag 'audit-pr-20211101' of ↵Linus Torvalds23-98/+184
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit updates from Paul Moore: "Add some additional audit logging to capture the openat2() syscall open_how struct info. Previous variations of the open()/openat() syscalls allowed audit admins to inspect the syscall args to get the information contained in the new open_how struct used in openat2()" * tag 'audit-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: return early if the filter rule has a lower priority audit: add OPENAT2 record to list "how" info audit: add support for the openat2 syscall audit: replace magic audit syscall class numbers with macros lsm_audit: avoid overloading the "key" audit field audit: Convert to SPDX identifier audit: rename struct node to struct audit_node to prevent future name collisions
2021-11-01Merge tag 'selinux-pr-20211101' of ↵Linus Torvalds28-421/+884
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Add LSM/SELinux/Smack controls and auditing for io-uring. As usual, the individual commit descriptions have more detail, but we were basically missing two things which we're adding here: + establishment of a proper audit context so that auditing of io-uring ops works similarly to how it does for syscalls (with some io-uring additions because io-uring ops are *not* syscalls) + additional LSM hooks to enable access control points for some of the more unusual io-uring features, e.g. credential overrides. The additional audit callouts and LSM hooks were done in conjunction with the io-uring folks, based on conversations and RFC patches earlier in the year. - Fixup the binder credential handling so that the proper credentials are used in the LSM hooks; the commit description and the code comment which is removed in these patches are helpful to understand the background and why this is the proper fix. - Enable SELinux genfscon policy support for securityfs, allowing improved SELinux filesystem labeling for other subsystems which make use of securityfs, e.g. IMA. * tag 'selinux-pr-20211101' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: security: Return xattr name from security_dentry_init_security() selinux: fix a sock regression in selinux_ip_postroute_compat() binder: use cred instead of task for getsecid binder: use cred instead of task for selinux checks binder: use euid from cred instead of using task LSM: Avoid warnings about potentially unused hook variables selinux: fix all of the W=1 build warnings selinux: make better use of the nf_hook_state passed to the NF hooks selinux: fix race condition when computing ocontext SIDs selinux: remove unneeded ipv6 hook wrappers selinux: remove the SELinux lockdown implementation selinux: enable genfscon labeling for securityfs Smack: Brutalist io_uring support selinux: add support for the io_uring access controls lsm,io_uring: add LSM hooks to io_uring io_uring: convert io_uring to the secure anon inode interface fs: add anon_inode_getfile_secure() similar to anon_inode_getfd_secure() audit: add filtering for io_uring records audit,io_uring,io-wq: add some basic audit support to io_uring audit: prepare audit_context for use in calling contexts beyond syscalls
2021-11-01Merge tag 'rcu.2021.11.01a' of ↵Linus Torvalds22-200/+273
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu Pull RCU updates from Paul McKenney: - Miscellaneous fixes - Torture-test updates for smp_call_function(), most notably improved checking of module parameters. - Tasks-trace RCU updates that fix a number of rare but important race-condition bugs. - Other torture-test updates, most notably better checking of module parameters. In addition, rcutorture may once again be run on CONFIG_PREEMPT_RT kernels. - Torture-test scripting updates, most notably specifying the new CONFIG_KCSAN_STRICT kconfig option rather than maintaining an ever-changing list of individual KCSAN kconfig options. * tag 'rcu.2021.11.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu: (46 commits) rcu: Fix rcu_dynticks_curr_cpu_in_eqs() vs noinstr rcu: Always inline rcu_dynticks_task*_{enter,exit}() torture: Make kvm-remote.sh print size of downloaded tarball torture: Allot 1G of memory for scftorture runs tools/rcu: Add an extract-stall script scftorture: Warn on individual scf_torture_init() error conditions scftorture: Count reschedule IPIs scftorture: Account for weight_resched when checking for all zeroes scftorture: Shut down if nonsensical arguments given scftorture: Allow zero weight to exclude an smp_call_function*() category rcu: Avoid unneeded function call in rcu_read_unlock() rcu-tasks: Update comments to cond_resched_tasks_rcu_qs() rcu-tasks: Fix IPI failure handling in trc_wait_for_one_reader rcu-tasks: Fix read-side primitives comment for call_rcu_tasks_trace rcu-tasks: Clarify read side section info for rcu_tasks_rude GP primitives rcu-tasks: Correct comparisons for CPU numbers in show_stalled_task_trace rcu-tasks: Correct firstreport usage in check_all_holdout_tasks_trace rcu-tasks: Fix s/rcu_add_holdout/trc_add_holdout/ typo in comment rcu-tasks: Move RTGS_WAIT_CBS to beginning of rcu_tasks_kthread() loop rcu-tasks: Fix s/instruction/instructions/ typo in comment ...
2021-11-01Merge tag 'trace-v5.16' of ↵Linus Torvalds135-1525/+3008
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - kprobes: Restructured stack unwinder to show properly on x86 when a stack dump happens from a kretprobe callback. - Fix to bootconfig parsing - Have tracefs allow owner and group permissions by default (only denying others). There's been pressure to allow non root to tracefs in a controlled fashion, and using groups is probably the safest. - Bootconfig memory managament updates. - Bootconfig clean up to have the tools directory be less dependent on changes in the kernel tree. - Allow perf to be traced by function tracer. - Rewrite of function graph tracer to be a callback from the function tracer instead of having its own trampoline (this change will happen on an arch by arch basis, and currently only x86_64 implements it). - Allow multiple direct trampolines (bpf hooks to functions) be batched together in one synchronization. - Allow histogram triggers to add variables that can perform calculations against the event's fields. - Use the linker to determine architecture callbacks from the ftrace trampoline to allow for proper parameter prototypes and prevent warnings from the compiler. - Extend histogram triggers to key off of variables. - Have trace recursion use bit magic to determine preempt context over if branches. - Have trace recursion disable preemption as all use cases do anyway. - Added testing for verification of tracing utilities. - Various small clean ups and fixes. * tag 'trace-v5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (101 commits) tracing/histogram: Fix semicolon.cocci warnings tracing/histogram: Fix documentation inline emphasis warning tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker together tracing: Show size of requested perf buffer bootconfig: Initialize ret in xbc_parse_tree() ftrace: do CPU checking after preemption disabled ftrace: disable preemption when recursion locked tracing/histogram: Document expression arithmetic and constants tracing/histogram: Optimize division by a power of 2 tracing/histogram: Covert expr to const if both operands are constants tracing/histogram: Simplify handling of .sym-offset in expressions tracing: Fix operator precedence for hist triggers expression tracing: Add division and multiplication support for hist triggers tracing: Add support for creating hist trigger variables from literal selftests/ftrace: Stop tracing while reading the trace file by default MAINTAINERS: Update KPROBES and TRACING entries test_kprobes: Move it from kernel/ to lib/ docs, kprobes: Remove invalid URL and add new reference samples/kretprobes: Fix return value if register_kretprobe() failed lib/bootconfig: Fix the xbc_get_info kerneldoc ...
2021-11-01Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski19-39/+233
Merge in the fixes we had queued in case there was another -rc. Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-01Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextJakub Kicinski280-5879/+11791
Alexei Starovoitov says: ==================== pull-request: bpf-next 2021-11-01 We've added 181 non-merge commits during the last 28 day(s) which contain a total of 280 files changed, 11791 insertions(+), 5879 deletions(-). The main changes are: 1) Fix bpf verifier propagation of 64-bit bounds, from Alexei. 2) Parallelize bpf test_progs, from Yucong and Andrii. 3) Deprecate various libbpf apis including af_xdp, from Andrii, Hengqi, Magnus. 4) Improve bpf selftests on s390, from Ilya. 5) bloomfilter bpf map type, from Joanne. 6) Big improvements to JIT tests especially on Mips, from Johan. 7) Support kernel module function calls from bpf, from Kumar. 8) Support typeless and weak ksym in light skeleton, from Kumar. 9) Disallow unprivileged bpf by default, from Pawan. 10) BTF_KIND_DECL_TAG support, from Yonghong. 11) Various bpftool cleanups, from Quentin. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (181 commits) libbpf: Deprecate AF_XDP support kbuild: Unify options for BTF generation for vmlinux and modules selftests/bpf: Add a testcase for 64-bit bounds propagation issue. bpf: Fix propagation of signed bounds from 64-bit min/max into 32-bit. bpf: Fix propagation of bounds from 64-bit min/max into 32-bit and var_off. selftests/bpf: Fix also no-alu32 strobemeta selftest bpf: Add missing map_delete_elem method to bloom filter map selftests/bpf: Add bloom map success test for userspace calls bpf: Add alignment padding for "map_extra" + consolidate holes bpf: Bloom filter map naming fixups selftests/bpf: Add test cases for struct_ops prog bpf: Add dummy BPF STRUCT_OPS for test purpose bpf: Factor out helpers for ctx access checking bpf: Factor out a helper to prepare trampoline for struct_ops prog selftests, bpf: Fix broken riscv build riscv, libbpf: Add RISC-V (RV64) support to bpf_tracing.h tools, build: Add RISC-V to HOSTARCH parsing riscv, bpf: Increase the maximum number of iterations selftests, bpf: Add one test for sockmap with strparser selftests, bpf: Fix test_txmsg_ingress_parser error ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-01Merge branch 'make-neighbor-eviction-controllable-by-userspace'Jakub Kicinski11-2/+281
James Prestwood says: ==================== Make neighbor eviction controllable by userspace ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-01selftests: net: add arp_ndisc_evict_nocarrierJames Prestwood1-0/+220
This tests the sysctl options for ARP/ND: /net/ipv4/conf/<iface>/arp_evict_nocarrier /net/ipv4/conf/all/arp_evict_nocarrier /net/ipv6/conf/<iface>/ndisc_evict_nocarrier /net/ipv6/conf/all/ndisc_evict_nocarrier Signed-off-by: James Prestwood <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-01net: ndisc: introduce ndisc_evict_nocarrier sysctl parameterJames Prestwood5-1/+34
In most situations the neighbor discovery cache should be cleared on a NOCARRIER event which is currently done unconditionally. But for wireless roams the neighbor discovery cache can and should remain intact since the underlying network has not changed. This patch introduces a sysctl option ndisc_evict_nocarrier which can be disabled by a wireless supplicant during a roam. This allows packets to be sent after a roam immediately without having to wait for neighbor discovery. A user reported roughly a 1 second delay after a roam before packets could be sent out (note, on IPv4). This delay was due to the ARP cache being cleared. During testing of this same scenario using IPv6 no delay was noticed, but regardless there is no reason to clear the ndisc cache for wireless roams. Signed-off-by: James Prestwood <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-11-01net: arp: introduce arp_evict_nocarrier sysctl parameterJames Prestwood6-1/+27
This change introduces a new sysctl parameter, arp_evict_nocarrier. When set (default) the ARP cache will be cleared on a NOCARRIER event. This new option has been defaulted to '1' which maintains existing behavior. Clearing the ARP cache on NOCARRIER is relatively new, introduced by: commit 859bd2ef1fc1110a8031b967ee656c53a6260a76 Author: David Ahern <[email protected]> Date: Thu Oct 11 20:33:49 2018 -0700 net: Evict neighbor entries on carrier down The reason for this changes is to prevent the ARP cache from being cleared when a wireless device roams. Specifically for wireless roams the ARP cache should not be cleared because the underlying network has not changed. Clearing the ARP cache in this case can introduce significant delays sending out packets after a roam. A user reported such a situation here: https://lore.kernel.org/linux-wireless/CACsRnHWa47zpx3D1oDq9JYnZWniS8yBwW1h0WAVZ6vrbwL_S0w@mail.gmail.com/ After some investigation it was found that the kernel was holding onto packets until ARP finished which resulted in this 1 second delay. It was also found that the first ARP who-has was never responded to, which is actually what caues the delay. This change is more or less working around this behavior, but again, there is no reason to clear the cache on a roam anyways. As for the unanswered who-has, we know the packet made it OTA since it was seen while monitoring. Why it never received a response is unknown. In any case, since this is a problem on the AP side of things all that can be done is to work around it until it is solved. Some background on testing/reproducing the packet delay: Hardware: - 2 access points configured for Fast BSS Transition (Though I don't see why regular reassociation wouldn't have the same behavior) - Wireless station running IWD as supplicant - A device on network able to respond to pings (I used one of the APs) Procedure: - Connect to first AP - Ping once to establish an ARP entry - Start a tcpdump - Roam to second AP - Wait for operstate UP event, and note the timestamp - Start pinging Results: Below is the tcpdump after UP. It was recorded the interface went UP at 10:42:01.432875. 10:42:01.461871 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28 10:42:02.497976 ARP, Request who-has 192.168.254.1 tell 192.168.254.71, length 28 10:42:02.507162 ARP, Reply 192.168.254.1 is-at ac:86:74:55:b0:20, length 46 10:42:02.507185 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 1, length 64 10:42:02.507205 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 2, length 64 10:42:02.507212 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 3, length 64 10:42:02.507219 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 4, length 64 10:42:02.507225 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 5, length 64 10:42:02.507232 IP 192.168.254.71 > 192.168.254.1: ICMP echo request, id 52792, seq 6, length 64 10:42:02.515373 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 1, length 64 10:42:02.521399 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 2, length 64 10:42:02.521612 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 3, length 64 10:42:02.521941 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 4, length 64 10:42:02.522419 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 5, length 64 10:42:02.523085 IP 192.168.254.1 > 192.168.254.71: ICMP echo reply, id 52792, seq 6, length 64 You can see the first ARP who-has went out very quickly after UP, but was never responded to. Nearly a second later the kernel retries and gets a response. Only then do the ping packets go out. If an ARP entry is manually added prior to UP (after the cache is cleared) it is seen that the first ping is never responded to, so its not only an issue with ARP but with data packets in general. As mentioned prior, the wireless interface was also monitored to verify the ping/ARP packet made it OTA which was observed to be true. Signed-off-by: James Prestwood <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>