aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-01-19net: nfc: nci: fix the wrong NCI_CORE_INIT parametersBongsu Jeon1-1/+1
Fix the code because NCI_CORE_INIT_CMD includes two parameters in NCI2.0 but there is no parameters in NCI1.x. Fixes: bcd684aace34 ("net/nfc/nci: Support NCI 2.x initial sequence") Signed-off-by: Bongsu Jeon <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19sh_eth: Fix power down vs. is_opened flag orderingGeert Uytterhoeven1-2/+2
sh_eth_close() does a synchronous power down of the device before marking it closed. Revert the order, to make sure the device is never marked opened while suspended. While at it, use pm_runtime_put() instead of pm_runtime_put_sync(), as there is no reason to do a synchronous power down. Fixes: 7fa2955ff70ce453 ("sh_eth: Fix sleeping function called from invalid context") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Sergei Shtylyov <[email protected]> Reviewed-by: Niklas Söderlund <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19net: Disable NETIF_F_HW_TLS_RX when RXCSUM is disabledTariq Toukan2-0/+8
With NETIF_F_HW_TLS_RX packets are decrypted in HW. This cannot be logically done when RXCSUM offload is off. Fixes: 14136564c8ee ("net: Add TLS RX offload feature") Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19Merge branch ↵Jakub Kicinski8-67/+12
'net-support-sctp-crc-csum-offload-for-tunneling-packets-in-some-drivers' Xin Long says: ==================== net: support SCTP CRC csum offload for tunneling packets in some drivers This patchset introduces inline function skb_csum_is_sctp(), and uses it to validate it's a sctp CRC csum offload packet, to make SCTP CRC csum offload for tunneling packets supported in some HW drivers. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19net: ixgbevf: use skb_csum_is_sctp instead of protocol checkXin Long1-13/+1
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC checksum offload packet, and yet it also makes ixgbevf support SCTP CRC checksum offload for UDP and GRE encapped packets, just as it does in igb driver. Signed-off-by: Xin Long <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19net: ixgbe: use skb_csum_is_sctp instead of protocol checkXin Long1-13/+1
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC checksum offload packet, and yet it also makes ixgbe support SCTP CRC checksum offload for UDP and GRE encapped packets, just as it does in igb driver. Signed-off-by: Xin Long <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19net: igc: use skb_csum_is_sctp instead of protocol checkXin Long1-13/+1
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC checksum offload packet, and yet it also makes igc support SCTP CRC checksum offload for UDP and GRE encapped packets, just as it does in igb driver. Signed-off-by: Xin Long <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19net: igbvf: use skb_csum_is_sctp instead of protocol checkXin Long1-13/+1
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC checksum offload packet, and yet it also makes igbvf support SCTP CRC checksum offload for UDP and GRE encapped packets, just as it does in igb driver. Signed-off-by: Xin Long <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19net: igb: use skb_csum_is_sctp instead of protocol checkXin Long1-13/+1
Using skb_csum_is_sctp is a easier way to validate it's a SCTP CRC checksum offload packet, and there is no need to parse the packet to check its proto field, especially when it's a UDP or GRE encapped packet. So this patch also makes igb support SCTP CRC checksum offload for UDP and GRE encapped packets. Signed-off-by: Xin Long <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19net: add inline function skb_csum_is_sctpXin Long3-2/+7
This patch is to define a inline function skb_csum_is_sctp(), and also replace all places where it checks if it's a SCTP CSUM skb. This function would be used later in many networking drivers in the following patches. Suggested-by: Alexander Duyck <[email protected]> Signed-off-by: Xin Long <[email protected]> Reviewed-by: Alexander Duyck <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19Merge branch 'ipv4-ensure-ecn-bits-don-t-influence-source-address-validation'Jakub Kicinski2-2/+3
Guillaume Nault says: ==================== ipv4: Ensure ECN bits don't influence source address validation Functions that end up calling fib_table_lookup() should clear the ECN bits from the TOS, otherwise ECT(0) and ECT(1) packets can be treated differently. Most functions already clear the ECN bits, but there are a few cases where this is not done. This series only fixes the ones related to source address validation. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19netfilter: rpfilter: mask ecn bits before fib lookupGuillaume Nault1-1/+1
RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt() treats Not-ECT or ECT(1) packets in a different way than those with ECT(0) or CE. Reproducer: Create two netns, connected with a veth: $ ip netns add ns0 $ ip netns add ns1 $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1 $ ip -netns ns0 link set dev veth01 up $ ip -netns ns1 link set dev veth10 up $ ip -netns ns0 address add 192.0.2.10/32 dev veth01 $ ip -netns ns1 address add 192.0.2.11/32 dev veth10 Add a route to ns1 in ns0: $ ip -netns ns0 route add 192.0.2.11/32 dev veth01 In ns1, only packets with TOS 4 can be routed to ns0: $ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10 Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS is 4: $ ip netns exec ns0 ping -Q 4 192.0.2.11 # TOS 4, Not-ECT ... 0% packet loss ... $ ip netns exec ns0 ping -Q 5 192.0.2.11 # TOS 4, ECT(1) ... 0% packet loss ... $ ip netns exec ns0 ping -Q 6 192.0.2.11 # TOS 4, ECT(0) ... 0% packet loss ... $ ip netns exec ns0 ping -Q 7 192.0.2.11 # TOS 4, CE ... 0% packet loss ... Now use iptable's rpfilter module in ns1: $ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP Not-ECT and ECT(1) packets still pass: $ ip netns exec ns0 ping -Q 4 192.0.2.11 # TOS 4, Not-ECT ... 0% packet loss ... $ ip netns exec ns0 ping -Q 5 192.0.2.11 # TOS 4, ECT(1) ... 0% packet loss ... But ECT(0) and ECN packets are dropped: $ ip netns exec ns0 ping -Q 6 192.0.2.11 # TOS 4, ECT(0) ... 100% packet loss ... $ ip netns exec ns0 ping -Q 7 192.0.2.11 # TOS 4, CE ... 100% packet loss ... After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore. Fixes: 8f97339d3feb ("netfilter: add ipv4 reverse path filter match") Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19udp: mask TOS bits in udp_v4_early_demux()Guillaume Nault1-1/+2
udp_v4_early_demux() is the only function that calls ip_mc_validate_source() with a TOS that hasn't been masked with IPTOS_RT_MASK. This results in different behaviours for incoming multicast UDPv4 packets, depending on if ip_mc_validate_source() is called from the early-demux path (udp_v4_early_demux) or from the regular input path (ip_route_input_noref). ECN would normally not be used with UDP multicast packets, so the practical consequences should be limited on that side. However, IPTOS_RT_MASK is used to also masks the TOS' high order bits, to align with the non-early-demux path behaviour. Reproducer: Setup two netns, connected with veth: $ ip netns add ns0 $ ip netns add ns1 $ ip -netns ns0 link set dev lo up $ ip -netns ns1 link set dev lo up $ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1 $ ip -netns ns0 link set dev veth01 up $ ip -netns ns1 link set dev veth10 up $ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01 $ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10 In ns0, add route to multicast address 224.0.2.0/24 using source address 198.51.100.10: $ ip -netns ns0 address add 198.51.100.10/32 dev lo $ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10 In ns1, define route to 198.51.100.10, only for packets with TOS 4: $ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10 Also activate rp_filter in ns1, so that incoming packets not matching the above route get dropped: $ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1 Now try to receive packets on 224.0.2.11: $ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof - In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is, tos 6 for socat): $ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6 The "test0" message is properly received by socat in ns1, because early-demux has no cached dst to use, so source address validation is done by ip_route_input_mc(), which receives a TOS that has the ECN bits masked. Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0): $ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6 The "test1" message isn't received by socat in ns1, because, now, early-demux has a cached dst to use and calls ip_mc_validate_source() immediately, without masking the ECN bits. Fixes: bc044e8db796 ("udp: perform source validation for mcast early demux") Signed-off-by: Guillaume Nault <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19xsk: Clear pool even for inactive queuesMaxim Mikityanskiy1-2/+2
The number of queues can change by other means, rather than ethtool. For example, attaching an mqprio qdisc with num_tc > 1 leads to creating multiple sets of TX queues, which may be then destroyed when mqprio is deleted. If an AF_XDP socket is created while mqprio is active, dev->_tx[queue_id].pool will be filled, but then real_num_tx_queues may decrease with deletion of mqprio, which will mean that the pool won't be NULLed, and a further increase of the number of TX queues may expose a dangling pointer. To avoid any potential misbehavior, this commit clears pool for RX and TX queues, regardless of real_num_*_queues, still taking into consideration num_*_queues to avoid overflows. Fixes: 1c1efc2af158 ("xsk: Create and free buffer pool independently from umem") Fixes: a41b4f3c58dd ("xsk: simplify xdp_clear_umem_at_qid implementation") Signed-off-by: Maxim Mikityanskiy <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Björn Töpel <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-01-19Merge tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+3
Pull task_work fix from Jens Axboe: "The TIF_NOTIFY_SIGNAL change inadvertently removed the unconditional task_work run we had in get_signal(). This caused a regression for some setups, since we're relying on eg ____fput() being run to close and release, for example, a pipe and wake the other end. For 5.11, I prefer the simple solution of just reinstating the unconditional run, even if it conceptually doesn't make much sense - if you need that kind of guarantee, you should be using TWA_SIGNAL instead of TWA_NOTIFY. But it's the trivial fix for 5.11, and would ensure that other potential gotchas/assumptions for task_work don't regress for 5.11. We're looking into further simplifying the task_work notifications for 5.12 which would resolve that too" * tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block: task_work: unconditionally run task_work from get_signal()
2021-01-19bpf: Fix helper bpf_map_peek_elem_proto pointing to wrong callbackMircea Cirjaliu1-1/+1
I assume this was obtained by copy/paste. Point it to bpf_map_peek_elem() instead of bpf_map_pop_elem(). In practice it may have been less likely hit when under JIT given shielded via 84430d4232c3 ("bpf, verifier: avoid retpoline for map push/pop/peek operation"). Fixes: f1a2e44a3aec ("bpf: add queue and stack maps") Signed-off-by: Mircea Cirjaliu <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Cc: Mauricio Vasquez <[email protected]> Link: https://lore.kernel.org/bpf/AM7PR02MB6082663DFDCCE8DA7A6DD6B1BBA30@AM7PR02MB6082.eurprd02.prod.outlook.com
2021-01-19Merge tag 'nfsd-5.11-2' of ↵Linus Torvalds3-9/+61
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Avoid exposing parent of root directory in NFSv3 READDIRPLUS results - Fix a tracepoint change that went in the initial 5.11 merge * tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: SUNRPC: Move the svc_xdr_recvfrom tracepoint again nfsd4: readdirplus shouldn't return parent of export
2021-01-19Merge tag 'hyperv-fixes-signed-20210119' of ↵Linus Torvalds1-3/+26
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fix from Wei Liu: "One patch from Dexuan to fix clockevent initialization" * tag 'hyperv-fixes-signed-20210119' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Initialize clockevents after LAPIC is initialized
2021-01-19Merge branch 'sh_eth-fix-reboot-crash'Jakub Kicinski3-2/+33
Geert Uytterhoeven says: ==================== sh_eth: Fix reboot crash This patch fixes a regression v5.11-rc1, where rebooting while a sh_eth device is not opened will cause a crash. Changes compared to v1: - Export mdiobb_{read,write}(), - Call mdiobb_{read,write}() now they are exported, - Use mii_bus.parent to avoid bb_info.dev copy, - Drop RFC state. Alternatively, mdio-bitbang could provide Runtime PM-aware wrappers itself, and use them either manually (through a new parameter to alloc_mdio_bitbang(), or a new alloc_mdio_bitbang_*() function), or automatically (e.g. if pm_runtime_enabled() returns true). Note that the latter requires a "struct device *" parameter to operate on. Currently there are only two drivers that call alloc_mdio_bitbang() and use Runtime PM: the Renesas sh_eth and ravb drivers. This series fixes the former, while the latter is not affected (it keeps the device powered all the time between driver probe and driver unbind, and changing that seems to be non-trivial). ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19sh_eth: Make PHY access aware of Runtime PM to fix reboot crashGeert Uytterhoeven1-0/+26
Wolfram reports that his R-Car H2-based Lager board can no longer be rebooted in v5.11-rc1, as it crashes with an imprecise external abort. The issue can be reproduced on other boards (e.g. Koelsch with R-Car M2-W) too, if CONFIG_IP_PNP is disabled, and the Ethernet interface is down at reboot time: Unhandled fault: imprecise external abort (0x1406) at 0x00000000 pgd = (ptrval) [00000000] *pgd=422b6835, *pte=00000000, *ppte=00000000 Internal error: : 1406 [#1] ARM Modules linked in: CPU: 0 PID: 1105 Comm: init Tainted: G W 5.10.0-rc1-00402-ge2f016cf7751 #1048 Hardware name: Generic R-Car Gen2 (Flattened Device Tree) PC is at sh_mdio_ctrl+0x44/0x60 LR is at sh_mmd_ctrl+0x20/0x24 ... Backtrace: [<c0451f30>] (sh_mdio_ctrl) from [<c0451fd4>] (sh_mmd_ctrl+0x20/0x24) r7:0000001f r6:00000020 r5:00000002 r4:c22a1dc4 [<c0451fb4>] (sh_mmd_ctrl) from [<c044fc18>] (mdiobb_cmd+0x38/0xa8) [<c044fbe0>] (mdiobb_cmd) from [<c044feb8>] (mdiobb_read+0x58/0xdc) r9:c229f844 r8:c0c329dc r7:c221e000 r6:00000001 r5:c22a1dc4 r4:00000001 [<c044fe60>] (mdiobb_read) from [<c044c854>] (__mdiobus_read+0x74/0xe0) r7:0000001f r6:00000001 r5:c221e000 r4:c221e000 [<c044c7e0>] (__mdiobus_read) from [<c044c9d8>] (mdiobus_read+0x40/0x54) r7:0000001f r6:00000001 r5:c221e000 r4:c221e458 [<c044c998>] (mdiobus_read) from [<c044d678>] (phy_read+0x1c/0x20) r7:ffffe000 r6:c221e470 r5:00000200 r4:c229f800 [<c044d65c>] (phy_read) from [<c044d94c>] (kszphy_config_intr+0x44/0x80) [<c044d908>] (kszphy_config_intr) from [<c044694c>] (phy_disable_interrupts+0x44/0x50) r5:c229f800 r4:c229f800 [<c0446908>] (phy_disable_interrupts) from [<c0449370>] (phy_shutdown+0x18/0x1c) r5:c229f800 r4:c229f804 [<c0449358>] (phy_shutdown) from [<c040066c>] (device_shutdown+0x168/0x1f8) [<c0400504>] (device_shutdown) from [<c013de44>] (kernel_restart_prepare+0x3c/0x48) r9:c22d2000 r8:c0100264 r7:c0b0d034 r6:00000000 r5:4321fedc r4:00000000 [<c013de08>] (kernel_restart_prepare) from [<c013dee0>] (kernel_restart+0x1c/0x60) [<c013dec4>] (kernel_restart) from [<c013e1d8>] (__do_sys_reboot+0x168/0x208) r5:4321fedc r4:01234567 [<c013e070>] (__do_sys_reboot) from [<c013e2e8>] (sys_reboot+0x18/0x1c) r7:00000058 r6:00000000 r5:00000000 r4:00000000 [<c013e2d0>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54) As of commit e2f016cf775129c0 ("net: phy: add a shutdown procedure"), system reboot calls phy_disable_interrupts() during shutdown. As this happens unconditionally, the PHY registers may be accessed while the device is suspended, causing undefined behavior, which may crash the system. Fix this by wrapping the PHY bitbang accessors in the sh_eth driver by wrappers that take care of Runtime PM, to resume the device when needed. Reported-by: Wolfram Sang <[email protected]> Suggested-by: Andrew Lunn <[email protected]> Signed-off-by: Geert Uytterhoeven <[email protected]> Tested-by: Wolfram Sang <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19mdio-bitbang: Export mdiobb_{read,write}()Geert Uytterhoeven2-2/+7
Export mdiobb_read() and mdiobb_write(), so Ethernet controller drivers can call them from their MDIO read/write wrappers. Signed-off-by: Geert Uytterhoeven <[email protected]> Tested-by: Wolfram Sang <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19mdio, phy: fix -Wshadow warnings triggered by nested container_of()Alexander Lobakin2-7/+23
container_of() macro hides a local variable '__mptr' inside. This becomes a problem when several container_of() are nested in each other within single line or plain macros. As C preprocessor doesn't support generating random variable names, the sole solution is to avoid defining macros that consist only of container_of() calls, or they will self-shadow '__mptr' each time: In file included from ./include/linux/bitmap.h:10, from drivers/net/phy/phy_device.c:12: drivers/net/phy/phy_device.c: In function ‘phy_device_release’: ./include/linux/kernel.h:693:8: warning: declaration of ‘__mptr’ shadows a previous local [-Wshadow] 693 | void *__mptr = (void *)(ptr); \ | ^~~~~~ ./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’ 647 | #define to_phy_device(d) container_of(to_mdio_device(d), \ | ^~~~~~~~~~~~ ./include/linux/mdio.h:52:27: note: in expansion of macro ‘container_of’ 52 | #define to_mdio_device(d) container_of(d, struct mdio_device, dev) | ^~~~~~~~~~~~ ./include/linux/phy.h:647:39: note: in expansion of macro ‘to_mdio_device’ 647 | #define to_phy_device(d) container_of(to_mdio_device(d), \ | ^~~~~~~~~~~~~~ drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’ 217 | kfree(to_phy_device(dev)); | ^~~~~~~~~~~~~ ./include/linux/kernel.h:693:8: note: shadowed declaration is here 693 | void *__mptr = (void *)(ptr); \ | ^~~~~~ ./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’ 647 | #define to_phy_device(d) container_of(to_mdio_device(d), \ | ^~~~~~~~~~~~ drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’ 217 | kfree(to_phy_device(dev)); | ^~~~~~~~~~~~~ As they are declared in header files, these warnings are highly repetitive and very annoying (along with the one from linux/pci.h). Convert the related macros from linux/{mdio,phy}.h to static inlines to avoid self-shadowing and potentially improve bug-catching. No functional changes implied. Signed-off-by: Alexander Lobakin <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19net: core: devlink: use right genl user_ptr when handling port param get/setOleksandr Mazur1-2/+2
Fix incorrect user_ptr dereferencing when handling port param get/set: idx [0] stores the 'struct devlink' pointer; idx [1] stores the 'struct devlink_port' pointer; Fixes: 637989b5d77e ("devlink: Always use user_ptr[0] for devlink and simplify post_doit") CC: Parav Pandit <[email protected]> Signed-off-by: Oleksandr Mazur <[email protected]> Signed-off-by: Vadym Kochan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-19vhost_net: avoid tx queue stuck when sendmsg failsYunjian Wang1-12/+14
Currently the driver doesn't drop a packet which can't be sent by tun (e.g bad packet). In this case, the driver will always process the same packet lead to the tx queue stuck. To fix this issue: 1. in the case of persistent failure (e.g bad packet), the driver can skip this descriptor by ignoring the error. 2. in the case of transient failure (e.g -ENOBUFS, -EAGAIN and -ENOMEM), the driver schedules the worker to try again. Signed-off-by: Yunjian Wang <[email protected]> Acked-by: Jason Wang <[email protected]> Acked-by: Willem de Bruijn <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net: hns: fix variable used when DEBUG is definedTom Rix1-1/+1
When DEBUG is defined this error occurs drivers/net/ethernet/hisilicon/hns/hns_enet.c:1505:36: error: ‘struct net_device’ has no member named ‘ae_handle’; did you mean ‘rx_handler’? assert(skb->queue_mapping < ndev->ae_handle->q_num); ^~~~~~~~~ ae_handle is an element of struct hns_nic_priv, so change ndev to priv. Signed-off-by: Tom Rix <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18arcnet: fix macro name when DEBUG is definedTom Rix1-1/+1
When DEBUG is defined this error occurs drivers/net/arcnet/com20020_cs.c:70:15: error: ‘com20020_REG_W_ADDR_HI’ undeclared (first use in this function); did you mean ‘COM20020_REG_W_ADDR_HI’? ioaddr, com20020_REG_W_ADDR_HI); ^~~~~~~~~~~~~~~~~~~~~~ From reviewing the context, the suggestion is what is meant. Signed-off-by: Tom Rix <[email protected]> Acked-by: Joe Perches <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18Merge branch 'tls-device-offload-for-bond'Jakub Kicinski7-16/+211
Tariq Toukan says: ==================== TLS device offload for Bond This series opens TX and RX TLS device offload for bond interfaces. This allows bond interfaces to benefit from capable lower devices. We add a new ndo_sk_get_lower_dev() to be used to get the lower dev that corresponds to a given socket. The TLS module uses it to interact directly with the lowest device in chain, and invoke the control operations in tlsdev_ops. This means that the bond interface doesn't have his own struct tlsdev_ops instance and derived logic/callbacks. To keep simple track of the HW and SW TLS contexts, we bind each socket to a specific lower device for the socket's whole lifetime. This is logically valid (and similar to the SW kTLS behavior) in the following bond configuration, so we restrict the offload support to it: ((mode == balance-xor) or (mode == 802.3ad)) and xmit_hash_policy == layer3+4. In this design, TLS TX/RX offload feature flags of the bond device are independent from the lower devices. They reflect the current features state, but are not directly controllable. This is because the bond driver is bypassed by the call to ndo_sk_get_lower_dev(), without him knowing who the caller is. The bond TLS feature flags are set/cleared only according to the configuration of the mode and xmit_hash_policy. Bypass is true only for the control flow. Packets in fast path still go through the bond logic. The design here differs from the xfrm/ipsec offload, where the bond driver has his own copy of struct xfrmdev_ops and callbacks. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net/tls: Except bond interface from some TLS checksTariq Toukan2-1/+3
In the tls_dev_event handler, ignore tlsdev_ops requirement for bond interfaces, they do not exist as the interaction is done directly with the lower device. Also, make the validate function pass when it's called with the upper bond interface. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net/tls: Device offload to use lowest netdevice in chainTariq Toukan1-1/+1
Do not call the tls_dev_ops of upper devices. Instead, ask them for the proper lowest device and communicate with it directly. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net/bonding: Declare TLS RX device offload supportTariq Toukan1-1/+1
Following the description in previous patch (for TX): As the bond interface is being bypassed by the TLS module, interacting directly against the lower devs, there is no way for the bond interface to disable its device offload capabilities, as long as the mode/policy config allows it. Hence, the feature flag is not directly controllable, but just reflects the offload status based on the logic under bond_sk_check(). Here we just declare RX device offload support, and expose it via the NETIF_F_HW_TLS_RX flag. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net/bonding: Implement TLS TX device offloadTariq Toukan3-2/+56
Implement TLS TX device offload for bonding interfaces. This allows kTLS sockets running on a bond to benefit from the device offload on capable lower devices. To allow a simple and fast maintenance of the TLS context in SW and lower devices, we bind the TLS socket to a specific lower dev. To achieve a behavior similar to SW kTLS, we support only balance-xor and 802.3ad modes, with xmit_hash_policy=layer3+4. This is enforced in bond_sk_check(), done in a previous patch. For the above configuration, the SW implementation keeps picking the same exact lower dev for all the socket's SKBs. The device offload behaves similarly, making the decision once at the connection creation. Per socket, the TLS module should work directly with the lowest netdev in chain, to call the tls_dev_ops operations. As the bond interface is being bypassed by the TLS module, interacting directly against the lower devs, there is no way for the bond interface to disable its device offload capabilities, as long as the mode/policy config allows it. Hence, the feature flag is not directly controllable, but just reflects the current offload status based on the logic under bond_sk_check(). Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net/bonding: Take update_features call out of XFRM funcitonTariq Toukan1-9/+10
In preparation for more cases that call netdev_update_features(). While here, move the features logic to the stage where struct bond is already updated, and pass it as the only parameter to function bond_set_xfrm_features(). Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net/bonding: Implement ndo_sk_get_lower_devTariq Toukan2-0/+95
Add ndo_sk_get_lower_dev() implementation for bond interfaces. Support only for the cases where the socket's and SKBs' hash yields identical value for the whole connection lifetime. Here we restrict it to L3+4 sockets only, with xmit_hash_policy==LAYER34 and bond modes xor/802.3ad. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net/bonding: Take IP hash logic into a helperTariq Toukan1-5/+11
Hash logic on L3 will be used in a downstream patch for one more use case. Take it to a function for a better code reuse. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net: netdevice: Add operation ndo_sk_get_lower_devTariq Toukan2-0/+37
ndo_sk_get_lower_dev returns the lower netdev that corresponds to a given socket. Additionally, we implement a helper netdev_sk_get_lowest_dev() to get the lowest one in chain. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Boris Pismenny <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net/qla3xxx: switch from 'pci_' to 'dma_' APIChristophe JAILLET1-109/+87
The wrappers in include/linux/pci-dma-compat.h should go away. The patch has been generated with the coccinelle script below and has been hand modified to replace GFP_ with a correct flag. It has been compile tested. When memory is allocated in 'ql_alloc_net_req_rsp_queues()' GFP_KERNEL can be used because it is only called from 'ql_alloc_mem_resources()' which already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see below) When memory is allocated in 'ql_alloc_buffer_queues()' GFP_KERNEL can be used because this flag is already used just a few line above. When memory is allocated in 'ql_alloc_small_buffers()' GFP_KERNEL can be used because it is only called from 'ql_alloc_mem_resources()' which already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see above) When memory is allocated in 'ql_alloc_mem_resources()' GFP_KERNEL can be used because this function already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see above) While at it, use 'dma_set_mask_and_coherent()' instead of 'dma_set_mask()/ dma_set_coherent_mask()' in order to slightly simplify code. @@ @@ - PCI_DMA_BIDIRECTIONAL + DMA_BIDIRECTIONAL @@ @@ - PCI_DMA_TODEVICE + DMA_TO_DEVICE @@ @@ - PCI_DMA_FROMDEVICE + DMA_FROM_DEVICE @@ @@ - PCI_DMA_NONE + DMA_NONE @@ expression e1, e2, e3; @@ - pci_alloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3; @@ - pci_zalloc_consistent(e1, e2, e3) + dma_alloc_coherent(&e1->dev, e2, e3, GFP_) @@ expression e1, e2, e3, e4; @@ - pci_free_consistent(e1, e2, e3, e4) + dma_free_coherent(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_single(e1, e2, e3, e4) + dma_map_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_single(e1, e2, e3, e4) + dma_unmap_single(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4, e5; @@ - pci_map_page(e1, e2, e3, e4, e5) + dma_map_page(&e1->dev, e2, e3, e4, e5) @@ expression e1, e2, e3, e4; @@ - pci_unmap_page(e1, e2, e3, e4) + dma_unmap_page(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_map_sg(e1, e2, e3, e4) + dma_map_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_unmap_sg(e1, e2, e3, e4) + dma_unmap_sg(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_cpu(e1, e2, e3, e4) + dma_sync_single_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_single_for_device(e1, e2, e3, e4) + dma_sync_single_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_cpu(e1, e2, e3, e4) + dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4) @@ expression e1, e2, e3, e4; @@ - pci_dma_sync_sg_for_device(e1, e2, e3, e4) + dma_sync_sg_for_device(&e1->dev, e2, e3, e4) @@ expression e1, e2; @@ - pci_dma_mapping_error(e1, e2) + dma_mapping_error(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_dma_mask(e1, e2) + dma_set_mask(&e1->dev, e2) @@ expression e1, e2; @@ - pci_set_consistent_dma_mask(e1, e2) + dma_set_coherent_mask(&e1->dev, e2) Signed-off-by: Christophe JAILLET <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net_sched: fix RTNL deadlock again caused by request_module()Cong Wang3-41/+79
tcf_action_init_1() loads tc action modules automatically with request_module() after parsing the tc action names, and it drops RTNL lock and re-holds it before and after request_module(). This causes a lot of troubles, as discovered by syzbot, because we can be in the middle of batch initializations when we create an array of tc actions. One of the problem is deadlock: CPU 0 CPU 1 rtnl_lock(); for (...) { tcf_action_init_1(); -> rtnl_unlock(); -> request_module(); rtnl_lock(); for (...) { tcf_action_init_1(); -> tcf_idr_check_alloc(); // Insert one action into idr, // but it is not committed until // tcf_idr_insert_many(), then drop // the RTNL lock in the _next_ // iteration -> rtnl_unlock(); -> rtnl_lock(); -> a_o->init(); -> tcf_idr_check_alloc(); // Now waiting for the same index // to be committed -> request_module(); -> rtnl_lock() // Now waiting for RTNL lock } rtnl_unlock(); } rtnl_unlock(); This is not easy to solve, we can move the request_module() before this loop and pre-load all the modules we need for this netlink message and then do the rest initializations. So the loop breaks down to two now: for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { struct tc_action_ops *a_o; a_o = tc_action_load_ops(name, tb[i]...); ops[i - 1] = a_o; } for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) { act = tcf_action_init_1(ops[i - 1]...); } Although this looks serious, it only has been reported by syzbot, so it seems hard to trigger this by humans. And given the size of this patch, I'd suggest to make it to net-next and not to backport to stable. This patch has been tested by syzbot and tested with tdc.py by me. Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together") Reported-and-tested-by: [email protected] Reported-and-tested-by: [email protected] Reported-by: [email protected] Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Tested-by: Jamal Hadi Salim <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net: phy: national: remove definition of DEBUGTom Rix1-2/+0
Defining DEBUG should only be done in development. So remove DEBUG. Signed-off-by: Tom Rix <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18tcp: fix TCP_USER_TIMEOUT with zero windowEnke Chen6-7/+14
The TCP session does not terminate with TCP_USER_TIMEOUT when data remain untransmitted due to zero window. The number of unanswered zero-window probes (tcp_probes_out) is reset to zero with incoming acks irrespective of the window size, as described in tcp_probe_timer(): RFC 1122 4.2.2.17 requires the sender to stay open indefinitely as long as the receiver continues to respond probes. We support this by default and reset icsk_probes_out with incoming ACKs. This counter, however, is the wrong one to be used in calculating the duration that the window remains closed and data remain untransmitted. Thanks to Jonathan Maxwell <[email protected]> for diagnosing the actual issue. In this patch a new timestamp is introduced for the socket in order to track the elapsed time for the zero-window probes that have not been answered with any non-zero window ack. Fixes: 9721e709fa68 ("tcp: simplify window probe aborting on USER_TIMEOUT") Reported-by: William McCall <[email protected]> Co-developed-by: Neal Cardwell <[email protected]> Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Enke Chen <[email protected]> Reviewed-by: Yuchung Cheng <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18Merge branch 'net-make-udp-tunnel-devices-support-fraglist'Jakub Kicinski3-6/+9
Xin Long says: ==================== net: make udp tunnel devices support fraglist Like GRE device, UDP tunnel devices should also support fraglist, so that some protocol (like SCTP) HW GSO that requires NETIF_F_FRAGLIST in the dev can work. Especially when the lower device support both NETIF_F_GSO_UDP_TUNNEL and NETIF_F_GSO_SCTP. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18bareudp: add NETIF_F_FRAGLIST flag for dev featuresXin Long1-2/+3
Like vxlan and geneve, bareudp also needs this dev feature to support some protocol's HW GSO. Signed-off-by: Xin Long <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18geneve: add NETIF_F_FRAGLIST flag for dev featuresXin Long1-2/+3
Some protocol HW GSO requires fraglist supported by the device, like SCTP. Without NETIF_F_FRAGLIST set in the dev features of geneve, it would have to do SW GSO before the packets enter the driver, even when the geneve dev and lower dev (like veth) both have the feature of NETIF_F_GSO_SCTP. So this patch is to add it for geneve. Signed-off-by: Xin Long <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18vxlan: add NETIF_F_FRAGLIST flag for dev featuresXin Long1-2/+3
Some protocol HW GSO requires fraglist supported by the device, like SCTP. Without NETIF_F_FRAGLIST set in the dev features of vxlan, it would have to do SW GSO before the packets enter the driver, even when the vxlan dev and lower dev (like veth) both have the feature of NETIF_F_GSO_SCTP. So this patch is to add it for vxlan. Signed-off-by: Xin Long <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18Merge branch 'ipv6-fixes-for-the-multicast-routes'Jakub Kicinski1-1/+2
Matteo Croce says: ==================== ipv6: fixes for the multicast routes Fix two wrong flags in the IPv6 multicast routes created by the autoconf code. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18ipv6: set multicast flag on the multicast routeMatteo Croce1-1/+1
The multicast route ff00::/8 is created with type RTN_UNICAST: $ ip -6 -d route unicast ::1 dev lo proto kernel scope global metric 256 pref medium unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium unicast ff00::/8 dev eth0 proto kernel scope global metric 256 pref medium Set the type to RTN_MULTICAST which is more appropriate. Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info") Signed-off-by: Matteo Croce <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18ipv6: create multicast route with RTPROT_KERNELMatteo Croce1-0/+1
The ff00::/8 multicast route is created without specifying the fc_protocol field, so the default RTPROT_BOOT value is used: $ ip -6 -d route unicast ::1 dev lo proto kernel scope global metric 256 pref medium unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium unicast ff00::/8 dev eth0 proto boot scope global metric 256 pref medium As the documentation says, this value identifies routes installed during boot, but the route is created when interface is set up. Change the value to RTPROT_KERNEL which is a better value. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Matteo Croce <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18hv_netvsc: Add (more) validation for untrusted Hyper-V valuesAndrea Parri (Microsoft)4-62/+136
For additional robustness in the face of Hyper-V errors or malicious behavior, validate all values that originate from packets that Hyper-V has sent to the guest. Ensure that invalid values cannot cause indexing off the end of an array, or subvert an existing validation via integer overflow. Ensure that outgoing packets do not have any leftover guest memory that has not been zeroed out. Reported-by: Juan Vazquez <[email protected]> Signed-off-by: Andrea Parri (Microsoft) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net: bridge: check vlan with eth_type_vlan() methodMenglong Dong3-12/+5
Replace some checks for ETH_P_8021Q and ETH_P_8021AD with eth_type_vlan(). Signed-off-by: Menglong Dong <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18Merge tag 'mac80211-for-net-2021-01-18.2' of ↵Jakub Kicinski6-40/+54
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 Johannes Berg says: ==================== Various fixes: * kernel-doc parsing fixes * incorrect debugfs string checks * locking fix in regulatory * some encryption-related fixes * tag 'mac80211-for-net-2021-01-18.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211: mac80211: check if atf has been disabled in __ieee80211_schedule_txq mac80211: do not drop tx nulldata packets on encrypted links mac80211: fix encryption key selection for 802.3 xmit mac80211: fix fast-rx encryption check mac80211: fix incorrect strlen of .write in debugfs cfg80211: fix a kerneldoc markup cfg80211: Save the regulatory domain with a lock cfg80211/mac80211: fix kernel-doc for SAR APIs ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2021-01-18net: dsa: mv88e6xxx: also read STU state in mv88e6250_g1_vtu_getnextRasmus Villemoes1-0/+4
mv88e6xxx_port_vlan_join checks whether the VTU already contains an entry for the given vid (via mv88e6xxx_vtu_getnext), and if so, merely changes the relevant .member[] element and loads the updated entry into the VTU. However, at least for the mv88e6250, the on-stack struct mv88e6xxx_vtu_entry vlan never has its .state[] array explicitly initialized, neither in mv88e6xxx_port_vlan_join() nor inside the getnext implementation. So the new entry has random garbage for the STU bits, breaking VLAN filtering. When the VTU entry is initially created, those bits are all zero, and we should make sure to keep them that way when the entry is updated. Fixes: 92307069a96c (net: dsa: mv88e6xxx: Avoid VTU corruption on 6097) Signed-off-by: Rasmus Villemoes <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Tobias Waldekranz <[email protected]> Tested-by: Tobias Waldekranz <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>