|
Fix the code because NCI_CORE_INIT_CMD includes two parameters in NCI 2.0,
but there are no parameters in NCI 1.x.
Fixes: bcd684aace34 ("net/nfc/nci: Support NCI 2.x initial sequence")
Signed-off-by: Bongsu Jeon <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
sh_eth_close() does a synchronous power down of the device before
marking it closed. Reverse the order, to make sure the device is never
marked opened while suspended.
While at it, use pm_runtime_put() instead of pm_runtime_put_sync(), as
there is no reason to do a synchronous power down.
Fixes: 7fa2955ff70ce453 ("sh_eth: Fix sleeping function called from invalid context")
Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Sergei Shtylyov <[email protected]>
Reviewed-by: Niklas Söderlund <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
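For illustration, a minimal sketch of the reordering (struct and field names
follow my reading of the sh_eth driver and are assumptions, not quoted from
the patch):
  static int sh_eth_close(struct net_device *ndev)
  {
          struct sh_eth_private *mdp = netdev_priv(ndev);

          /* ... stop the interface, free IRQs and TX/RX rings ... */

          /* mark the device closed *before* it can be powered down */
          mdp->is_opened = 0;

          /* asynchronous power down is sufficient here */
          pm_runtime_put(&mdp->pdev->dev);

          return 0;
  }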
|
|
With NETIF_F_HW_TLS_RX, packets are decrypted in HW. This cannot
logically be done when RXCSUM offload is off.
Fixes: 14136564c8ee ("net: Add TLS RX offload feature")
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
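A sketch of the dependency check this implies in the netdev feature-fixup
path (the exact message string is an assumption):
  /* in netdev_fix_features()-style code: drop TLS RX offload if RXCSUM is off */
  if ((features & NETIF_F_HW_TLS_RX) && !(features & NETIF_F_RXCSUM)) {
          netdev_dbg(dev, "Dropping TLS RX HW offload feature since no RXCSUM feature.\n");
          features &= ~NETIF_F_HW_TLS_RX;
  }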
|
|
'net-support-sctp-crc-csum-offload-for-tunneling-packets-in-some-drivers'
Xin Long says:
====================
net: support SCTP CRC csum offload for tunneling packets in some drivers
This patchset introduces the inline function skb_csum_is_sctp() and
uses it to check whether a packet needs SCTP CRC checksum offload, so
that SCTP CRC offload for tunneling packets can be supported in some
HW drivers.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Using skb_csum_is_sctp() is an easier way to check whether a packet
needs SCTP CRC checksum offload, and it also makes ixgbevf support
SCTP CRC checksum offload for UDP- and GRE-encapsulated packets, just
as the igb driver does.
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Using skb_csum_is_sctp() is an easier way to check whether a packet
needs SCTP CRC checksum offload, and it also makes ixgbe support
SCTP CRC checksum offload for UDP- and GRE-encapsulated packets, just
as the igb driver does.
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Using skb_csum_is_sctp() is an easier way to check whether a packet
needs SCTP CRC checksum offload, and it also makes igc support
SCTP CRC checksum offload for UDP- and GRE-encapsulated packets, just
as the igb driver does.
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Using skb_csum_is_sctp() is an easier way to check whether a packet
needs SCTP CRC checksum offload, and it also makes igbvf support
SCTP CRC checksum offload for UDP- and GRE-encapsulated packets, just
as the igb driver does.
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Using skb_csum_is_sctp() is an easier way to check whether a packet
needs SCTP CRC checksum offload, and there is no need to parse the
packet to check its proto field, especially when it's a UDP- or
GRE-encapsulated packet.
So this patch also makes igb support SCTP CRC checksum offload for
UDP- and GRE-encapsulated packets.
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
This patch defines an inline function skb_csum_is_sctp() and replaces
all places that check whether an skb needs SCTP CRC checksum offload.
The function will be used by several networking drivers in the
following patches.
Suggested-by: Alexander Duyck <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
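A minimal sketch of what such a helper and its driver-side use look like
(based on the skb->csum_not_inet flag used for SCTP CRC32c offload; treat the
details as my reading of the patch):
  static inline bool skb_csum_is_sctp(struct sk_buff *skb)
  {
          return skb->csum_not_inet;
  }

  /* driver TX path, instead of parsing headers down to the SCTP protocol: */
  if (skb_csum_is_sctp(skb)) {
          /* program the HW to insert the SCTP CRC32c */
  }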
|
|
Guillaume Nault says:
====================
ipv4: Ensure ECN bits don't influence source address validation
Functions that end up calling fib_table_lookup() should clear the ECN
bits from the TOS, otherwise ECT(0) and ECT(1) packets can be treated
differently.
Most functions already clear the ECN bits, but there are a few cases
where this is not done. This series only fixes the ones related to
source address validation.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
RT_TOS() only masks one of the two ECN bits. Therefore rpfilter_mt()
treats Not-ECT or ECT(1) packets in a different way than those with
ECT(0) or CE.
Reproducer:
Create two netns, connected with a veth:
$ ip netns add ns0
$ ip netns add ns1
$ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
$ ip -netns ns0 link set dev veth01 up
$ ip -netns ns1 link set dev veth10 up
$ ip -netns ns0 address add 192.0.2.10/32 dev veth01
$ ip -netns ns1 address add 192.0.2.11/32 dev veth10
Add a route to ns1 in ns0:
$ ip -netns ns0 route add 192.0.2.11/32 dev veth01
In ns1, only packets with TOS 4 can be routed to ns0:
$ ip -netns ns1 route add 192.0.2.10/32 tos 4 dev veth10
Ping from ns0 to ns1 works regardless of the ECN bits, as long as TOS
is 4:
$ ip netns exec ns0 ping -Q 4 192.0.2.11 # TOS 4, Not-ECT
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 5 192.0.2.11 # TOS 4, ECT(1)
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 6 192.0.2.11 # TOS 4, ECT(0)
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 7 192.0.2.11 # TOS 4, CE
... 0% packet loss ...
Now use iptable's rpfilter module in ns1:
$ ip netns exec ns1 iptables-legacy -t raw -A PREROUTING -m rpfilter --invert -j DROP
Not-ECT and ECT(1) packets still pass:
$ ip netns exec ns0 ping -Q 4 192.0.2.11 # TOS 4, Not-ECT
... 0% packet loss ...
$ ip netns exec ns0 ping -Q 5 192.0.2.11 # TOS 4, ECT(1)
... 0% packet loss ...
But ECT(0) and CE packets are dropped:
$ ip netns exec ns0 ping -Q 6 192.0.2.11 # TOS 4, ECT(0)
... 100% packet loss ...
$ ip netns exec ns0 ping -Q 7 192.0.2.11 # TOS 4, CE
... 100% packet loss ...
After this patch, rpfilter doesn't drop ECT(0) and CE packets anymore.
Fixes: 8f97339d3feb ("netfilter: add ipv4 reverse path filter match")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
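A short sketch of the macros involved and the shape of the fix (values are
from my recollection of include/uapi/linux/ip.h and include/net/route.h; the
exact line in rpfilter_mt() is an assumption):
  #define IPTOS_TOS_MASK  0x1E
  #define RT_TOS(tos)     ((tos) & IPTOS_TOS_MASK)   /* keeps bit 0x02: one ECN bit */
  #define IPTOS_RT_MASK   (IPTOS_TOS_MASK & ~3)      /* 0x1C: clears both ECN bits */

  /* rpfilter_mt(): mask both ECN bits before the FIB lookup */
  flow.flowi4_tos = iph->tos & IPTOS_RT_MASK;        /* instead of RT_TOS(iph->tos) */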
|
|
udp_v4_early_demux() is the only function that calls
ip_mc_validate_source() with a TOS that hasn't been masked with
IPTOS_RT_MASK.
This results in different behaviours for incoming multicast UDPv4
packets, depending on if ip_mc_validate_source() is called from the
early-demux path (udp_v4_early_demux) or from the regular input path
(ip_route_input_noref).
ECN would normally not be used with UDP multicast packets, so the
practical consequences should be limited on that side. However,
IPTOS_RT_MASK is used here so that the TOS's high-order bits are also
masked, to align with the behaviour of the non-early-demux path.
Reproducer:
Setup two netns, connected with veth:
$ ip netns add ns0
$ ip netns add ns1
$ ip -netns ns0 link set dev lo up
$ ip -netns ns1 link set dev lo up
$ ip link add name veth01 netns ns0 type veth peer name veth10 netns ns1
$ ip -netns ns0 link set dev veth01 up
$ ip -netns ns1 link set dev veth10 up
$ ip -netns ns0 address add 192.0.2.10 peer 192.0.2.11/32 dev veth01
$ ip -netns ns1 address add 192.0.2.11 peer 192.0.2.10/32 dev veth10
In ns0, add route to multicast address 224.0.2.0/24 using source
address 198.51.100.10:
$ ip -netns ns0 address add 198.51.100.10/32 dev lo
$ ip -netns ns0 route add 224.0.2.0/24 dev veth01 src 198.51.100.10
In ns1, define route to 198.51.100.10, only for packets with TOS 4:
$ ip -netns ns1 route add 198.51.100.10/32 tos 4 dev veth10
Also activate rp_filter in ns1, so that incoming packets not matching
the above route get dropped:
$ ip netns exec ns1 sysctl -wq net.ipv4.conf.veth10.rp_filter=1
Now try to receive packets on 224.0.2.11:
$ ip netns exec ns1 socat UDP-RECVFROM:1111,ip-add-membership=224.0.2.11:veth10,ignoreeof -
In ns0, send packet to 224.0.2.11 with TOS 4 and ECT(0) (that is,
tos 6 for socat):
$ echo test0 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
The "test0" message is properly received by socat in ns1, because
early-demux has no cached dst to use, so source address validation
is done by ip_route_input_mc(), which receives a TOS that has the
ECN bits masked.
Now send another packet to 224.0.2.11, still with TOS 4 and ECT(0):
$ echo test1 | ip netns exec ns0 socat - UDP-DATAGRAM:224.0.2.11:1111,bind=:1111,tos=6
The "test1" message isn't received by socat in ns1, because, now,
early-demux has a cached dst to use and calls ip_mc_validate_source()
immediately, without masking the ECN bits.
Fixes: bc044e8db796 ("udp: perform source validation for mcast early demux")
Signed-off-by: Guillaume Nault <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
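A sketch of the shape of the fix in udp_v4_early_demux(), masking the TOS
before the cached-dst source validation (the argument layout is my
approximation):
  if (!inet_sk(sk)->inet_daddr && in_dev)
          return ip_mc_validate_source(skb, iph->daddr, iph->saddr,
                                       iph->tos & IPTOS_RT_MASK, skb->dev,
                                       in_dev, &itag);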
|
|
The number of queues can change by means other than ethtool. For
example, attaching an mqprio qdisc with num_tc > 1 leads to creating
multiple sets of TX queues, which may then be destroyed when mqprio is
deleted. If an AF_XDP socket is created while mqprio is active,
dev->_tx[queue_id].pool will be filled, but real_num_tx_queues may
later decrease when mqprio is deleted, which means the pool won't be
NULLed, and a further increase of the number of TX queues may expose a
dangling pointer.
To avoid any potential misbehavior, this commit clears the pool for RX
and TX queues regardless of real_num_*_queues, while still taking
num_*_queues into consideration to avoid overflows.
Fixes: 1c1efc2af158 ("xsk: Create and free buffer pool independently from umem")
Fixes: a41b4f3c58dd ("xsk: simplify xdp_clear_umem_at_qid implementation")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
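A sketch of the clearing logic described above, bounded by the allocated
queue counts rather than the currently active ones (function name and exact
shape are assumptions):
  static void xsk_clear_pool_at_qid(struct net_device *dev, u16 queue_id)
  {
          if (queue_id < dev->num_rx_queues)
                  dev->_rx[queue_id].pool = NULL;
          if (queue_id < dev->num_tx_queues)
                  dev->_tx[queue_id].pool = NULL;
  }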
|
|
Pull task_work fix from Jens Axboe:
"The TIF_NOTIFY_SIGNAL change inadvertently removed the unconditional
task_work run we had in get_signal().
This caused a regression for some setups, since we're relying on eg
____fput() being run to close and release, for example, a pipe and
wake the other end.
For 5.11, I prefer the simple solution of just reinstating the
unconditional run, even if it conceptually doesn't make much sense -
if you need that kind of guarantee, you should be using TWA_SIGNAL
instead of TWA_RESUME. But it's the trivial fix for 5.11, and would
ensure that other potential gotchas/assumptions for task_work don't
regress for 5.11.
We're looking into further simplifying the task_work notifications for
5.12 which would resolve that too"
* tag 'task_work-2021-01-19' of git://git.kernel.dk/linux-block:
task_work: unconditionally run task_work from get_signal()
|
|
I assume this was obtained by copy/paste. Point it to bpf_map_peek_elem()
instead of bpf_map_pop_elem(). In practice it may have been less
likely to be hit under JIT, given it is shielded via 84430d4232c3
("bpf, verifier: avoid retpoline for map push/pop/peek operation").
Fixes: f1a2e44a3aec ("bpf: add queue and stack maps")
Signed-off-by: Mircea Cirjaliu <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Cc: Mauricio Vasquez <[email protected]>
Link: https://lore.kernel.org/bpf/AM7PR02MB6082663DFDCCE8DA7A6DD6B1BBA30@AM7PR02MB6082.eurprd02.prod.outlook.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Avoid exposing parent of root directory in NFSv3 READDIRPLUS results
- Fix a tracepoint change that went in the initial 5.11 merge
* tag 'nfsd-5.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
SUNRPC: Move the svc_xdr_recvfrom tracepoint again
nfsd4: readdirplus shouldn't return parent of export
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull hyperv fix from Wei Liu:
"One patch from Dexuan to fix clockevent initialization"
* tag 'hyperv-fixes-signed-20210119' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux:
x86/hyperv: Initialize clockevents after LAPIC is initialized
|
|
Geert Uytterhoeven says:
====================
sh_eth: Fix reboot crash
This patch series fixes a regression in v5.11-rc1, where rebooting
while an sh_eth device is not opened causes a crash.
Changes compared to v1:
- Export mdiobb_{read,write}(),
- Call mdiobb_{read,write}() now they are exported,
- Use mii_bus.parent to avoid bb_info.dev copy,
- Drop RFC state.
Alternatively, mdio-bitbang could provide Runtime PM-aware wrappers
itself, and use them either manually (through a new parameter to
alloc_mdio_bitbang(), or a new alloc_mdio_bitbang_*() function), or
automatically (e.g. if pm_runtime_enabled() returns true). Note that
the latter requires a "struct device *" parameter to operate on.
Currently there are only two drivers that call alloc_mdio_bitbang() and
use Runtime PM: the Renesas sh_eth and ravb drivers. This series fixes
the former, while the latter is not affected (it keeps the device
powered all the time between driver probe and driver unbind, and
changing that seems to be non-trivial).
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Wolfram reports that his R-Car H2-based Lager board can no longer be
rebooted in v5.11-rc1, as it crashes with an imprecise external abort.
The issue can be reproduced on other boards (e.g. Koelsch with R-Car
M2-W) too, if CONFIG_IP_PNP is disabled, and the Ethernet interface is
down at reboot time:
Unhandled fault: imprecise external abort (0x1406) at 0x00000000
pgd = (ptrval)
[00000000] *pgd=422b6835, *pte=00000000, *ppte=00000000
Internal error: : 1406 [#1] ARM
Modules linked in:
CPU: 0 PID: 1105 Comm: init Tainted: G W 5.10.0-rc1-00402-ge2f016cf7751 #1048
Hardware name: Generic R-Car Gen2 (Flattened Device Tree)
PC is at sh_mdio_ctrl+0x44/0x60
LR is at sh_mmd_ctrl+0x20/0x24
...
Backtrace:
[<c0451f30>] (sh_mdio_ctrl) from [<c0451fd4>] (sh_mmd_ctrl+0x20/0x24)
r7:0000001f r6:00000020 r5:00000002 r4:c22a1dc4
[<c0451fb4>] (sh_mmd_ctrl) from [<c044fc18>] (mdiobb_cmd+0x38/0xa8)
[<c044fbe0>] (mdiobb_cmd) from [<c044feb8>] (mdiobb_read+0x58/0xdc)
r9:c229f844 r8:c0c329dc r7:c221e000 r6:00000001 r5:c22a1dc4 r4:00000001
[<c044fe60>] (mdiobb_read) from [<c044c854>] (__mdiobus_read+0x74/0xe0)
r7:0000001f r6:00000001 r5:c221e000 r4:c221e000
[<c044c7e0>] (__mdiobus_read) from [<c044c9d8>] (mdiobus_read+0x40/0x54)
r7:0000001f r6:00000001 r5:c221e000 r4:c221e458
[<c044c998>] (mdiobus_read) from [<c044d678>] (phy_read+0x1c/0x20)
r7:ffffe000 r6:c221e470 r5:00000200 r4:c229f800
[<c044d65c>] (phy_read) from [<c044d94c>] (kszphy_config_intr+0x44/0x80)
[<c044d908>] (kszphy_config_intr) from [<c044694c>] (phy_disable_interrupts+0x44/0x50)
r5:c229f800 r4:c229f800
[<c0446908>] (phy_disable_interrupts) from [<c0449370>] (phy_shutdown+0x18/0x1c)
r5:c229f800 r4:c229f804
[<c0449358>] (phy_shutdown) from [<c040066c>] (device_shutdown+0x168/0x1f8)
[<c0400504>] (device_shutdown) from [<c013de44>] (kernel_restart_prepare+0x3c/0x48)
r9:c22d2000 r8:c0100264 r7:c0b0d034 r6:00000000 r5:4321fedc r4:00000000
[<c013de08>] (kernel_restart_prepare) from [<c013dee0>] (kernel_restart+0x1c/0x60)
[<c013dec4>] (kernel_restart) from [<c013e1d8>] (__do_sys_reboot+0x168/0x208)
r5:4321fedc r4:01234567
[<c013e070>] (__do_sys_reboot) from [<c013e2e8>] (sys_reboot+0x18/0x1c)
r7:00000058 r6:00000000 r5:00000000 r4:00000000
[<c013e2d0>] (sys_reboot) from [<c0100060>] (ret_fast_syscall+0x0/0x54)
As of commit e2f016cf775129c0 ("net: phy: add a shutdown procedure"),
system reboot calls phy_disable_interrupts() during shutdown. As this
happens unconditionally, the PHY registers may be accessed while the
device is suspended, causing undefined behavior, which may crash the
system.
Fix this by wrapping the PHY bitbang accessors in the sh_eth driver
with wrappers that take care of Runtime PM, to resume the device when
needed.
Reported-by: Wolfram Sang <[email protected]>
Suggested-by: Andrew Lunn <[email protected]>
Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Wolfram Sang <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
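A sketch of such a Runtime PM aware wrapper around the exported bitbang
accessors (mirroring the approach described above; the exact function names
are assumptions):
  static int sh_mdiobb_read(struct mii_bus *bus, int phy, int reg)
  {
          int res;

          pm_runtime_get_sync(bus->parent);
          res = mdiobb_read(bus, phy, reg);
          pm_runtime_put(bus->parent);

          return res;
  }

  static int sh_mdiobb_write(struct mii_bus *bus, int phy, int reg, u16 val)
  {
          int res;

          pm_runtime_get_sync(bus->parent);
          res = mdiobb_write(bus, phy, reg, val);
          pm_runtime_put(bus->parent);

          return res;
  }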
|
|
Export mdiobb_read() and mdiobb_write(), so Ethernet controller drivers
can call them from their MDIO read/write wrappers.
Signed-off-by: Geert Uytterhoeven <[email protected]>
Tested-by: Wolfram Sang <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The container_of() macro hides a local variable '__mptr' inside. This
becomes a problem when several container_of() calls are nested in each
other within a single line or plain macros.
As the C preprocessor doesn't support generating unique variable names,
the sole solution is to avoid defining macros that consist only of
container_of() calls, or they will self-shadow '__mptr' each time:
In file included from ./include/linux/bitmap.h:10,
from drivers/net/phy/phy_device.c:12:
drivers/net/phy/phy_device.c: In function ‘phy_device_release’:
./include/linux/kernel.h:693:8: warning: declaration of ‘__mptr’ shadows a previous local [-Wshadow]
693 | void *__mptr = (void *)(ptr); \
| ^~~~~~
./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’
647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
| ^~~~~~~~~~~~
./include/linux/mdio.h:52:27: note: in expansion of macro ‘container_of’
52 | #define to_mdio_device(d) container_of(d, struct mdio_device, dev)
| ^~~~~~~~~~~~
./include/linux/phy.h:647:39: note: in expansion of macro ‘to_mdio_device’
647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
| ^~~~~~~~~~~~~~
drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’
217 | kfree(to_phy_device(dev));
| ^~~~~~~~~~~~~
./include/linux/kernel.h:693:8: note: shadowed declaration is here
693 | void *__mptr = (void *)(ptr); \
| ^~~~~~
./include/linux/phy.h:647:26: note: in expansion of macro ‘container_of’
647 | #define to_phy_device(d) container_of(to_mdio_device(d), \
| ^~~~~~~~~~~~
drivers/net/phy/phy_device.c:217:8: note: in expansion of macro ‘to_phy_device’
217 | kfree(to_phy_device(dev));
| ^~~~~~~~~~~~~
As they are declared in header files, these warnings are highly
repetitive and very annoying (along with the one from linux/pci.h).
Convert the related macros from linux/{mdio,phy}.h to static inlines
to avoid self-shadowing and potentially improve bug-catching.
No functional changes implied.
Signed-off-by: Alexander Lobakin <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
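For illustration, the kind of conversion this implies (a sketch; the
const-qualification and exact signatures in the real headers may differ):
  /* before: plain macro, self-shadows __mptr when nested */
  #define to_mdio_device(d) container_of(d, struct mdio_device, dev)

  /* after: static inline, each call gets its own scope for __mptr */
  static inline struct mdio_device *to_mdio_device(struct device *dev)
  {
          return container_of(dev, struct mdio_device, dev);
  }

  static inline struct phy_device *to_phy_device(struct device *dev)
  {
          return container_of(to_mdio_device(dev), struct phy_device, mdio);
  }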
|
|
Fix incorrect user_ptr dereferencing when handling port param get/set:
idx [0] stores the 'struct devlink' pointer;
idx [1] stores the 'struct devlink_port' pointer;
Fixes: 637989b5d77e ("devlink: Always use user_ptr[0] for devlink and simplify post_doit")
CC: Parav Pandit <[email protected]>
Signed-off-by: Oleksandr Mazur <[email protected]>
Signed-off-by: Vadym Kochan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently the driver doesn't drop a packet which can't be sent by tun
(e.g. a bad packet). In this case, the driver will keep processing the
same packet, leading to a stuck TX queue.
To fix this issue:
1. in the case of a persistent failure (e.g. a bad packet), the driver
can skip this descriptor by ignoring the error.
2. in the case of a transient failure (e.g. -ENOBUFS, -EAGAIN or
-ENOMEM), the driver schedules the worker to try again.
Signed-off-by: Yunjian Wang <[email protected]>
Acked-by: Jason Wang <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Michael S. Tsirkin <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
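A hypothetical sketch of the idea in the TX handling path (error
classification only; the surrounding vhost code is elided and the helper
usage is approximate):
  err = sock->ops->sendmsg(sock, &msg, len);
  if (unlikely(err < 0)) {
          if (err == -EAGAIN || err == -ENOMEM || err == -ENOBUFS) {
                  /* transient failure: put the descriptor back and retry later */
                  vhost_discard_vq_desc(vq, 1);
                  vhost_net_enable_vq(net, vq);
                  break;
          }
          /* persistent failure (e.g. a bad packet): ignore the error and
           * consume the descriptor, so the queue doesn't get stuck */
          pr_debug("Fail to send packet: err %d", err);
  }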
|
|
When DEBUG is defined, this error occurs:
drivers/net/ethernet/hisilicon/hns/hns_enet.c:1505:36: error:
‘struct net_device’ has no member named ‘ae_handle’;
did you mean ‘rx_handler’?
assert(skb->queue_mapping < ndev->ae_handle->q_num);
^~~~~~~~~
ae_handle is an element of struct hns_nic_priv, so change
ndev to priv.
Signed-off-by: Tom Rix <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When DEBUG is defined, this error occurs:
drivers/net/arcnet/com20020_cs.c:70:15: error: ‘com20020_REG_W_ADDR_HI’
undeclared (first use in this function);
did you mean ‘COM20020_REG_W_ADDR_HI’?
ioaddr, com20020_REG_W_ADDR_HI);
^~~~~~~~~~~~~~~~~~~~~~
From reviewing the context, the suggestion is what is meant.
Signed-off-by: Tom Rix <[email protected]>
Acked-by: Joe Perches <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Tariq Toukan says:
====================
TLS device offload for Bond
This series opens TX and RX TLS device offload for bond interfaces.
This allows bond interfaces to benefit from capable lower devices.
We add a new ndo_sk_get_lower_dev() to be used to get the lower dev that
corresponds to a given socket.
The TLS module uses it to interact directly with the lowest device in
chain, and invoke the control operations in tlsdev_ops. This means that the
bond interface doesn't have its own struct tlsdev_ops instance and
derived logic/callbacks.
To keep simple track of the HW and SW TLS contexts, we bind each socket to
a specific lower device for the socket's whole lifetime. This is logically
valid (and similar to the SW kTLS behavior) in the following bond configuration,
so we restrict the offload support to it:
((mode == balance-xor) or (mode == 802.3ad))
and xmit_hash_policy == layer3+4.
In this design, TLS TX/RX offload feature flags of the bond device are
independent from the lower devices. They reflect the current features state,
but are not directly controllable.
This is because the bond driver is bypassed by the call to
ndo_sk_get_lower_dev(), without it knowing who the caller is.
The bond TLS feature flags are set/cleared only according to the configuration
of the mode and xmit_hash_policy.
Bypass is true only for the control flow. Packets in fast path still go through
the bond logic.
The design here differs from the xfrm/ipsec offload, where the bond driver
has its own copy of struct xfrmdev_ops and callbacks.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In the tls_dev_event handler, ignore the tlsdev_ops requirement for
bond interfaces; they do not have one, as the interaction is done
directly with the lower device.
Also, make the validate function pass when it's called with the upper
bond interface.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Do not call the tls_dev_ops of upper devices. Instead, ask them
for the proper lowest device and communicate with it directly.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
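A sketch of resolving the lowest device for a socket before invoking
tlsdev_ops (the function name and locking details follow my reading of the
TLS device code and are assumptions):
  static struct net_device *get_netdev_for_sock(struct sock *sk)
  {
          struct dst_entry *dst = sk_dst_get(sk);
          struct net_device *dev = NULL;

          if (likely(dst)) {
                  /* walk below the bond/team device down to the real NIC */
                  dev = netdev_sk_get_lowest_dev(dst->dev, sk);
                  dev_hold(dev);
          }

          dst_release(dst);

          return dev;
  }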
|
|
Following the description in the previous patch (for TX):
As the bond interface is being bypassed by the TLS module, interacting
directly against the lower devs, there is no way for the bond interface
to disable its device offload capabilities, as long as the mode/policy
config allows it.
Hence, the feature flag is not directly controllable, but just reflects
the offload status based on the logic under bond_sk_check().
Here we just declare RX device offload support, and expose it via the
NETIF_F_HW_TLS_RX flag.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Implement TLS TX device offload for bonding interfaces.
This allows kTLS sockets running on a bond to benefit from the
device offload on capable lower devices.
To allow a simple and fast maintenance of the TLS context in SW and
lower devices, we bind the TLS socket to a specific lower dev.
To achieve a behavior similar to SW kTLS, we support only balance-xor
and 802.3ad modes, with xmit_hash_policy=layer3+4. This is enforced
in bond_sk_check(), done in a previous patch.
For the above configuration, the SW implementation keeps picking the
same exact lower dev for all the socket's SKBs. The device offload
behaves similarly, making the decision once at the connection creation.
Per socket, the TLS module should work directly with the lowest netdev
in chain, to call the tls_dev_ops operations.
As the bond interface is being bypassed by the TLS module, interacting
directly against the lower devs, there is no way for the bond interface
to disable its device offload capabilities, as long as the mode/policy
config allows it.
Hence, the feature flag is not directly controllable, but just reflects
the current offload status based on the logic under bond_sk_check().
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In preparation for more cases that call netdev_update_features().
While here, move the features logic to the stage where struct bond
is already updated, and pass it as the only parameter to function
bond_set_xfrm_features().
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add ndo_sk_get_lower_dev() implementation for bond interfaces.
Support only the cases where the socket's and the SKBs' hash yield an
identical value for the whole connection lifetime.
Here we restrict it to L3+4 sockets only, with
xmit_hash_policy==LAYER34 and bond modes xor/802.3ad.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
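A hypothetical sketch of the callback's structure; bond_sk_check() is named
in this series, while the inner helper name and hashing details below are
illustrative only:
  static struct net_device *bond_sk_get_lower_dev(struct net_device *dev,
                                                  struct sock *sk)
  {
          struct bonding *bond = netdev_priv(dev);
          struct net_device *lower = NULL;

          rcu_read_lock();
          if (bond_sk_check(bond))        /* xor/802.3ad with layer3+4 only */
                  lower = __bond_sk_get_lower_dev(bond, sk);  /* pick slave by L3/L4 hash */
          rcu_read_unlock();

          return lower;
  }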
|
|
Hash logic on L3 will be used in a downstream patch for one more use
case.
Factor it out into a function for better code reuse.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
ndo_sk_get_lower_dev returns the lower netdev that corresponds to
a given socket.
Additionally, we implement a helper netdev_sk_get_lowest_dev() to get
the lowest one in chain.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Boris Pismenny <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
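A sketch of the two helpers as described above (the lowest-dev walk simply
follows ndo_sk_get_lower_dev until no lower device claims the socket):
  static struct net_device *netdev_sk_get_lower_dev(struct net_device *dev,
                                                    struct sock *sk)
  {
          const struct net_device_ops *ops = dev->netdev_ops;

          if (ops->ndo_sk_get_lower_dev)
                  return ops->ndo_sk_get_lower_dev(dev, sk);
          return NULL;
  }

  struct net_device *netdev_sk_get_lowest_dev(struct net_device *dev,
                                              struct sock *sk)
  {
          struct net_device *lower;

          lower = netdev_sk_get_lower_dev(dev, sk);
          while (lower) {
                  dev = lower;
                  lower = netdev_sk_get_lower_dev(dev, sk);
          }

          return dev;
  }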
|
|
The wrappers in include/linux/pci-dma-compat.h should go away.
The patch has been generated with the coccinelle script below and has been
hand modified to replace GFP_ with a correct flag.
It has been compile tested.
When memory is allocated in 'ql_alloc_net_req_rsp_queues()' GFP_KERNEL can
be used because it is only called from 'ql_alloc_mem_resources()' which
already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see below)
When memory is allocated in 'ql_alloc_buffer_queues()' GFP_KERNEL can be
used because this flag is already used just a few line above.
When memory is allocated in 'ql_alloc_small_buffers()' GFP_KERNEL can
be used because it is only called from 'ql_alloc_mem_resources()' which
already calls 'ql_alloc_buffer_queues()' which uses GFP_KERNEL. (see above)
When memory is allocated in 'ql_alloc_mem_resources()' GFP_KERNEL can be
used because this function already calls 'ql_alloc_buffer_queues()' which
uses GFP_KERNEL. (see above)
While at it, use 'dma_set_mask_and_coherent()' instead of 'dma_set_mask()/
dma_set_coherent_mask()' in order to slightly simplify code.
@@
@@
- PCI_DMA_BIDIRECTIONAL
+ DMA_BIDIRECTIONAL
@@
@@
- PCI_DMA_TODEVICE
+ DMA_TO_DEVICE
@@
@@
- PCI_DMA_FROMDEVICE
+ DMA_FROM_DEVICE
@@
@@
- PCI_DMA_NONE
+ DMA_NONE
@@
expression e1, e2, e3;
@@
- pci_alloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3;
@@
- pci_zalloc_consistent(e1, e2, e3)
+ dma_alloc_coherent(&e1->dev, e2, e3, GFP_)
@@
expression e1, e2, e3, e4;
@@
- pci_free_consistent(e1, e2, e3, e4)
+ dma_free_coherent(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_single(e1, e2, e3, e4)
+ dma_map_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_single(e1, e2, e3, e4)
+ dma_unmap_single(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4, e5;
@@
- pci_map_page(e1, e2, e3, e4, e5)
+ dma_map_page(&e1->dev, e2, e3, e4, e5)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_page(e1, e2, e3, e4)
+ dma_unmap_page(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_map_sg(e1, e2, e3, e4)
+ dma_map_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_unmap_sg(e1, e2, e3, e4)
+ dma_unmap_sg(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_cpu(e1, e2, e3, e4)
+ dma_sync_single_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_single_for_device(e1, e2, e3, e4)
+ dma_sync_single_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_cpu(e1, e2, e3, e4)
+ dma_sync_sg_for_cpu(&e1->dev, e2, e3, e4)
@@
expression e1, e2, e3, e4;
@@
- pci_dma_sync_sg_for_device(e1, e2, e3, e4)
+ dma_sync_sg_for_device(&e1->dev, e2, e3, e4)
@@
expression e1, e2;
@@
- pci_dma_mapping_error(e1, e2)
+ dma_mapping_error(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_dma_mask(e1, e2)
+ dma_set_mask(&e1->dev, e2)
@@
expression e1, e2;
@@
- pci_set_consistent_dma_mask(e1, e2)
+ dma_set_coherent_mask(&e1->dev, e2)
Signed-off-by: Christophe JAILLET <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
tcf_action_init_1() loads tc action modules automatically with
request_module() after parsing the tc action names, and it drops RTNL
lock and re-holds it before and after request_module(). This causes a
lot of trouble, as discovered by syzbot, because we can be in the
middle of batch initializations when we create an array of tc actions.
One of the problems is a deadlock:
CPU 0                                    CPU 1
rtnl_lock();
for (...) {
  tcf_action_init_1();
    -> rtnl_unlock();
    -> request_module();
                                         rtnl_lock();
                                         for (...) {
                                           tcf_action_init_1();
                                             -> tcf_idr_check_alloc();
                                             // Insert one action into idr,
                                             // but it is not committed until
                                             // tcf_idr_insert_many(), then drop
                                             // the RTNL lock in the _next_
                                             // iteration
                                             -> rtnl_unlock();
    -> rtnl_lock();
    -> a_o->init();
      -> tcf_idr_check_alloc();
      // Now waiting for the same index
      // to be committed
                                             -> request_module();
                                             -> rtnl_lock()
                                             // Now waiting for RTNL lock
                                         }
                                         rtnl_unlock();
}
rtnl_unlock();
This is not easy to solve. We can move the request_module() calls
before this loop, pre-load all the modules we need for this netlink
message, and then do the rest of the initializations. So the loop now
breaks down into two:
for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
        struct tc_action_ops *a_o;

        a_o = tc_action_load_ops(name, tb[i]...);
        ops[i - 1] = a_o;
}

for (i = 1; i <= TCA_ACT_MAX_PRIO && tb[i]; i++) {
        act = tcf_action_init_1(ops[i - 1]...);
}
Although this looks serious, it has only been reported by syzbot, so
it seems hard for humans to trigger. And given the size of this patch,
I'd suggest it goes to net-next and is not backported to stable.
This patch has been tested by syzbot and tested with tdc.py by me.
Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
Reported-and-tested-by: [email protected]
Reported-and-tested-by: [email protected]
Reported-by: [email protected]
Cc: Jiri Pirko <[email protected]>
Signed-off-by: Cong Wang <[email protected]>
Tested-by: Jamal Hadi Salim <[email protected]>
Acked-by: Jamal Hadi Salim <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Defining DEBUG should only be done in development.
So remove DEBUG.
Signed-off-by: Tom Rix <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The TCP session does not terminate with TCP_USER_TIMEOUT when data
remain untransmitted due to zero window.
The number of unanswered zero-window probes (tcp_probes_out) is
reset to zero with incoming acks irrespective of the window size,
as described in tcp_probe_timer():
RFC 1122 4.2.2.17 requires the sender to stay open indefinitely
as long as the receiver continues to respond probes. We support
this by default and reset icsk_probes_out with incoming ACKs.
This counter, however, is the wrong one to be used in calculating the
duration that the window remains closed and data remain untransmitted.
Thanks to Jonathan Maxwell <[email protected]> for diagnosing the
actual issue.
In this patch a new timestamp is introduced for the socket in order to
track the elapsed time for the zero-window probes that have not been
answered with any non-zero window ack.
Fixes: 9721e709fa68 ("tcp: simplify window probe aborting on USER_TIMEOUT")
Reported-by: William McCall <[email protected]>
Co-developed-by: Neal Cardwell <[email protected]>
Signed-off-by: Neal Cardwell <[email protected]>
Signed-off-by: Enke Chen <[email protected]>
Reviewed-by: Yuchung Cheng <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
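A hypothetical sketch of the idea in tcp_probe_timer(); the field name
icsk_probes_tstamp is illustrative of the new timestamp, not quoted from the
patch:
  if (!icsk->icsk_probes_tstamp)
          icsk->icsk_probes_tstamp = tcp_jiffies32;   /* first unanswered probe */
  else if (icsk->icsk_user_timeout &&
           (u32)(tcp_jiffies32 - icsk->icsk_probes_tstamp) >=
           msecs_to_jiffies(icsk->icsk_user_timeout))
          goto abort;                                 /* window stayed closed too long */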
|
|
Xin Long says:
====================
net: make udp tunnel devices support fraglist
Like the GRE device, UDP tunnel devices should also support fraglist,
so that HW GSO for protocols that require NETIF_F_FRAGLIST in the dev
(like SCTP) can work, especially when the lower device supports both
NETIF_F_GSO_UDP_TUNNEL and NETIF_F_GSO_SCTP.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Like vxlan and geneve, bareudp also needs this dev feature to support
HW GSO for some protocols.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
HW GSO for some protocols, like SCTP, requires fraglist support in the
device. Without NETIF_F_FRAGLIST set in the dev features of geneve, SW
GSO would have to be done before the packets enter the driver, even
when the geneve dev and the lower dev (like veth) both have the
NETIF_F_GSO_SCTP feature.
So this patch adds it for geneve.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
HW GSO for some protocols, like SCTP, requires fraglist support in the
device. Without NETIF_F_FRAGLIST set in the dev features of vxlan, SW
GSO would have to be done before the packets enter the driver, even
when the vxlan dev and the lower dev (like veth) both have the
NETIF_F_GSO_SCTP feature.
So this patch adds it for vxlan.
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
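The change for each tunnel device amounts to advertising the flag in its
setup path, roughly (a sketch, not the exact diff):
  /* e.g. in vxlan_setup() / geneve_setup() / bareudp_setup() */
  dev->features    |= NETIF_F_FRAGLIST;
  dev->hw_features |= NETIF_F_FRAGLIST;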
|
|
Matteo Croce says:
====================
ipv6: fixes for the multicast routes
Fix two wrong flags in the IPv6 multicast routes created
by the autoconf code.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The multicast route ff00::/8 is created with type RTN_UNICAST:
$ ip -6 -d route
unicast ::1 dev lo proto kernel scope global metric 256 pref medium
unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
unicast ff00::/8 dev eth0 proto kernel scope global metric 256 pref medium
Set the type to RTN_MULTICAST which is more appropriate.
Fixes: e8478e80e5a7 ("net/ipv6: Save route type in rt6_info")
Signed-off-by: Matteo Croce <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The ff00::/8 multicast route is created without specifying the fc_protocol
field, so the default RTPROT_BOOT value is used:
$ ip -6 -d route
unicast ::1 dev lo proto kernel scope global metric 256 pref medium
unicast fe80::/64 dev eth0 proto kernel scope global metric 256 pref medium
unicast ff00::/8 dev eth0 proto boot scope global metric 256 pref medium
As the documentation says, this value identifies routes installed
during boot, but the route is created when the interface is set up.
Change the value to RTPROT_KERNEL, which is more appropriate.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Matteo Croce <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
For additional robustness in the face of Hyper-V errors or malicious
behavior, validate all values that originate from packets that Hyper-V
has sent to the guest. Ensure that invalid values cannot cause indexing
off the end of an array, or subvert an existing validation via integer
overflow. Ensure that outgoing packets do not have any leftover guest
memory that has not been zeroed out.
Reported-by: Juan Vazquez <[email protected]>
Signed-off-by: Andrea Parri (Microsoft) <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Replace some checks for ETH_P_8021Q and ETH_P_8021AD with
eth_type_vlan().
Signed-off-by: Menglong Dong <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
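For illustration, the open-coded check and its helper-based equivalent
(process_vlan_frame() below is a hypothetical consumer; eth_type_vlan()
lives in include/linux/if_vlan.h):
  /* open-coded check this series replaces: */
  if (skb->protocol == htons(ETH_P_8021Q) ||
      skb->protocol == htons(ETH_P_8021AD))
          process_vlan_frame(skb);

  /* equivalent, using the helper: */
  if (eth_type_vlan(skb->protocol))
          process_vlan_frame(skb);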
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Various fixes:
* kernel-doc parsing fixes
* incorrect debugfs string checks
* locking fix in regulatory
* some encryption-related fixes
* tag 'mac80211-for-net-2021-01-18.2' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211:
mac80211: check if atf has been disabled in __ieee80211_schedule_txq
mac80211: do not drop tx nulldata packets on encrypted links
mac80211: fix encryption key selection for 802.3 xmit
mac80211: fix fast-rx encryption check
mac80211: fix incorrect strlen of .write in debugfs
cfg80211: fix a kerneldoc markup
cfg80211: Save the regulatory domain with a lock
cfg80211/mac80211: fix kernel-doc for SAR APIs
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
mv88e6xxx_port_vlan_join checks whether the VTU already contains an
entry for the given vid (via mv88e6xxx_vtu_getnext), and if so, merely
changes the relevant .member[] element and loads the updated entry
into the VTU.
However, at least for the mv88e6250, the on-stack struct
mv88e6xxx_vtu_entry vlan never has its .state[] array explicitly
initialized, neither in mv88e6xxx_port_vlan_join() nor inside the
getnext implementation. So the new entry has random garbage for the
STU bits, breaking VLAN filtering.
When the VTU entry is initially created, those bits are all zero, and
we should make sure to keep them that way when the entry is updated.
Fixes: 92307069a96c ("net: dsa: mv88e6xxx: Avoid VTU corruption on 6097")
Signed-off-by: Rasmus Villemoes <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Tobias Waldekranz <[email protected]>
Tested-by: Tobias Waldekranz <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|