Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2022-05-19
Oliver Hartkopp contributes a patch for the ISO-TP CAN protocol to
update the validation of address information during bind.
The next patch is by Jakub Kicinski and converts the CAN network
drivers from netif_napi_add() to the netif_napi_add_weight() function.
Another patch by Oliver Hartkopp removes obsolete CAN specific LED
support.
Vincent Mailhol's patch for the mcp251xfd driver fixes a
-Wunaligned-access warning by clang-14.
* tag 'linux-can-next-for-5.19-20220519' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
can: mcp251xfd: silence clang's -Wunaligned-access warning
can: can-dev: remove obsolete CAN LED support
can: can-dev: move to netif_napi_add_weight()
can: isotp: isotp_bind(): do not validate unused address information
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Since commit 30f3b42147ba6f ("can: mark led trigger as broken") the
CAN specific LED support was disabled and marked as BROKEN. As the
common LED support with CONFIG_LEDS_TRIGGER_NETDEV should do this work
now the code can be removed as preparation for a CAN netdevice Kconfig
rework.
Link: https://lore.kernel.org/all/[email protected]
Suggested-by: Vincent Mailhol <[email protected]>
Signed-off-by: Oliver Hartkopp <[email protected]>
[mkl: remove led.h from MAINTAINERS]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v5.19
Second set of patches for v5.19 and most likely the last one. rtw89
got support for 8852ce devices and mt76 now supports Wireless Ethernet
Dispatch.
Major changes:
cfg80211/mac80211
- support disabling EHT mode
rtw89
- add support for Realtek 8852ce devices
mt76
- Wireless Ethernet Dispatch support for flow offload
- non-standard VHT MCS10-11 support
- mt7921 AP mode support
- mt7921 ipv6 NS offload support
ath11k
- enable keepalive during WoWLAN suspend
- implement remain-on-channel support
* tag 'wireless-next-2022-05-19' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (135 commits)
iwlwifi: mei: fix potential NULL-ptr deref
iwlwifi: mei: clear the sap data header before sending
iwlwifi: mvm: remove vif_count
iwlwifi: mvm: always tell the firmware to accept MCAST frames in BSS
iwlwifi: mvm: add OTP info in case of init failure
iwlwifi: mvm: fix assert 1F04 upon reconfig
iwlwifi: fw: init SAR GEO table only if data is present
iwlwifi: mvm: clean up authorized condition
iwlwifi: mvm: use NULL instead of ERR_PTR when parsing wowlan status
iwlwifi: pcie: simplify MSI-X cause mapping
rtw89: pci: only mask out INT indicator register for disable interrupt v1
rtw89: convert rtw89_band to nl80211_band precisely
rtw89: 8852c: update txpwr tables to HALRF_027_00_052
rtw89: cfo: check mac_id to avoid out-of-bounds
rtw89: 8852c: set TX antenna path
rtw89: add ieee80211::sta_rc_update ops
wireless: Fix Makefile to be in alphabetical order
mac80211: refactor freeing the next_beacon
cfg80211: fix kernel-doc for cfg80211_beacon_data
mac80211: minstrel_ht: support ieee80211_rate_status
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
drivers/net/ethernet/mellanox/mlx5/core/main.c
b33886971dbc ("net/mlx5: Initialize flow steering during driver probe")
40379a0084c2 ("net/mlx5_fpga: Drop INNOVA TLS support")
f2b41b32cde8 ("net/mlx5: Remove ipsec_ops function table")
https://lore.kernel.org/all/20220519040345.6yrjromcdistu7vh@sx1/
16d42d313350 ("net/mlx5: Drain fw_reset when removing device")
8324a02c342a ("net/mlx5: Add exit route when waiting for FW")
https://lore.kernel.org/all/[email protected]/
tools/testing/selftests/net/mptcp/mptcp_join.sh
e274f7154008 ("selftests: mptcp: add subflow limits test-cases")
b6e074e171bc ("selftests: mptcp: add infinite map testcase")
5ac1d2d63451 ("selftests: mptcp: Add tests for userspace PM type")
https://lore.kernel.org/all/[email protected]/
net/mptcp/options.c
ba2c89e0ea74 ("mptcp: fix checksum byte order")
1e39e5a32ad7 ("mptcp: infinite mapping sending")
ea66758c1795 ("tcp: allow MPTCP to update the announced window")
https://lore.kernel.org/all/[email protected]/
net/mptcp/pm.c
95d686517884 ("mptcp: fix subflow accounting on close")
4d25247d3ae4 ("mptcp: bypass in-kernel PM restrictions for non-kernel PMs")
https://lore.kernel.org/all/[email protected]/
net/mptcp/subflow.c
ae66fb2ba6c3 ("mptcp: Do TCP fallback on early DSS checksum failure")
0348c690ed37 ("mptcp: add the fallback check")
f8d4bcacff3b ("mptcp: infinite mapping receiving")
https://lore.kernel.org/all/[email protected]/
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from can, xfrm and netfilter subtrees.
Notably this reverts a recent TCP/DCCP netns-related change to address
a possible UaF.
Current release - regressions:
- tcp: revert "tcp/dccp: get rid of inet_twsk_purge()"
- xfrm: set dst dev to blackhole_netdev instead of loopback_dev in
ifdown
Previous releases - regressions:
- netfilter: flowtable: fix TCP flow teardown
- can: revert "can: m_can: pci: use custom bit timings for Elkhart
Lake"
- xfrm: check encryption module availability consistency
- eth: vmxnet3: fix possible use-after-free bugs in
vmxnet3_rq_alloc_rx_buf()
- eth: mlx5: initialize flow steering during driver probe
- eth: ice: fix crash when writing timestamp on RX rings
Previous releases - always broken:
- mptcp: fix checksum byte order
- eth: lan966x: fix assignment of the MAC address
- eth: mlx5: remove HW-GRO from reported features
- eth: ftgmac100: disable hardware checksum on AST2600"
* tag 'net-5.18-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
net: bridge: Clear offload_fwd_mark when passing frame up bridge interface.
ptp: ocp: change sysfs attr group handling
selftests: forwarding: fix missing backslash
netfilter: nf_tables: disable expression reduction infra
netfilter: flowtable: move dst_check to packet path
netfilter: flowtable: fix TCP flow teardown
net: ftgmac100: Disable hardware checksum on AST2600
igb: skip phy status check where unavailable
nfc: pn533: Fix buggy cleanup order
mptcp: Do TCP fallback on early DSS checksum failure
mptcp: fix checksum byte order
net: af_key: check encryption module availability consistency
net: af_key: add check for pfkey_broadcast in function pfkey_process
net/mlx5: Drain fw_reset when removing device
net/mlx5e: CT: Fix setting flow_source for smfs ct tuples
net/mlx5e: CT: Fix support for GRE tuples
net/mlx5e: Remove HW-GRO from reported features
net/mlx5e: Properly block HW GRO when XDP is enabled
net/mlx5e: Properly block LRO when XDP is enabled
net/mlx5e: Block rx-gro-hw feature in switchdev mode
...
|
|
TLS device offload copies sendfile data to a bounce buffer before
transmitting. It allows to maintain the valid MAC on TLS records when
the file contents change and a part of TLS record has to be
retransmitted on TCP level.
In many common use cases (like serving static files over HTTPS) the file
contents are not changed on the fly. In many use cases breaking the
connection is totally acceptable if the file is changed during
transmission, because it would be received corrupted in any case.
This commit allows to optimize performance for such use cases to
providing a new optional mode of TLS sendfile(), in which the extra copy
is skipped. Removing this copy improves performance significantly, as
TLS and TCP sendfile perform the same operations, and the only overhead
is TLS header/trailer insertion.
The new mode can only be enabled with the new socket option named
TLS_TX_ZEROCOPY_SENDFILE on per-socket basis. It preserves backwards
compatibility with existing applications that rely on the copying
behavior.
The new mode is safe, meaning that unsolicited modifications of the file
being sent can't break integrity of the kernel. The worst thing that can
happen is sending a corrupted TLS record, which is in any case not
forbidden when using regular TCP sockets.
Sockets other than TLS device offload are not affected by the new socket
option. The actual status of zerocopy sendfile can be queried with
sock_diag.
Performance numbers in a single-core test with 24 HTTPS streams on
nginx, under 100% CPU load:
* non-zerocopy: 33.6 Gbit/s
* zerocopy: 79.92 Gbit/s
CPU: Intel(R) Xeon(R) Platinum 8380 CPU @ 2.30GHz
Signed-off-by: Boris Pismenny <[email protected]>
Signed-off-by: Tariq Toukan <[email protected]>
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Pablo Neira Ayuso says:
====================
Netfilter fixes for net
1) Reduce number of hardware offload retries from flowtable datapath
which might hog system with retries, from Felix Fietkau.
2) Skip neighbour lookup for PPPoE device, fill_forward_path() already
provides this and set on destination address from fill_forward_path for
PPPoE device, also from Felix.
4) When combining PPPoE on top of a VLAN device, set info->outdev to the
PPPoE device so software offload works, from Felix.
5) Fix TCP teardown flowtable state, races with conntrack gc might result
in resetting the state to ESTABLISHED and the time to one day. Joint
work with Oz Shlomo and Sven Auhagen.
6) Call dst_check() from flowtable datapath to check if dst is stale
instead of doing it from garbage collector path.
7) Disable register tracking infrastructure, either user-space or
kernel need to pre-fetch keys inconditionally, otherwise register
tracking assumes data is already available in register that might
not well be there, leading to incorrect reductions.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: disable expression reduction infra
netfilter: flowtable: move dst_check to packet path
netfilter: flowtable: fix TCP flow teardown
netfilter: nft_flow_offload: fix offload with pppoe + vlan
net: fix dev_fill_forward_path with pppoe + bridge
netfilter: nft_flow_offload: skip dst neigh lookup for ppp devices
netfilter: flowtable: fix excessive hw offload attempts after failure
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pull io_uring fixes from Jens Axboe:
"Two small changes fixing issues from the 5.18 merge window:
- Fix wrong ordering of a tracepoint (Dylan)
- Fix MSG_RING on IOPOLL rings (me)"
* tag 'io_uring-5.18-2022-05-18' of git://git.kernel.dk/linux-block:
io_uring: don't attempt to IOPOLL for MSG_RING requests
io_uring: fix ordering of args in io_uring_queue_async_work
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec
Steffen Klassert says:
====================
pull request (net): ipsec 2022-05-18
1) Fix "disable_policy" flag use when arriving from different devices.
From Eyal Birger.
2) Fix error handling of pfkey_broadcast in function pfkey_process.
From Jiasheng Jiang.
3) Check the encryption module availability consistency in pfkey.
From Thomas Bartschies.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Multiport eswitch mode is a LAG mode that allows to add rules that
forward traffic to a specific physical port without being affected by LAG
affinity configuration.
This mode of operation is mutual exclusive with the other LAG modes used
by multipath and bonding.
To make the transition between the modes, we maintain a counter on the
number of rules specifying one of the uplink representors as the target
of mirred egress redirect action.
An example of such rule would be:
$ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \
00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0
If the reference count just grows to one and LAG is not in use, we
create the LAG in multiport eswitch mode. Other mode changes are not
allowed while in this mode. When the reference count reaches zero, we
destroy the LAG and let other modes be used if needed.
logic also changed such that if forwarding to some uplink destination
cannot be guaranteed, we fail the operation so the rule will eventually
be in software and not in hardware.
Signed-off-by: Eli Cohen <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Take the wrapper version which picks default node into a header file.
This reduces the number of exported functions.
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Add syndrome of last command failure per command type to debugfs to ease
debugging of such failure.
last_failed_syndrome - last command failed syndrome returned by FW.
Signed-off-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Use TOD_READ_SECONDARY for extts to keep TOD_READ_PRIMARY
for gettime and settime exclusively. Before this change,
TOD_READ_PRIMARY was used for both extts and gettime/settime,
which would result in changing TOD read/write triggers between
operations. Using TOD_READ_SECONDARY would make extts
independent of gettime/settime operation
Signed-off-by: Min Li <[email protected]>
Acked-by: Richard Cochran <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Not calling the function for dummy contexts will cause the context to
not be reset. During the next syscall, this will cause an error in
__audit_syscall_entry:
WARN_ON(context->context != AUDIT_CTX_UNUSED);
WARN_ON(context->name_count);
if (context->context != AUDIT_CTX_UNUSED || context->name_count) {
audit_panic("unrecoverable error in audit_syscall_entry()");
return;
}
These problematic dummy contexts are created via the following call
chain:
exit_to_user_mode_prepare
-> arch_do_signal_or_restart
-> get_signal
-> task_work_run
-> tctx_task_work
-> io_req_task_submit
-> io_issue_sqe
-> audit_uring_entry
Cc: [email protected]
Fixes: 5bd2182d58e9 ("audit,io_uring,io-wq: add some basic audit support to io_uring")
Signed-off-by: Julian Orth <[email protected]>
[PM: subject line tweaks]
Signed-off-by: Paul Moore <[email protected]>
|
|
The kernel-doc comment is formatted badly, resulting
in a warning:
include/net/cfg80211.h:1188: warning: bad line: [...]
Fix that.
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Johannes Berg <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
pull-request: can-next 2022-05-16
the first 2 patches are by me and target the CAN raw protocol. The 1st
removes an unneeded assignment, the other one adds support for
SO_TXTIME/SCM_TXTIME.
Oliver Hartkopp contributes 2 patches for the ISOTP protocol. The 1st
adds support for transmission without flow control, the other let's
bind() return an error on incorrect CAN ID formatting.
Geert Uytterhoeven contributes a patch to clean up ctucanfd's Kconfig
file.
Vincent Mailhol's patch for the slcan driver uses the proper function
to check for invalid CAN frames in the xmit callback.
The next patch is by Geert Uytterhoeven and makes the interrupt-names
of the renesas,rcar-canfd dt bindings mandatory.
A patch by my update the ctucanfd dt bindings to include the common
CAN controller bindings.
The last patch is by Akira Yokosawa and fixes a breakage the
ctucanfd's documentation.
* tag 'linux-can-next-for-5.19-20220516' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
docs: ctucanfd: Use 'kernel-figure' directive instead of 'figure'
dt-bindings: can: ctucanfd: include common CAN controller bindings
dt-bindings: can: renesas,rcar-canfd: Make interrupt-names required
can: slcan: slc_xmit(): use can_dropped_invalid_skb() instead of manual check
can: ctucanfd: Let users select instead of depend on CAN_CTUCANFD
can: isotp: isotp_bind(): return -EINVAL on incorrect CAN ID formatting
can: isotp: add support for transmission without flow control
can: raw: add support for SO_TXTIME/SCM_TXTIME
can: raw: raw_sendmsg(): remove not needed setting of skb->sk
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
skb_data_area_size() is not needed. As Jakub pointed out [1]:
For Rx, drivers can use the size passed during skb allocation or
use skb_tailroom().
For Tx, drivers should use skb_headlen().
[1] https://lore.kernel.org/netdev/CAHNKnsTmH-rGgWi3jtyC=ktM1DW2W1VJkYoTMJV2Z_Bt498bsg@mail.gmail.com/
Signed-off-by: Ricardo Martinez <[email protected]>
Reviewed-by: Andy Shevchenko <[email protected]>
Reviewed-by: Sergey Ryazanov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Usually the ISO 15765-2 protocol is a point-to-point protocol to transfer
segmented PDUs to a dedicated receiver. This receiver sends a flow control
message to specify protocol options and timings (e.g. block size / STmin).
The so called functional addressing communication allows a 1:N
communication but is limited to a single frame length.
This new CAN_ISOTP_CF_BROADCAST allows an unconfirmed 1:N communication
with PDU length that would not fit into a single frame. This feature is
not covered by the ISO 15765-2 standard.
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Oliver Hartkopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
When calling dev_fill_forward_path on a pppoe device, the provided destination
address is invalid. In order for the bridge fdb lookup to succeed, the pppoe
code needs to update ctx->daddr to the correct value.
Fix this by storing the address inside struct net_device_path_ctx
Fixes: f6efc675c9dd ("net: ppp: resolve forwarding path for bridge pppoe devices")
Signed-off-by: Felix Fietkau <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
A cpu can observe sd->defer_count reaching 128,
and call smp_call_function_single_async()
Problem is that the remote CPU can clear sd->defer_count
before the IPI is run/acknowledged.
Other cpus can queue more packets and also decide
to call smp_call_function_single_async() while the pending
IPI was not yet delivered.
This is a common issue with smp_call_function_single_async().
Callers must ensure correct synchronization and serialization.
I triggered this issue while experimenting smaller threshold.
Performing the call to smp_call_function_single_async()
under sd->defer_lock protection did not solve the problem.
Commit 5a18ceca6350 ("smp: Allow smp_call_function_single_async()
to insert locked csd") replaced an informative WARN_ON_ONCE()
with a return of -EBUSY, which is often ignored.
Test of CSD_FLAG_LOCK presence is racy anyway.
Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The SKB_DR_OR() is used to set the drop reason to a value when it is
not set or specified yet. SKB_NOT_DROPPED_YET should also be considered
as not set.
Reviewed-by: Jiang Biao <[email protected]>
Reviewed-by: Hao Peng <[email protected]>
Signed-off-by: Menglong Dong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This is no longer a macro, but an inlined function.
INET_MATCH() -> inet_match()
Signed-off-by: Eric Dumazet <[email protected]>
Suggested-by: Olivier Hartkopp <[email protected]>
Suggested-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
INET6_MATCH() runs without holding a lock on the socket.
We probably need to annotate most reads.
This patch makes INET6_MATCH() an inline function
to ease our changes.
v2: inline function only defined if IS_ENABLED(CONFIG_IPV6)
Change the name to inet6_match(), this is no longer a macro.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
inet_request_bound_dev_if() reads sk->sk_bound_dev_if twice
while listener socket is not locked.
Another cpu could change this field under us.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
UDP sendmsg() is lockless, and reads sk->sk_bound_dev_if while
this field can be changed by another thread.
Adds minimal annotations to avoid KCSAN splats for UDP.
Following patches will add more annotations to potential lockless readers.
BUG: KCSAN: data-race in __ip6_datagram_connect / udpv6_sendmsg
write to 0xffff888136d47a94 of 4 bytes by task 7681 on cpu 0:
__ip6_datagram_connect+0x6e2/0x930 net/ipv6/datagram.c:221
ip6_datagram_connect+0x2a/0x40 net/ipv6/datagram.c:272
inet_dgram_connect+0x107/0x190 net/ipv4/af_inet.c:576
__sys_connect_file net/socket.c:1900 [inline]
__sys_connect+0x197/0x1b0 net/socket.c:1917
__do_sys_connect net/socket.c:1927 [inline]
__se_sys_connect net/socket.c:1924 [inline]
__x64_sys_connect+0x3d/0x50 net/socket.c:1924
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x50 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffff888136d47a94 of 4 bytes by task 7670 on cpu 1:
udpv6_sendmsg+0xc60/0x16e0 net/ipv6/udp.c:1436
inet6_sendmsg+0x5f/0x80 net/ipv6/af_inet6.c:652
sock_sendmsg_nosec net/socket.c:705 [inline]
sock_sendmsg net/socket.c:725 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2413
___sys_sendmsg net/socket.c:2467 [inline]
__sys_sendmmsg+0x267/0x4c0 net/socket.c:2553
__do_sys_sendmmsg net/socket.c:2582 [inline]
__se_sys_sendmmsg net/socket.c:2579 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2579
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x2b/0x50 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000000 -> 0xffffff9b
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 7670 Comm: syz-executor.3 Tainted: G W 5.18.0-rc1-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
I chose to not add Fixes: tag because race has minor consequences
and stable teams busy enough.
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Instead of simply forcing a 0 payload_len in IPv6 header,
implement RFC 2675 and insert a custom extension header.
Note that only TCP stack is currently potentially generating
jumbograms, and that this extension header is purely local,
it wont be sent on a physical link.
This is needed so that packet capture (tcpdump and friends)
can properly dissect these large packets.
Signed-off-by: Coco Li <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Allow the gro_max_size to exceed a value larger than 65536.
There weren't really any external limitations that prevented this other
than the fact that IPv4 only supports a 16 bit length field. Since we have
the option of adding a hop-by-hop header for IPv6 we can allow IPv6 to
exceed this value and for IPv4 and non-TCP flows we can cap things at 65536
via a constant rather than relying on gro_max_size.
[edumazet] limit GRO_MAX_SIZE to (8 * 65535) to avoid overflows.
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
ipv6 tcp and gro stacks will soon be able to build big TCP packets,
with an added temporary Hop By Hop header.
If GSO is involved for these large packets, we need to remove
the temporary HBH header before segmentation happens.
v2: perform HBH removal from ipv6_gso_segment() instead of
skb_segment() (Alexander feedback)
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Following patches will need to add and remove local IPv6 jumbogram
options to enable BIG TCP.
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Make sure we will not overflow shinfo->gso_segs
Minimal TCP MSS size is 8 bytes, and shinfo->gso_segs
is a 16bit field.
TCP_MIN_GSO_SIZE is currently defined in include/net/tcp.h,
it seems cleaner to not bring tcp details into include/linux/netdevice.h
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The code for gso_max_size was added originally to allow for debugging and
workaround of buggy devices that couldn't support TSO with blocks 64K in
size. The original reason for limiting it to 64K was because that was the
existing limits of IPv4 and non-jumbogram IPv6 length fields.
With the addition of Big TCP we can remove this limit and allow the value
to potentially go up to UINT_MAX and instead be limited by the tso_max_size
value.
So in order to support this we need to go through and clean up the
remaining users of the gso_max_size value so that the values will cap at
64K for non-TCPv6 flows. In addition we can clean up the GSO_MAX_SIZE value
so that 64K becomes GSO_LEGACY_MAX_SIZE and UINT_MAX will now be the upper
limit for GSO_MAX_SIZE.
v6: (edumazet) fixed a compile error if CONFIG_IPV6=n,
in a new sk_trim_gso_size() helper.
netif_set_tso_max_size() caps the requested TSO size
with GSO_MAX_SIZE.
Signed-off-by: Alexander Duyck <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
New netlink attributes IFLA_TSO_MAX_SIZE and IFLA_TSO_MAX_SEGS
are used to report to user-space the device TSO limits.
ip -d link sh dev eth1
...
tso_max_size 65536 tso_max_segs 65535
Signed-off-by: Eric Dumazet <[email protected]>
Acked-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
This is v2 including deadlock fix in conntrack ecache rework
reported by Jakub Kicinski.
The following patchset contains Netfilter updates for net-next,
mostly updates to conntrack from Florian Westphal.
1) Add a dedicated list for conntrack event redelivery.
2) Include event redelivery list in conntrack dumps of dying type.
3) Remove per-cpu dying list for event redelivery, not used anymore.
4) Add netns .pre_exit to cttimeout to zap timeout objects before
synchronize_rcu() call.
5) Remove nf_ct_unconfirmed_destroy.
6) Add generation id for conntrack extensions for conntrack
timeout and helpers.
7) Detach timeout policy from conntrack on cttimeout module removal.
8) Remove __nf_ct_unconfirmed_destroy.
9) Remove unconfirmed list.
10) Remove unconditional local_bh_disable in init_conntrack().
11) Consolidate conntrack iterator nf_ct_iterate_cleanup().
12) Detect if ctnetlink listeners exist to short-circuit event
path early.
13) Un-inline nf_ct_ecache_ext_add().
14) Add nf_conntrack_events autodetect ctnetlink listener mode
and make it default.
15) Add nf_ct_ecache_exist() to check for event cache extension.
16) Extend flowtable reverse route lookup to include source, iif,
tos and mark, from Sven Auhagen.
17) Do not verify zero checksum UDP packets in nf_reject,
from Kevin Mitchell.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds the new struct ieee80211_rate_status and replaces
'struct rate_info *rate' in ieee80211_tx_status with pointer and length
annotation.
The struct ieee80211_rate_status allows to:
(1) receive tx power status feedback for transmit power control (TPC)
per packet or packet retry
(2) dynamic mapping of wifi chip specific multi-rate retry (mrr)
chains with different lengths
(3) increase the limit of annotatable rate indices to support
IEEE802.11ac rate sets and beyond
ieee80211_tx_info, control and status buffer, and ieee80211_tx_rate
cannot be used to achieve these goals due to fixed size limitations.
Our new struct contains a struct rate_info to annotate the rate that was
used, retry count of the rate and tx power. It is intended for all
information related to RC and TPC that needs to be passed from driver to
mac80211 and its RC/TPC algorithms like Minstrel_HT. It corresponds to
one stage in an mrr. Multiple subsequent instances of this struct can be
included in struct ieee80211_tx_status via a pointer and a length variable.
Those instances can be allocated on-stack. The former reference to a single
instance of struct rate_info is replaced with our new annotation.
An extension is introduced to struct ieee80211_hw. There are two new
members called 'tx_power_levels' and 'max_txpwr_levels_idx' acting as a
tx power level table. When a wifi device is registered, the driver shall
supply all supported power levels in this list. This allows to support
several quirks like differing power steps in power level ranges or
alike. TPC can use this for algorithm and thus be designed more abstract
instead of handling all possible step widths individually.
Further mandatory changes in status.c, mt76 and ath11k drivers due to the
removal of 'struct rate_info *rate' are also included.
status.c already uses the information in ieee80211_tx_status->rate in
radiotap, this is now changed to use ieee80211_rate_status->rate_idx.
mt76 driver already uses struct rate_info to pass the tx rate to status
path. The new members of the ieee80211_tx_status are set to NULL and 0
because the previously passed rate is not relevant to rate control and
accurate information is passed via tx_info->status.rates.
For ath11k, the txrate can be passed via this struct because ath11k uses
firmware RC and thus the information does not interfere with software RC.
Compile-Tested: current wireless-next tree with all flags on
Tested-on: Xiaomi 4A Gigabit (MediaTek MT7603E, MT7612E) with OpenWrt
Linux 5.10.113
Signed-off-by: Jonas Jelonek <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
|
|
NL80211_ATTR_HE_BSS_COLOR attribute can be included in both
NL80211_CMD_START_AP and NL80211_CMD_SET_BEACON commands.
Move he_bss_color from cfg80211_ap_settings to cfg80211_beacon_data
and parse NL80211_ATTR_HE_BSS_COLOR as a part of nl80211_parse_beacon()
to have bss color settings parsed for both start ap and set beacon
commands.
Add a new flag he_bss_color_valid to indicate whether
NL80211_ATTR_HE_BSS_COLOR attribute is included.
Signed-off-by: Rameshkumar Sundaram <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[fix build ...]
Signed-off-by: Johannes Berg <[email protected]>
|
|
In IPv4 setting the "disable_policy" flag on a device means no policy
should be enforced for traffic originating from the device. This was
implemented by seting the DST_NOPOLICY flag in the dst based on the
originating device.
However, dsts are cached in nexthops regardless of the originating
devices, in which case, the DST_NOPOLICY flag value may be incorrect.
Consider the following setup:
+------------------------------+
| ROUTER |
+-------------+ | +-----------------+ |
| ipsec src |----|-|ipsec0 | |
+-------------+ | |disable_policy=0 | +----+ |
| +-----------------+ |eth1|-|-----
+-------------+ | +-----------------+ +----+ |
| noipsec src |----|-|eth0 | |
+-------------+ | |disable_policy=1 | |
| +-----------------+ |
+------------------------------+
Where ROUTER has a default route towards eth1.
dst entries for traffic arriving from eth0 would have DST_NOPOLICY
and would be cached and therefore can be reused by traffic originating
from ipsec0, skipping policy check.
Fix by setting a IPSKB_NOPOLICY flag in IPCB and observing it instead
of the DST in IN/FWD IPv4 policy checks.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Shmulik Ladkani <[email protected]>
Signed-off-by: Eyal Birger <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
|
|
This field doesn't exist, remove the docs for it.
Signed-off-by: Johannes Berg <[email protected]>
|
|
This is called offload_flags, remove the extra 'a'.
Signed-off-by: Johannes Berg <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"The recent expansion of the sched switch tracepoint inserted a new
argument in the middle of the arguments. This reordering broke BPF
programs which relied on the old argument list.
While tracepoints are not considered stable ABI, it's not trivial to
make BPF cope with such a change, but it's being worked on. For now
restore the original argument order and move the new argument to the
end of the argument list"
* tag 'sched-urgent-2022-05-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/tracing: Append prev_state to tp args instead
|
|
Pull NFS client bugfixes from Trond Myklebust:
"One more pull request. There was a bug in the fix to ensure that gss-
proxy continues to work correctly after we fixed the AF_LOCAL socket
leak in the RPC code. This therefore reverts that broken patch, and
replaces it with one that works correctly.
Stable fixes:
- SUNRPC: Ensure that the gssproxy client can start in a connected
state
Bugfixes:
- Revert "SUNRPC: Ensure gss-proxy connects on setup"
- nfs: fix broken handling of the softreval mount option"
* tag 'nfs-for-5.18-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
nfs: fix broken handling of the softreval mount option
SUNRPC: Ensure that the gssproxy client can start in a connected state
Revert "SUNRPC: Ensure gss-proxy connects on setup"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
pull request (net-next): ipsec-next 2022-05-13
1) Cleanups for the code behind the XFRM offload API. This is a
preparation for the extension of the API for policy offload.
From Leon Romanovsky.
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next:
xfrm: drop not needed flags variable in XFRM offload struct
net/mlx5e: Use XFRM state direction instead of flags
netdevsim: rely on XFRM state direction instead of flags
ixgbe: propagate XFRM offload state direction instead of flags
xfrm: store and rely on direction to construct offload flags
xfrm: rename xfrm_state_offload struct to allow reuse
xfrm: delete not used number of external headers
xfrm: free not used XFRM_ESP_NO_TRAILER flag
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The checksum is optional for UDP packets. However nf_reject would
previously require a valid checksum to elicit a response such as
ICMP_DEST_UNREACH.
Add some logic to nf_reject_verify_csum to determine if a UDP packet has
a zero checksum and should therefore not be verified.
Signed-off-by: Kevin Mitchell <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
The pointer check usually results in a 'false positive': its likely
that the ctnetlink module is loaded but no event monitoring is enabled.
After recent change to autodetect ctnetlink usage and only allocate
the ecache extension if a listener is active, check if the extension
is present on a given conntrack.
If its not there, there is nothing to report and calls to the
notification framework can be elided.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Only called when new ct is allocated or the extension isn't present.
This function will be extended, place this in the conntrack module
instead of inlining.
The callers already depend on nf_conntrack module.
Return value is changed to bool, noone used the returned pointer.
Make sure that the core drops the newly allocated conntrack
if the extension is requested but can't be added.
This makes it necessary to ifdef the section, as the stub
always returns false we'd drop every new conntrack if the
the ecache extension is disabled in kconfig.
Add from data path (xt_CT, nft_ct) is unchanged.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
At this time, every new conntrack gets the 'event cache extension'
enabled for it.
This is because the 'net.netfilter.nf_conntrack_events' sysctl defaults
to 1.
Changing the default to 0 means that commands that rely on the event
notification extension, e.g. 'conntrack -E' or conntrackd, stop working.
We COULD detect if there is a listener by means of
'nfnetlink_has_listeners()' and only add the extension if this is true.
The downside is a dependency from conntrack module to nfnetlink module.
This adds a different way: inc/dec a counter whenever a ctnetlink group
is being (un)subscribed and toggle a flag in struct net.
Next patches will take advantage of this and will only add the event
extension if the flag is set.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
This patch adds a structure to collect all the context data that is
passed to the cleanup iterator.
struct nf_ct_iter_data {
struct net *net;
void *data;
u32 portid;
int report;
};
There is a netns field that allows to clean up conntrack entries
specifically owned by the specified netns.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
It has no function anymore and can be removed.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Multiple netfilter extensions store pointers to external data
in their extension area struct.
Examples:
1. Timeout policies
2. Connection tracking helpers.
No references are taken for these.
When a helper or timeout policy is removed, the conntrack table gets
traversed and affected extensions are cleared.
Conntrack entries not yet in the hashtable are referenced via a special
list, the unconfirmed list.
On removal of a policy or connection tracking helper, the unconfirmed
list gets traversed an all entries are marked as dying, this prevents
them from getting committed to the table at insertion time: core checks
for dying bit, if set, the conntrack entry gets destroyed at confirm
time.
The disadvantage is that each new conntrack has to be added to the percpu
unconfirmed list, and each insertion needs to remove it from this list.
The list is only ever needed when a policy or helper is removed -- a rare
occurrence.
Add a generation ID count: Instead of adding to the list and then
traversing that list on policy/helper removal, increment a counter
that is stored in the extension area.
For unconfirmed conntracks, the extension has the genid valid at ct
allocation time.
Removal of a helper/policy etc. increments the counter.
At confirmation time, validate that ext->genid == global_id.
If the stored number is not the same, do not allow the conntrack
insertion, just like as if a confirmed-list traversal would have flagged
the entry as dying.
After insertion, the genid is no longer relevant (conntrack entries
are now reachable via the conntrack table iterators and is set to 0.
This allows removal of the percpu unconfirmed list.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
This helper tags connections not yet in the conntrack table as
dying. These nf_conn entries will be dropped instead when the
core attempts to insert them from the input or postrouting
'confirm' hook.
After the previous change, the entries get unlinked from the
list earlier, so that by the time the actual exit hook runs,
new connections no longer have a timeout policy assigned.
Its enough to walk the hashtable instead.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Make it so netns pre_exit unlinks the objects from the pernet list, so
they cannot be found anymore.
netns core issues a synchronize_rcu() before calling the exit hooks so
any the time the exit hooks run unconfirmed nf_conn entries have been
free'd or they have been committed to the hashtable.
The exit hook still tags unconfirmed entries as dying, this can
now be removed in a followup change.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|