Age | Commit message (Collapse) | Author | Files | Lines |
|
Add a kthread_stop_put() helper that stops a thread and puts its task
struct. Use it to replace the various instances of kthread_stop()
followed by put_task_struct().
Remove the kthread_stop_put() macro in usbip that is similar but doesn't
return the result of kthread_stop().
[[email protected]: fix kerneldoc comment]
Link: https://lkml.kernel.org/r/[email protected]
[[email protected]: document kthread_stop_put()'s argument]
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Andreas Gruenbacher <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use new APIs to dynamically allocate the sunrpc_cred shrinker.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Qi Zheng <[email protected]>
Reviewed-by: Muchun Song <[email protected]>
Cc: Jeff Layton <[email protected]>
Cc: Neil Brown <[email protected]>
Cc: Olga Kornievskaia <[email protected]>
Cc: Dai Ngo <[email protected]>
Cc: Tom Talpey <[email protected]>
Cc: Trond Myklebust <[email protected]>
Cc: Anna Schumaker <[email protected]>
Cc: Abhinav Kumar <[email protected]>
Cc: Alasdair Kergon <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Alyssa Rosenzweig <[email protected]>
Cc: Andreas Dilger <[email protected]>
Cc: Andreas Gruenbacher <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Bob Peterson <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Carlos Llamas <[email protected]>
Cc: Chandan Babu R <[email protected]>
Cc: Chao Yu <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Christian Brauner <[email protected]>
Cc: Christian Koenig <[email protected]>
Cc: Chuck Lever <[email protected]>
Cc: Coly Li <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Dave Chinner <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: David Airlie <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: David Sterba <[email protected]>
Cc: Dmitry Baryshkov <[email protected]>
Cc: Gao Xiang <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Huang Rui <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jaegeuk Kim <[email protected]>
Cc: Jani Nikula <[email protected]>
Cc: Jan Kara <[email protected]>
Cc: Jason Wang <[email protected]>
Cc: Jeffle Xu <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Joonas Lahtinen <[email protected]>
Cc: Josef Bacik <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Kent Overstreet <[email protected]>
Cc: Kirill Tkhai <[email protected]>
Cc: Marijn Suijten <[email protected]>
Cc: "Michael S. Tsirkin" <[email protected]>
Cc: Mike Snitzer <[email protected]>
Cc: Minchan Kim <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Oleksandr Tyshchenko <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rob Clark <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Rodrigo Vivi <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Sean Paul <[email protected]>
Cc: Sergey Senozhatsky <[email protected]>
Cc: Song Liu <[email protected]>
Cc: Stefano Stabellini <[email protected]>
Cc: Steven Price <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tomeu Vizoso <[email protected]>
Cc: Tvrtko Ursulin <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Xuan Zhuo <[email protected]>
Cc: Yue Hu <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Daniel Borkmann says:
====================
pull-request: bpf 2023-10-02
We've added 11 non-merge commits during the last 12 day(s) which contain
a total of 12 files changed, 176 insertions(+), 41 deletions(-).
The main changes are:
1) Fix BPF verifier to reset backtrack_state masks on global function
exit as otherwise subsequent precision tracking would reuse them,
from Andrii Nakryiko.
2) Several sockmap fixes for available bytes accounting,
from John Fastabend.
3) Reject sk_msg egress redirects to non-TCP sockets given this
is only supported for TCP sockets today, from Jakub Sitnicki.
4) Fix a syzkaller splat in bpf_mprog when hitting maximum program
limits with BPF_F_BEFORE directive, from Daniel Borkmann
and Nikolay Aleksandrov.
5) Fix BPF memory allocator to use kmalloc_size_roundup() to adjust
size_index for selecting a bpf_mem_cache, from Hou Tao.
6) Fix arch_prepare_bpf_trampoline return code for s390 JIT,
from Song Liu.
7) Fix bpf_trampoline_get when CONFIG_BPF_JIT is turned off,
from Leon Hwang.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Use kmalloc_size_roundup() to adjust size_index
selftest/bpf: Add various selftests for program limits
bpf, mprog: Fix maximum program check on mprog attachment
bpf, sockmap: Reject sk_msg egress redirects to non-TCP sockets
bpf, sockmap: Add tests for MSG_F_PEEK
bpf, sockmap: Do not inc copied_seq when PEEK flag set
bpf: tcp_read_skb needs to pop skb regardless of seq
bpf: unconditionally reset backtrack_state masks on global func exit
bpf: Fix tr dereferencing
selftests/bpf: Check bpf_cubic_acked() is called via struct_ops
s390/bpf: Let arch_prepare_bpf_trampoline return program size
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
nft_rbtree_gc_elem() walks back and removes the end interval element that
comes before the expired element.
There is a small chance that we've cached this element as 'rbe_ge'.
If this happens, we hold and test a pointer that has been queued for
freeing.
It also causes spurious insertion failures:
$ cat test-testcases-sets-0044interval_overlap_0.1/testout.log
Error: Could not process rule: File exists
add element t s { 0 - 2 }
^^^^^^
Failed to insert 0 - 2 given:
table ip t {
set s {
type inet_service
flags interval,timeout
timeout 2s
gc-interval 2s
}
}
The set (rbtree) is empty. The 'failure' doesn't happen on next attempt.
Reason is that when we try to insert, the tree may hold an expired
element that collides with the range we're adding.
While we do evict/erase this element, we can trip over this check:
if (rbe_ge && nft_rbtree_interval_end(rbe_ge) && nft_rbtree_interval_end(new))
return -ENOTEMPTY;
rbe_ge was erased by the synchronous gc, we should not have done this
check. Next attempt won't find it, so retry results in successful
insertion.
Restart in-kernel to avoid such spurious errors.
Such restart are rare, unless userspace intentionally adds very large
numbers of elements with very short timeouts while setting a huge
gc interval.
Even in this case, this cannot loop forever, on each retry an existing
element has been removed.
As the caller is holding the transaction mutex, its impossible
for a second entity to add more expiring elements to the tree.
After this it also becomes feasible to remove the async gc worker
and perform all garbage collection from the commit path.
Fixes: c9e6978e2725 ("netfilter: nft_set_rbtree: Switch to node list walk for overlap detection")
Signed-off-by: Florian Westphal <[email protected]>
|
|
When adding/updating an object, the transaction handler emits suitable
audit log entries already, the one in nft_obj_notify() is redundant. To
fix that (and retain the audit logging from objects' 'update' callback),
Introduce an "audit log free" variant for internal use.
Fixes: c520292f29b8 ("audit: log nftables configuration change events once per table")
Signed-off-by: Phil Sutter <[email protected]>
Reviewed-by: Richard Guy Briggs <[email protected]>
Acked-by: Paul Moore <[email protected]> (Audit)
Signed-off-by: Florian Westphal <[email protected]>
|
|
In Scenario A and B below, as the delayed INIT_ACK always changes the peer
vtag, SCTP ct with the incorrect vtag may cause packet loss.
Scenario A: INIT_ACK is delayed until the peer receives its own INIT_ACK
192.168.1.2 > 192.168.1.1: [INIT] [init tag: 1328086772]
192.168.1.1 > 192.168.1.2: [INIT] [init tag: 1414468151]
192.168.1.2 > 192.168.1.1: [INIT ACK] [init tag: 1328086772]
192.168.1.1 > 192.168.1.2: [INIT ACK] [init tag: 1650211246] *
192.168.1.2 > 192.168.1.1: [COOKIE ECHO]
192.168.1.1 > 192.168.1.2: [COOKIE ECHO]
192.168.1.2 > 192.168.1.1: [COOKIE ACK]
Scenario B: INIT_ACK is delayed until the peer completes its own handshake
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
192.168.1.2 > 192.168.1.1: sctp (1) [INIT ACK] [init tag: 3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [COOKIE ECHO]
192.168.1.2 > 192.168.1.1: sctp (1) [COOKIE ACK]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT ACK] [init tag: 3914796021] *
This patch fixes it as below:
In SCTP_CID_INIT processing:
- clear ct->proto.sctp.init[!dir] if ct->proto.sctp.init[dir] &&
ct->proto.sctp.init[!dir]. (Scenario E)
- set ct->proto.sctp.init[dir].
In SCTP_CID_INIT_ACK processing:
- drop it if !ct->proto.sctp.init[!dir] && ct->proto.sctp.vtag[!dir] &&
ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario B, Scenario C)
- drop it if ct->proto.sctp.init[dir] && ct->proto.sctp.init[!dir] &&
ct->proto.sctp.vtag[!dir] != ih->init_tag. (Scenario A)
In SCTP_CID_COOKIE_ACK processing:
- clear ct->proto.sctp.init[dir] and ct->proto.sctp.init[!dir].
(Scenario D)
Also, it's important to allow the ct state to move forward with cookie_echo
and cookie_ack from the opposite dir for the collision scenarios.
There are also other Scenarios where it should allow the packet through,
addressed by the processing above:
Scenario C: new CT is created by INIT_ACK.
Scenario D: start INIT on the existing ESTABLISHED ct.
Scenario E: start INIT after the old collision on the existing ESTABLISHED
ct.
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 3922216408]
192.168.1.1 > 192.168.1.2: sctp (1) [INIT] [init tag: 144230885]
(both side are stopped, then start new connection again in hours)
192.168.1.2 > 192.168.1.1: sctp (1) [INIT] [init tag: 242308742]
Fixes: 9fb9cbb1082d ("[NETFILTER]: Add nf_conntrack subsystem.")
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
|
|
nft can perform merging of adjacent payload requests.
This means that:
ether saddr 00:11 ... ether type 8021ad ...
is a single payload expression, for 8 bytes, starting at the
ethernet source offset.
Check that offset+length is fully within the source/destination mac
addersses.
This bug prevents 'ether type' from matching the correct h_proto in case
vlan tag got stripped.
Fixes: de6843be3082 ("netfilter: nft_payload: rebuild vlan header when needed")
Reported-by: David Ward <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
|
|
The call netdev_{put, hold} of dev_{put, hold} will check NULL, so there
is no need to check before using dev_{put, hold}, remove it to silence
the warning:
./net/can/raw.c:497:2-9: WARNING: NULL check before dev_{put, hold} functions is not needed.
Reported-by: Abaci Robot <[email protected]>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6231
Signed-off-by: Jiapeng Chong <[email protected]>
Reported-by: Simon Horman <[email protected]>
Acked-by: Oliver Hartkopp <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
This change adds a sysctl to opt-out of RFC4862 section 5.5.3e's valid
lifetime derivation mechanism.
RFC4862 section 5.5.3e prescribes that the valid lifetime in a Router
Advertisement PIO shall be ignored if it less than 2 hours and to reset
the lifetime of the corresponding address to 2 hours. An in-progress
6man draft (see draft-ietf-6man-slaac-renum-07 section 4.2) is currently
looking to remove this mechanism. While this draft has not been moving
particularly quickly for other reasons, there is widespread consensus on
section 4.2 which updates RFC4862 section 5.5.3e.
Cc: Maciej Żenczykowski <[email protected]>
Cc: Lorenzo Colitti <[email protected]>
Cc: Jen Linkova <[email protected]>
Signed-off-by: Patrick Rohr <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The device list needs its associated lock held when modifying it, or the
list could become corrupted, as syzbot discovered.
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=c1d0a03d305972dbbe14
Signed-off-by: Jeremy Cline <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Fixes: 6709d4b7bc2e ("net: nfc: Fix use-after-free caused by nfc_llcp_find_local")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The ETHTOOL_A_PLCA_ENABLED data type is u8. But while parsing the
value from the attribute, nla_get_u32() is used in the plca_update_sint()
function instead of nla_get_u8(). So plca_cfg.enabled variable is updated
with some garbage value instead of 0 or 1 and always enables plca even
though plca is disabled through ethtool application. This bug has been
fixed by parsing the values based on the attributes type in the policy.
Fixes: 8580e16c28f3 ("net/ethtool: add netlink interface for the PLCA RS")
Signed-off-by: Parthiban Veerasooran <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The KSZ9477 has support for HSR (High-Availability Seamless Redundancy).
One of its offloading (i.e. performed in the switch IC hardware) features
is to duplicate received frame to both HSR aware switch ports.
To achieve this goal - the tail TAG needs to be modified. To be more
specific, both ports must be marked as destination (egress) ones.
The NETIF_F_HW_HSR_DUP flag indicates that the device supports HSR and
assures (in HSR core code) that frame is sent only once from HOST to
switch with tail tag indicating both ports.
Signed-off-by: Lukasz Majewski <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
In some cases, drivers may need to veto the changing of a MAC address on
a user port. Such is the case with KSZ9477 when it offloads a HSR device,
because it programs the MAC address of multiple ports to a shared
hardware register. Those ports need to have equal MAC addresses for the
lifetime of the HSR offload.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Lukasz Majewski <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Drivers can provide meaningful error messages which state a reason why
they can't perform an offload, and dsa_slave_changeupper() already has
the infrastructure to propagate these over netlink rather than printing
to the kernel log. So pass the extack argument and modify the xrs700x
driver's port_hsr_join() prototype.
Also take the opportunity and use the extack for the 2 -EOPNOTSUPP cases
from xrs700x_hsr_join().
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Lukasz Majewski <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The function doesn't modify the addresses passed as input, mark them
as 'const' to make that clear.
Signed-off-by: Beniamino Galvani <[email protected]>
Reviewed-by: Guillaume Nault <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
'n_tables' is small, UDP_TUNNEL_NIC_MAX_TABLES = 4 as a maximum. So there
is no real point to allocate the 'entries' pointers array with a dedicate
memory allocation.
Using a flexible array for struct udp_tunnel_nic->entries avoids the
overhead of an additional memory allocation.
This also saves an indirection when the array is accessed.
Finally, __counted_by() can be used for run-time bounds checking if
configured and supported by the compiler.
Signed-off-by: Christophe JAILLET <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/4a096ba9cf981a588aa87235bb91e933ee162b3d.1695542544.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This is inspired by several syzbot reports where
tcp_metrics_flush_all() was seen in the traces.
We can avoid acquiring tcp_metrics_lock for empty buckets,
and we should add one cond_resched() to break potential long loops.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
tcp_init_metrics() only wants to get metrics if they were
previously stored in the cache. Creating an entry is adding
useless costs, especially when tcp_no_metrics_save is set.
Fixes: 51c5d0c4b169 ("tcp: Maintain dynamic metrics in local cache.")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
We need to set tp->snd_ssthresh to TCP_INFINITE_SSTHRESH
in the case tcp_get_metrics() fails for some reason.
Fixes: 9ad7c049f0f7 ("tcp: RFC2988bis + taking RTT sample from 3WHS for the passive open side")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When removing an item from RCU protected list, we must prevent
store-tearing, using rcu_assign_pointer() or WRITE_ONCE().
Fixes: 04f721c671656 ("tcp_metrics: Rewrite tcp_metrics_flush_all")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Commit b0e214d21203 ("netfilter: keep conntrack reference until
IPsecv6 policy checks are done") is a direct copy of the old
commit b59c270104f0 ("[NETFILTER]: Keep conntrack reference until
IPsec policy checks are done") but for IPv6. However, it also
copies a bug that this old commit had. That is: when the third
packet of 3WHS connection establishment contains payload, it is
added into socket receive queue without the XFRM check and the
drop of connection tracking context.
That leads to nf_conntrack module being impossible to unload as
it waits for all the conntrack references to be dropped while
the packet release is deferred in per-cpu cache indefinitely, if
not consumed by the application.
The issue for IPv4 was fixed in commit 6f0012e35160 ("tcp: add a
missing nf_reset_ct() in 3WHS handling") by adding a missing XFRM
check and correctly dropping the conntrack context. However, the
issue was introduced to IPv6 code afterwards. Fixing it the
same way for IPv6 now.
Fixes: b0e214d21203 ("netfilter: keep conntrack reference until IPsecv6 policy checks are done")
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Ilya Maximets <[email protected]>
Acked-by: Florian Westphal <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
After deleting an interface address in fib_del_ifaddr(), the function
scans the fib_info list for stray entries and calls fib_flush() and
fib_table_flush(). Then the stray entries will be deleted silently and no
RTM_DELROUTE notification will be sent.
This lack of notification can make routing daemons, or monitor like
`ip monitor route` miss the routing changes. e.g.
+ ip link add dummy1 type dummy
+ ip link add dummy2 type dummy
+ ip link set dummy1 up
+ ip link set dummy2 up
+ ip addr add 192.168.5.5/24 dev dummy1
+ ip route add 7.7.7.0/24 dev dummy2 src 192.168.5.5
+ ip -4 route
7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
+ ip monitor route
+ ip addr del 192.168.5.5/24 dev dummy1
Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
As Ido reminded, fib_table_flush() isn't only called when an address is
deleted, but also when an interface is deleted or put down. The lack of
notification in these cases is deliberate. And commit 7c6bb7d2faaf
("net/ipv6: Add knob to skip DELROUTE message on device down") introduced
a sysctl to make IPv6 behave like IPv4 in this regard. So we can't send
the route delete notify blindly in fib_table_flush().
To fix this issue, let's add a new flag in "struct fib_info" to track the
deleted prefer source address routes, and only send notify for them.
After update:
+ ip monitor route
+ ip addr del 192.168.5.5/24 dev dummy1
Deleted 192.168.5.0/24 dev dummy1 proto kernel scope link src 192.168.5.5
Deleted broadcast 192.168.5.255 dev dummy1 table local proto kernel scope link src 192.168.5.5
Deleted local 192.168.5.5 dev dummy1 table local proto kernel scope host src 192.168.5.5
Deleted 7.7.7.0/24 dev dummy2 scope link src 192.168.5.5
Suggested-by: Thomas Haller <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Acked-by: Nicolas Dichtel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
key_serial_t fields are signed integers. Use nla_get/put_s32 for
those to avoid implicit signed conversion in the netlink protocol.
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/169530167716.8905.645746457741372879.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Socket file descriptors are signed integers. Use nla_get/put_s32 for
those to avoid implicit signed conversion in the netlink protocol.
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/169530165057.8905.8650469415145814828.stgit@oracle-102.nfsv4bat.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct dp_meter.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Pravin B Shelar <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct dp_meter_instance.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Pravin B Shelar <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Reviewed-by: Gustavo A. R. Silva <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add missing annotations to inet->mc_index and inet->mc_addr
to fix data-races.
getsockopt(IP_MULTICAST_IF) can be lockless.
setsockopt() side is left for later.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Current implementation is already lockless, because the socket
lock is released before reading socket fields.
Add missing READ_ONCE() annotations.
Note that corresponding WRITE_ONCE() are needed, the order
of the patches do not really matter.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add missing READ_ONCE() annotations when reading inet->uc_index
Implementing getsockopt(IP_UNICAST_IF) locklessly seems possible,
the setsockopt() part might not be possible at the moment.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
sk_dst_get() does not require socket lock.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
inet->inet_opt being RCU protected, we can use RCU instead
of locking the socket.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Some reads of inet->tos are racy.
Add needed READ_ONCE() annotations and convert IP_TOS option lockless.
v2: missing changes in include/net/route.h (David Ahern)
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
inet->pmtudisc can be read locklessly.
Implement proper lockless reads and writes to inet->pmtudisc
ip_sock_set_mtu_discover() can now be called from arbitrary
contexts.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
inet->mc_ttl can be read locklessly.
Implement proper lockless reads and writes to inet->mc_ttl
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Similar to the change in commit 0bdf399342c5("net: Avoid address
overwrite in kernel_connect"), BPF hooks run on bind may rewrite the
address passed to kernel_bind(). This change
1) Makes a copy of the bind address in kernel_bind() to insulate
callers.
2) Replaces direct calls to sock->ops->bind() in net with kernel_bind()
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind")
Cc: [email protected]
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Callers of sock_sendmsg(), and similarly kernel_sendmsg(), in kernel
space may observe their value of msg_name change in cases where BPF
sendmsg hooks rewrite the send address. This has been confirmed to break
NFS mounts running in UDP mode and has the potential to break other
systems.
This patch:
1) Creates a new function called __sock_sendmsg() with same logic as the
old sock_sendmsg() function.
2) Replaces calls to sock_sendmsg() made by __sys_sendto() and
__sys_sendmsg() with __sock_sendmsg() to avoid an unnecessary copy,
as these system calls are already protected.
3) Modifies sock_sendmsg() so that it makes a copy of msg_name if
present before passing it down the stack to insulate callers from
changes to the send address.
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: 1cedee13d25a ("bpf: Hooks for sys_sendmsg")
Cc: [email protected]
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
commit 0bdf399342c5 ("net: Avoid address overwrite in kernel_connect")
ensured that kernel_connect() will not overwrite the address parameter
in cases where BPF connect hooks perform an address rewrite. This change
replaces direct calls to sock->ops->connect() in net with kernel_connect()
to make these call safe.
Link: https://lore.kernel.org/netdev/[email protected]/
Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Cc: [email protected]
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Jordan Rife <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This field can be read or written without socket lock being held.
Add annotations to avoid load-store tearing.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
sk->sk_txrehash readers are already safe against
concurrent change of this field.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
SO_MAX_PACING_RATE setsockopt() does not need to hold
the socket lock, because sk->sk_pacing_rate readers
can run fine if the value is changed by other threads,
after adding READ_ONCE() accessors.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
SO_BUSY_POLL_BUDGET
Setting sk->sk_ll_usec, sk_prefer_busy_poll and sk_busy_poll_budget
do not require the socket lock, readers are lockless anyway.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This options can not be set and return -ENOPROTOOPT,
no need to acqure socket lock.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
sock->flags are atomic, no need to hold the socket lock
in sk_setsockopt() for SO_PASSCRED, SO_PASSPIDFD and SO_PASSSEC.
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This is a followup of 8bf43be799d4 ("net: annotate data-races
around sk->sk_priority").
sk->sk_priority can be read and written without holding the socket lock.
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: Wenjia Zhang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
do_execute_actions() function can be called recursively multiple
times while executing actions that require pipeline forking or
recirculations. It may also be re-entered multiple times if the packet
leaves openvswitch module and re-enters it through a different port.
Currently, there is a 256-byte array allocated on stack in this
function that is supposed to hold NSH header. Compilers tend to
pre-allocate that space right at the beginning of the function:
a88: 48 81 ec b0 01 00 00 sub $0x1b0,%rsp
NSH is not a very common protocol, but the space is allocated on every
recursive call or re-entry multiplying the wasted stack space.
Move the stack allocation to push_nsh() function that is only used
if NSH actions are actually present. push_nsh() is also a simple
function without a possibility for re-entry, so the stack is returned
right away.
With this change the preallocated space is reduced by 256 B per call:
b18: 48 81 ec b0 00 00 00 sub $0xb0,%rsp
Signed-off-by: Ilya Maximets <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Reviewed-by: Eelco Chaudron [email protected]
Reviewed-by: Aaron Conole <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Including the transhdrlen in length is a problem when the packet is
partially filled (e.g. something like send(MSG_MORE) happened previously)
when appending to an IPv4 or IPv6 packet as we don't want to repeat the
transport header or account for it twice. This can happen under some
circumstances, such as splicing into an L2TP socket.
The symptom observed is a warning in __ip6_append_data():
WARNING: CPU: 1 PID: 5042 at net/ipv6/ip6_output.c:1800 __ip6_append_data.isra.0+0x1be8/0x47f0 net/ipv6/ip6_output.c:1800
that occurs when MSG_SPLICE_PAGES is used to append more data to an already
partially occupied skbuff. The warning occurs when 'copy' is larger than
the amount of data in the message iterator. This is because the requested
length includes the transport header length when it shouldn't. This can be
triggered by, for example:
sfd = socket(AF_INET6, SOCK_DGRAM, IPPROTO_L2TP);
bind(sfd, ...); // ::1
connect(sfd, ...); // ::1 port 7
send(sfd, buffer, 4100, MSG_MORE);
sendfile(sfd, dfd, NULL, 1024);
Fix this by only adding transhdrlen into the length if the write queue is
empty in l2tp_ip6_sendmsg(), analogously to how UDP does things.
l2tp_ip_sendmsg() looks like it won't suffer from this problem as it builds
the UDP packet itself.
Fixes: a32e0eec7042 ("l2tp: introduce L2TPv3 IP encapsulation support for IPv6")
Reported-by: [email protected]
Link: https://lore.kernel.org/r/[email protected]/
Suggested-by: Willem de Bruijn <[email protected]>
Signed-off-by: David Howells <[email protected]>
cc: Eric Dumazet <[email protected]>
cc: Willem de Bruijn <[email protected]>
cc: "David S. Miller" <[email protected]>
cc: David Ahern <[email protected]>
cc: Paolo Abeni <[email protected]>
cc: Jakub Kicinski <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
n->output field can be read locklessly, while a writer
might change the pointer concurrently.
Add missing annotations to prevent load-store tearing.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Core networking has opt-in atomic variant of dev->stats,
simply use DEV_STATS_INC(), DEV_STATS_ADD() and DEV_STATS_READ().
v2: removed @priv local var in l2tp_eth_dev_recv() (Simon)
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
While looking at a related syzbot report involving neigh_periodic_work(),
I found that I forgot to add an annotation when deleting an
RCU protected item from a list.
Readers use rcu_deference(*np), we need to use either
rcu_assign_pointer() or WRITE_ONCE() on writer side
to prevent store tearing.
I use rcu_assign_pointer() to have lockdep support,
this was the choice made in neigh_flush_dev().
Fixes: 767e97e1e0db ("neigh: RCU conversion of struct neighbour")
Signed-off-by: Eric Dumazet <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
bluetooth pull request for net:
- Fix handling of HCI_QUIRK_STRICT_DUPLICATE_FILTER
- Fix handling of listen for ISO unicast
- Fix build warnings
- Fix leaking content of local_codecs
- Add shutdown function for QCA6174
- Delete unused hci_req_prepare_suspend() declaration
- Fix hci_link_tx_to RCU lock usage
- Avoid redundant authentication
Signed-off-by: David S. Miller <[email protected]>
|