Age | Commit message (Collapse) | Author | Files | Lines |
|
If a command is already sent, we take care of freeing it, but we
also need to cancel the timeout as well.
Signed-off-by: Archie Pusaka <[email protected]>
Reviewed-by: Abhishek Pandit-Subedi <[email protected]>
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
|
|
force_static_address shall be writable while hdev is initing but is not
considered powered yet since the static address is written only when
powered.
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Tested-by: Brian Gix <[email protected]>
|
|
This attempts to program the address stored in hdev->static_addr after
the init sequence has been complete:
@ MGMT Command: Set Static A.. (0x002b) plen 6
Address: C0:55:44:33:22:11 (Static)
@ MGMT Event: Command Complete (0x0001) plen 7
Set Static Address (0x002b) plen 4
Status: Success (0x00)
Current settings: 0x00008200
Low Energy
Static Address
@ MGMT Event: New Settings (0x0006) plen 4
Current settings: 0x00008200
Low Energy
Static Address
< HCI Command: LE Set Random.. (0x08|0x0005) plen 6
Address: C0:55:44:33:22:11 (Static)
> HCI Event: Command Complete (0x0e) plen 4
LE Set Random Address (0x08|0x0005) ncmd 1
Status: Success (0x00)
@ MGMT Event: Command Complete (0x0001) plen 7
Set Powered (0x0005) plen 4
Status: Success (0x00)
Current settings: 0x00008201
Powered
Low Energy
Static Address
@ MGMT Event: New Settings (0x0006) plen 4
Current settings: 0x00008201
Powered
Low Energy
Static Address
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
Tested-by: Brian Gix <[email protected]>
|
|
Recently, a customer reported that from their container whose
net namespace is different to the host's init_net, they can't set
the container's net.sctp.rto_max to any value smaller than
init_net.sctp.rto_min.
For instance,
Host:
sudo sysctl net.sctp.rto_min
net.sctp.rto_min = 1000
Container:
echo 100 > /mnt/proc-net/sctp/rto_min
echo 400 > /mnt/proc-net/sctp/rto_max
echo: write error: Invalid argument
This is caused by the check made from this'commit 4f3fdf3bc59c
("sctp: add check rto_min and rto_max in sysctl")'
When validating the input value, it's always referring the boundary
value set for the init_net namespace.
Having container's rto_max smaller than host's init_net.sctp.rto_min
does make sense. Consider that the rto between two containers on the
same host is very likely smaller than it for two hosts.
So to fix this problem, as suggested by Marcelo, this patch makes the
extra pointers of rto_min, rto_max, pf_retrans, and ps_retrans point
to the corresponding variables from the newly created net namespace while
the new net namespace is being registered in sctp_sysctl_net_register.
Fixes: 4f3fdf3bc59c ("sctp: add check rto_min and rto_max in sysctl")
Reviewed-by: Marcelo Ricardo Leitner <[email protected]>
Reviewed-by: Jakub Kicinski <[email protected]>
Acked-by: Marcelo Ricardo Leitner <[email protected]>
Signed-off-by: Firo Yang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:
====================
pull-request: bpf-next 2022-12-11
We've added 74 non-merge commits during the last 11 day(s) which contain
a total of 88 files changed, 3362 insertions(+), 789 deletions(-).
The main changes are:
1) Decouple prune and jump points handling in the verifier, from Andrii.
2) Do not rely on ALLOW_ERROR_INJECTION for fmod_ret, from Benjamin.
Merged from hid tree.
3) Do not zero-extend kfunc return values. Necessary fix for 32-bit archs,
from Björn.
4) Don't use rcu_users to refcount in task kfuncs, from David.
5) Three reg_state->id fixes in the verifier, from Eduard.
6) Optimize bpf_mem_alloc by reusing elements from free_by_rcu, from Hou.
7) Refactor dynptr handling in the verifier, from Kumar.
8) Remove the "/sys" mount and umount dance in {open,close}_netns
in bpf selftests, from Martin.
9) Enable sleepable support for cgrp local storage, from Yonghong.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (74 commits)
selftests/bpf: test case for relaxed prunning of active_lock.id
selftests/bpf: Add pruning test case for bpf_spin_lock
bpf: use check_ids() for active_lock comparison
selftests/bpf: verify states_equal() maintains idmap across all frames
bpf: states_equal() must build idmap for all function frames
selftests/bpf: test cases for regsafe() bug skipping check_id()
bpf: regsafe() must not skip check_ids()
docs/bpf: Add documentation for BPF_MAP_TYPE_SK_STORAGE
selftests/bpf: Add test for dynptr reinit in user_ringbuf callback
bpf: Use memmove for bpf_dynptr_{read,write}
bpf: Move PTR_TO_STACK alignment check to process_dynptr_func
bpf: Rework check_func_arg_reg_off
bpf: Rework process_dynptr_func
bpf: Propagate errors from process_* checks in check_func_arg
bpf: Refactor ARG_PTR_TO_DYNPTR checks into process_dynptr_func
bpf: Skip rcu_barrier() if rcu_trace_implies_rcu_gp() is true
bpf: Reuse freed element in free_by_rcu during allocation
selftests/bpf: Bring test_offload.py back to life
bpf: Fix comment error in fixup_kfunc_call function
bpf: Do not zero-extend kfunc return values
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
linux-can-next-for-6.2-20221212
this is a pull request of 39 patches for net-next/master.
The first 2 patches are by me fix a warning and coding style in the
kvaser_usb driver.
Vivek Yadav's patch sorts the includes of the m_can driver.
Biju Das contributes 5 patches for the rcar_canfd driver improve the
support for different IP core variants.
Jean Delvare's patch for the ctucanfd drops the dependency on
COMPILE_TEST.
Vincent Mailhol's patch sorts the includes of the etas_es58x driver.
Haibo Chen's contributes 2 patches that add i.MX93 support to the
flexcan driver.
Lad Prabhakar's patch updates the dt-bindings documentation of the
rcar_canfd driver.
Minghao Chi's patch converts the c_can platform driver to
devm_platform_get_and_ioremap_resource().
In the next 7 patches Vincent Mailhol adds devlink support to the
etas_es58x driver to report firmware, bootloader and hardware version.
Xu Panda's patch converts a strncpy() -> strscpy() in the ucan driver.
Ye Bin's patch removes a useless parameter from the AF_CAN protocol.
The next 2 patches by Vincent Mailhol and remove unneeded or unused
pointers to struct usb_interface in device's priv struct in the ucan
and gs_usb driver.
Vivek Yadav's patch cleans up the usage of the RAM initialization in
the m_can driver.
A patch by me add support for SO_MARK to the AF_CAN protocol.
Geert Uytterhoeven's patch fixes the number of CAN channels in the
rcan_canfd bindings documentation.
In the last 11 patches Markus Schneider-Pargmann optimizes the
register access in the t_can driver and cleans up the tcan glue
driver.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Add support for SO_MARK to the CAN_RAW protocol. This makes it
possible to add traffic control filters based on the fwmark.
Link: https://lore.kernel.org/all/[email protected]
Acked-by: Oliver Hartkopp <[email protected]>
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
Since commit bdfb5765e45b remove NULL-ptr checks from users of
can_dev_rcv_lists_find(). 'err' parameter is useless, so remove it.
Signed-off-by: Ye Bin <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
There are two nat functions are nearly the same in both OVS and
TC code, (ovs_)ct_nat_execute() and ovs_ct_nat/tcf_ct_act_nat().
This patch creates nf_nat_ovs.c under netfilter and moves them
there then exports nf_ct_nat() so that it can be shared by both
OVS and TC, and keeps the nat (type) check and nat flag update
in OVS and TC's own place, as these parts are different between
OVS and TC.
Note that in OVS nat function it was using skb->protocol to get
the proto as it already skips vlans in key_extract(), while it
doesn't in TC, and TC has to call skb_protocol() to get proto.
So in nf_ct_nat_execute(), we keep using skb_protocol() which
works for both OVS and TC contrack.
Signed-off-by: Xin Long <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Acked-by: Pablo Neira Ayuso <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In ovs_ct_nat_execute(), the packet flow key nat flags are updated
when it processes ICMP(v6) error packets translation successfully.
In ct_nat_execute() when processing ICMP(v6) error packets translation
successfully, it should have done the same in ct_nat_execute() to set
post_ct_s/dnat flag, which will be used to update flow key nat flags
in OVS module later.
Reviewed-by: Saeed Mahameed <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When it fails to allocate nat ext, the packet should be dropped, like
the memory allocation failures in other places in ovs_ct_nat().
This patch changes to return NF_DROP when fails to add nat ext before
doing NAT in ovs_ct_nat(), also it would keep consistent with tc
action ct' processing in tcf_ct_act_nat().
Signed-off-by: Xin Long <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Either OVS_CT_SRC_NAT or OVS_CT_DST_NAT is set, OVS_CT_NAT must be
set in info->nat. Thus, if OVS_CT_NAT is not set in info->nat, it
will definitely not do NAT but returns NF_ACCEPT in ovs_ct_nat().
This patch changes nothing funcational but only makes this return
earlier in ovs_ct_nat() to keep consistent with TC's processing
in tcf_ct_act_nat().
Reviewed-by: Saeed Mahameed <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The calls to ovs_ct_nat_execute() are as below:
ovs_ct_execute()
ovs_ct_lookup()
__ovs_ct_lookup()
ovs_ct_nat()
ovs_ct_nat_execute()
ovs_ct_commit()
__ovs_ct_lookup()
ovs_ct_nat()
ovs_ct_nat_execute()
and since skb_pull_rcsum() and skb_push_rcsum() are already
called in ovs_ct_execute(), there's no need to do it again
in ovs_ct_nat_execute().
Reviewed-by: Saeed Mahameed <[email protected]>
Acked-by: Aaron Conole <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Allow UDP_L4 for robust packets.
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Andrew Melnychenko <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next
Steffen Klassert says:
====================
ipsec-next 2022-12-09
1) Add xfrm packet offload core API.
From Leon Romanovsky.
2) Add xfrm packet offload support for mlx5.
From Leon Romanovsky and Raed Salem.
3) Fix a typto in a error message.
From Colin Ian King.
* tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: (38 commits)
xfrm: Fix spelling mistake "oflload" -> "offload"
net/mlx5e: Open mlx5 driver to accept IPsec packet offload
net/mlx5e: Handle ESN update events
net/mlx5e: Handle hardware IPsec limits events
net/mlx5e: Update IPsec soft and hard limits
net/mlx5e: Store all XFRM SAs in Xarray
net/mlx5e: Provide intermediate pointer to access IPsec struct
net/mlx5e: Skip IPsec encryption for TX path without matching policy
net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows
net/mlx5e: Improve IPsec flow steering autogroup
net/mlx5e: Configure IPsec packet offload flow steering
net/mlx5e: Use same coding pattern for Rx and Tx flows
net/mlx5e: Add XFRM policy offload logic
net/mlx5e: Create IPsec policy offload tables
net/mlx5e: Generalize creation of default IPsec miss group and rule
net/mlx5e: Group IPsec miss handles into separate struct
net/mlx5e: Make clear what IPsec rx_err does
net/mlx5e: Flatten the IPsec RX add rule path
net/mlx5e: Refactor FTE setup code to be more clear
net/mlx5e: Move IPsec flow table creation to separate function
...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When the resource size changes, the return value of the
'nla_put_u64_64bit' function is not checked. That has been fixed to avoid
rechecking at the next step.
Found by InfoTeCS on behalf of Linux Verification Center
(linuxtesting.org) with SVACE.
Note that this is harmless, we'd error out at the next put().
Signed-off-by: Ilia.Gavrilov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
syzkaller reported:
BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295
For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to
build_skb().
When build_skb() is passed a frag_size of 0, it means the buffer came
from kmalloc. In these cases, ksize() is used to find its actual size,
but since the allocation may not have been made to that size, actually
perform the krealloc() call so that all the associated buffer size
checking will be correctly notified (and use the "new" pointer so that
compiler hinting works correctly). Split this logic out into a new
interface, slab_build_skb(), but leave the original 0 checking for now
to catch any stragglers.
Reported-by: [email protected]
Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
Cc: Pavel Begunkov <[email protected]>
Cc: pepsipu <[email protected]>
Cc: [email protected]
Cc: Vlastimil Babka <[email protected]>
Cc: kasan-dev <[email protected]>
Cc: Andrii Nakryiko <[email protected]>
Cc: [email protected]
Cc: Daniel Borkmann <[email protected]>
Cc: Hao Luo <[email protected]>
Cc: Jesper Dangaard Brouer <[email protected]>
Cc: John Fastabend <[email protected]>
Cc: [email protected]
Cc: KP Singh <[email protected]>
Cc: [email protected]
Cc: Stanislav Fomichev <[email protected]>
Cc: [email protected]
Cc: Yonghong Song <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
When 'err' is 0, it looks clearer to return '0' instead of the variable
called 'err'.
The behaviour is then not modified, just a clearer code.
By doing this, we can also avoid false positive smatch warnings like
this one:
net/mptcp/pm_netlink.c:1169 mptcp_pm_parse_pm_addr_attr() warn: missing error code? 'err'
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Suggested-by: Mat Martineau <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Matthieu Baerts <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Use nlmsg_free() instead of kfree_skb() in pm_netlink.c.
The SKB's have been created by nlmsg_new(). The proper cleaning way
should then be done with nlmsg_free().
For the moment, nlmsg_free() is simply calling kfree_skb() so we don't
change the behaviour here.
Suggested-by: Jakub Kicinski <[email protected]>
Reviewed-by: Matthieu Baerts <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: Mat Martineau <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Add support to count upall packets, when kmod of openvswitch
upcall to count the number of packets for upcall succeed and
failed, which is a better way to see how many packets upcalled
on every interfaces.
Signed-off-by: wangchuanlei <[email protected]>
Acked-by: Eelco Chaudron <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Expose the necessary tc classifier functions and wire up cls_api to use
direct calls in retpoline kernels.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Victor Nogueira <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Expose the necessary tc act functions and wire up act_api to use
direct calls in retpoline kernels.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Victor Nogueira <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
On kernels using retpoline as a spectrev2 mitigation,
optimize actions and filters that are compiled as built-ins into a direct call.
On subsequent patches we expose the classifiers and actions functions
and wire up the wrapper into tc.
Signed-off-by: Pedro Tammela <[email protected]>
Reviewed-by: Jamal Hadi Salim <[email protected]>
Reviewed-by: Victor Nogueira <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There is a spelling mistake in a NL_SET_ERR_MSG message. Fix it.
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
|
|
Add an option to initialize SOF_TIMESTAMPING_OPT_ID for TCP from
write_seq sockets instead of snd_una.
This should have been the behavior from the start. Because processes
may now exist that rely on the established behavior, do not change
behavior of the existing option, but add the right behavior with a new
flag. It is encouraged to always set SOF_TIMESTAMPING_OPT_ID_TCP on
stream sockets along with the existing SOF_TIMESTAMPING_OPT_ID.
Intuitively the contract is that the counter is zero after the
setsockopt, so that the next write N results in a notification for
the last byte N - 1.
On idle sockets snd_una == write_seq and this holds for both. But on
sockets with data in transmission, snd_una records the unacked offset
in the stream. This depends on the ACK response from the peer. A
process cannot learn this in a race free manner (ioctl SIOCOUTQ is one
racy approach).
write_seq records the offset at the last byte written by the process.
This is a better starting point. It matches the intuitive contract in
all circumstances, unaffected by external behavior.
The new timestamp flag necessitates increasing sk_tsflags to 32 bits.
Move the field in struct sock to avoid growing the socket (for some
common CONFIG variants). The UAPI interface so_timestamping.flags is
already int, so 32 bits wide.
Reported-by: Sotirios Delimanolis <[email protected]>
Signed-off-by: Willem de Bruijn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Blamed commit claimed rcu_read_lock() was held by ip6_fragment() callers.
It seems to not be always true, at least for UDP stack.
syzbot reported:
BUG: KASAN: use-after-free in ip6_dst_idev include/net/ip6_fib.h:245 [inline]
BUG: KASAN: use-after-free in ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951
Read of size 8 at addr ffff88801d403e80 by task syz-executor.3/7618
CPU: 1 PID: 7618 Comm: syz-executor.3 Not tainted 6.1.0-rc6-syzkaller-00012-g4312098baf37 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xd1/0x138 lib/dump_stack.c:106
print_address_description mm/kasan/report.c:284 [inline]
print_report+0x15e/0x45d mm/kasan/report.c:395
kasan_report+0xbf/0x1f0 mm/kasan/report.c:495
ip6_dst_idev include/net/ip6_fib.h:245 [inline]
ip6_fragment+0x2724/0x2770 net/ipv6/ip6_output.c:951
__ip6_finish_output net/ipv6/ip6_output.c:193 [inline]
ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227
dst_output include/net/dst.h:445 [inline]
ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161
ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966
udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286
udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313
udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606
inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
sock_write_iter+0x295/0x3d0 net/socket.c:1108
call_write_iter include/linux/fs.h:2191 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9ed/0xdd0 fs/read_write.c:584
ksys_write+0x1ec/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
RIP: 0033:0x7fde3588c0d9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fde365b6168 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 00007fde359ac050 RCX: 00007fde3588c0d9
RDX: 000000000000ffdc RSI: 00000000200000c0 RDI: 000000000000000a
RBP: 00007fde358e7ae9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007fde35acfb1f R14: 00007fde365b6300 R15: 0000000000022000
</TASK>
Allocated by task 7618:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
__kasan_slab_alloc+0x82/0x90 mm/kasan/common.c:325
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slab.h:737 [inline]
slab_alloc_node mm/slub.c:3398 [inline]
slab_alloc mm/slub.c:3406 [inline]
__kmem_cache_alloc_lru mm/slub.c:3413 [inline]
kmem_cache_alloc+0x2b4/0x3d0 mm/slub.c:3422
dst_alloc+0x14a/0x1f0 net/core/dst.c:92
ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344
ip6_rt_pcpu_alloc net/ipv6/route.c:1369 [inline]
rt6_make_pcpu_route net/ipv6/route.c:1417 [inline]
ip6_pol_route+0x901/0x1190 net/ipv6/route.c:2254
pol_lookup_func include/net/ip6_fib.h:582 [inline]
fib6_rule_lookup+0x52e/0x6f0 net/ipv6/fib6_rules.c:121
ip6_route_output_flags_noref+0x2e6/0x380 net/ipv6/route.c:2625
ip6_route_output_flags+0x76/0x320 net/ipv6/route.c:2638
ip6_route_output include/net/ip6_route.h:98 [inline]
ip6_dst_lookup_tail+0x5ab/0x1620 net/ipv6/ip6_output.c:1092
ip6_dst_lookup_flow+0x90/0x1d0 net/ipv6/ip6_output.c:1222
ip6_sk_dst_lookup_flow+0x553/0x980 net/ipv6/ip6_output.c:1260
udpv6_sendmsg+0x151d/0x2c80 net/ipv6/udp.c:1554
inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
__sys_sendto+0x23a/0x340 net/socket.c:2117
__do_sys_sendto net/socket.c:2129 [inline]
__se_sys_sendto net/socket.c:2125 [inline]
__x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Freed by task 7599:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
kasan_set_track+0x25/0x30 mm/kasan/common.c:52
kasan_save_free_info+0x2e/0x40 mm/kasan/generic.c:511
____kasan_slab_free mm/kasan/common.c:236 [inline]
____kasan_slab_free+0x160/0x1c0 mm/kasan/common.c:200
kasan_slab_free include/linux/kasan.h:177 [inline]
slab_free_hook mm/slub.c:1724 [inline]
slab_free_freelist_hook+0x8b/0x1c0 mm/slub.c:1750
slab_free mm/slub.c:3661 [inline]
kmem_cache_free+0xee/0x5c0 mm/slub.c:3683
dst_destroy+0x2ea/0x400 net/core/dst.c:127
rcu_do_batch kernel/rcu/tree.c:2250 [inline]
rcu_core+0x81f/0x1980 kernel/rcu/tree.c:2510
__do_softirq+0x1fb/0xadc kernel/softirq.c:571
Last potentially related work creation:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
__kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481
call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798
dst_release net/core/dst.c:177 [inline]
dst_release+0x7d/0xe0 net/core/dst.c:167
refdst_drop include/net/dst.h:256 [inline]
skb_dst_drop include/net/dst.h:268 [inline]
skb_release_head_state+0x250/0x2a0 net/core/skbuff.c:838
skb_release_all net/core/skbuff.c:852 [inline]
__kfree_skb net/core/skbuff.c:868 [inline]
kfree_skb_reason+0x151/0x4b0 net/core/skbuff.c:891
kfree_skb_list_reason+0x4b/0x70 net/core/skbuff.c:901
kfree_skb_list include/linux/skbuff.h:1227 [inline]
ip6_fragment+0x2026/0x2770 net/ipv6/ip6_output.c:949
__ip6_finish_output net/ipv6/ip6_output.c:193 [inline]
ip6_finish_output+0x9a3/0x1170 net/ipv6/ip6_output.c:206
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227
dst_output include/net/dst.h:445 [inline]
ip6_local_out+0xb3/0x1a0 net/ipv6/output_core.c:161
ip6_send_skb+0xbb/0x340 net/ipv6/ip6_output.c:1966
udp_v6_send_skb+0x82a/0x18a0 net/ipv6/udp.c:1286
udp_v6_push_pending_frames+0x140/0x200 net/ipv6/udp.c:1313
udpv6_sendmsg+0x18da/0x2c80 net/ipv6/udp.c:1606
inet6_sendmsg+0x9d/0xe0 net/ipv6/af_inet6.c:665
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
sock_write_iter+0x295/0x3d0 net/socket.c:1108
call_write_iter include/linux/fs.h:2191 [inline]
new_sync_write fs/read_write.c:491 [inline]
vfs_write+0x9ed/0xdd0 fs/read_write.c:584
ksys_write+0x1ec/0x250 fs/read_write.c:637
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Second to last potentially related work creation:
kasan_save_stack+0x22/0x40 mm/kasan/common.c:45
__kasan_record_aux_stack+0xbc/0xd0 mm/kasan/generic.c:481
call_rcu+0x9d/0x820 kernel/rcu/tree.c:2798
dst_release net/core/dst.c:177 [inline]
dst_release+0x7d/0xe0 net/core/dst.c:167
refdst_drop include/net/dst.h:256 [inline]
skb_dst_drop include/net/dst.h:268 [inline]
__dev_queue_xmit+0x1b9d/0x3ba0 net/core/dev.c:4211
dev_queue_xmit include/linux/netdevice.h:3008 [inline]
neigh_resolve_output net/core/neighbour.c:1552 [inline]
neigh_resolve_output+0x51b/0x840 net/core/neighbour.c:1532
neigh_output include/net/neighbour.h:546 [inline]
ip6_finish_output2+0x56c/0x1530 net/ipv6/ip6_output.c:134
__ip6_finish_output net/ipv6/ip6_output.c:195 [inline]
ip6_finish_output+0x694/0x1170 net/ipv6/ip6_output.c:206
NF_HOOK_COND include/linux/netfilter.h:291 [inline]
ip6_output+0x1f1/0x540 net/ipv6/ip6_output.c:227
dst_output include/net/dst.h:445 [inline]
NF_HOOK include/linux/netfilter.h:302 [inline]
NF_HOOK include/linux/netfilter.h:296 [inline]
mld_sendpack+0xa09/0xe70 net/ipv6/mcast.c:1820
mld_send_cr net/ipv6/mcast.c:2121 [inline]
mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653
process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
worker_thread+0x669/0x1090 kernel/workqueue.c:2436
kthread+0x2e8/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
The buggy address belongs to the object at ffff88801d403dc0
which belongs to the cache ip6_dst_cache of size 240
The buggy address is located 192 bytes inside of
240-byte region [ffff88801d403dc0, ffff88801d403eb0)
The buggy address belongs to the physical page:
page:ffffea00007500c0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1d403
memcg:ffff888022f49c81
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 ffffea0001ef6580 dead000000000002 ffff88814addf640
raw: 0000000000000000 00000000800c000c 00000001ffffffff ffff888022f49c81
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x112a20(GFP_ATOMIC|__GFP_NOWARN|__GFP_NORETRY|__GFP_HARDWALL), pid 3719, tgid 3719 (kworker/0:6), ts 136223432244, free_ts 136222971441
prep_new_page mm/page_alloc.c:2539 [inline]
get_page_from_freelist+0x10b5/0x2d50 mm/page_alloc.c:4288
__alloc_pages+0x1cb/0x5b0 mm/page_alloc.c:5555
alloc_pages+0x1aa/0x270 mm/mempolicy.c:2285
alloc_slab_page mm/slub.c:1794 [inline]
allocate_slab+0x213/0x300 mm/slub.c:1939
new_slab mm/slub.c:1992 [inline]
___slab_alloc+0xa91/0x1400 mm/slub.c:3180
__slab_alloc.constprop.0+0x56/0xa0 mm/slub.c:3279
slab_alloc_node mm/slub.c:3364 [inline]
slab_alloc mm/slub.c:3406 [inline]
__kmem_cache_alloc_lru mm/slub.c:3413 [inline]
kmem_cache_alloc+0x31a/0x3d0 mm/slub.c:3422
dst_alloc+0x14a/0x1f0 net/core/dst.c:92
ip6_dst_alloc+0x32/0xa0 net/ipv6/route.c:344
icmp6_dst_alloc+0x71/0x680 net/ipv6/route.c:3261
mld_sendpack+0x5de/0xe70 net/ipv6/mcast.c:1809
mld_send_cr net/ipv6/mcast.c:2121 [inline]
mld_ifc_work+0x720/0xdc0 net/ipv6/mcast.c:2653
process_one_work+0x9bf/0x1710 kernel/workqueue.c:2289
worker_thread+0x669/0x1090 kernel/workqueue.c:2436
kthread+0x2e8/0x3a0 kernel/kthread.c:376
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:306
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1459 [inline]
free_pcp_prepare+0x65c/0xd90 mm/page_alloc.c:1509
free_unref_page_prepare mm/page_alloc.c:3387 [inline]
free_unref_page+0x1d/0x4d0 mm/page_alloc.c:3483
__unfreeze_partials+0x17c/0x1a0 mm/slub.c:2586
qlink_free mm/kasan/quarantine.c:168 [inline]
qlist_free_all+0x6a/0x170 mm/kasan/quarantine.c:187
kasan_quarantine_reduce+0x184/0x210 mm/kasan/quarantine.c:294
__kasan_slab_alloc+0x66/0x90 mm/kasan/common.c:302
kasan_slab_alloc include/linux/kasan.h:201 [inline]
slab_post_alloc_hook mm/slab.h:737 [inline]
slab_alloc_node mm/slub.c:3398 [inline]
kmem_cache_alloc_node+0x304/0x410 mm/slub.c:3443
__alloc_skb+0x214/0x300 net/core/skbuff.c:497
alloc_skb include/linux/skbuff.h:1267 [inline]
netlink_alloc_large_skb net/netlink/af_netlink.c:1191 [inline]
netlink_sendmsg+0x9a6/0xe10 net/netlink/af_netlink.c:1896
sock_sendmsg_nosec net/socket.c:714 [inline]
sock_sendmsg+0xd3/0x120 net/socket.c:734
__sys_sendto+0x23a/0x340 net/socket.c:2117
__do_sys_sendto net/socket.c:2129 [inline]
__se_sys_sendto net/socket.c:2125 [inline]
__x64_sys_sendto+0xe1/0x1b0 net/socket.c:2125
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd
Fixes: 1758fd4688eb ("ipv6: remove unnecessary dst_hold() in ip6_fragment()")
Reported-by: [email protected]
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Wei Wang <[email protected]>
Cc: Martin KaFai Lau <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Expose port function commands to enable / disable migratable
capability, this is used to set the port function as migratable.
Live migration is the process of transferring a live virtual machine
from one physical host to another without disrupting its normal
operation.
In order for a VM to be able to perform LM, all the VM components must
be able to perform migration. e.g.: to be migratable.
In order for VF to be migratable, VF must be bound to VFIO driver with
migration support.
When migratable capability is enabled for a function of the port, the
device is making the necessary preparations for the function to be
migratable, which might include disabling features which cannot be
migrated.
Example of LM with migratable function configuration:
Set migratable of the VF's port function.
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 migratable disable
$ devlink port function set pci/0000:06:00.0/2 migratable enable
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 migratable enable
Bind VF to VFIO driver with migration support:
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/unbind
$ echo mlx5_vfio_pci > /sys/bus/pci/devices/0000:08:00.0/driver_override
$ echo <pci_id> > /sys/bus/pci/devices/0000:08:00.0/driver/bind
Attach VF to the VM.
Start the VM.
Perform LM.
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Acked-by: Shannon Nelson <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Expose port function commands to enable / disable RoCE, this is used to
control the port RoCE device capabilities.
When RoCE is disabled for a function of the port, function cannot create
any RoCE specific resources (e.g GID table).
It also saves system memory utilization. For example disabling RoCE enable a
VF/SF saves 1 Mbytes of system memory per function.
Example of a PCI VF port which supports function configuration:
Set RoCE of the VF's port function.
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 roce enable
$ devlink port function set pci/0000:06:00.0/2 roce disable
$ devlink port show pci/0000:06:00.0/2
pci/0000:06:00.0/2: type eth netdev enp6s0pf0vf1 flavour pcivf pfnum 0
vfnum 1
function:
hw_addr 00:00:00:00:00:00 roce disable
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In order to avoid partial request processing, validate the request
before processing it.
Signed-off-by: Shay Drory <[email protected]>
Reviewed-by: Jiri Pirko <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The 'group' argument is not modified, so mark it as 'const'. It will
allow us to constify arguments of the callers of this function in future
patches.
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Drop the first three arguments and instead extract them from the MDB
configuration structure.
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The checks only require information parsed from the RTM_NEWMDB netlink
message and do not rely on any state stored in the bridge driver.
Therefore, there is no need to perform the checks in the critical
section under the multicast lock.
Move the checks out of the critical section.
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The parsing of the netlink messages and the validity checks are now
performed in br_mdb_config_init() so we can remove br_mdb_parse().
This finally allows us to stop passing netlink attributes deep in the
MDB control path and only use the MDB configuration structure.
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The MDB group key (i.e., {source, destination, protocol, VID}) is
currently determined under the multicast lock from the netlink
attributes. Instead, use the group key from the MDB configuration
structure that was prepared before acquiring the lock.
No functional changes intended.
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
As an intermediate step towards only using the new MDB configuration
structure, pass it further in the control path instead of passing
individual attributes.
No functional changes intended.
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The MDB configuration structure (i.e., struct br_mdb_config) now
includes all the necessary information from the parsed RTM_{NEW,DEL}MDB
netlink messages, so use it.
This will later allow us to delete the calls to br_mdb_parse() from
br_mdb_add() and br_mdb_del().
No functional changes intended.
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
These checks are now redundant as they are performed by
br_mdb_config_init() while parsing the RTM_{NEW,DEL}MDB messages.
Remove them.
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Netlink attributes are currently passed deep in the MDB creation call
chain, making it difficult to add new attributes. In addition, some
validity checks are performed under the multicast lock although they can
be performed before it is ever acquired.
As a first step towards solving these issues, parse the RTM_{NEW,DEL}MDB
messages into a configuration structure, relieving other functions from
the need to handle raw netlink attributes.
Subsequent patches will convert the MDB code to use this configuration
structure.
This is consistent with how other rtnetlink objects are handled, such as
routes and nexthops.
Signed-off-by: Ido Schimmel <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the Documentation/filesystems/sysfs.rst and show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.
Signed-off-by: ye xingchen <[email protected]>
Reviewed-by: Leon Romanovsky <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2022-12-07
The 1st patch is by Oliver Hartkopp and fixes a potential NULL pointer
deref found by syzbot in the AF_CAN protocol.
The next 2 patches are by Jiri Slaby and Max Staudt and add the
missing flush_work() before freeing the underlying memory in the slcan
and can327 driver.
The last patch is by Frank Jungclaus and target the esd_usb driver and
fixes the CAN error counters, allowing them to return to zero.
* tag 'linux-can-fixes-for-6.1-20221207' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: esd_usb: Allow REC and TEC to return to zero
can: can327: flush TX_work on ldisc .close()
can: slcan: fix freed work crash
can: af_can: fix NULL pointer dereference in can_rcv_filter
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next
Stefan Schmidt says:
====================
ieee802154-next 2022-12-05
Miquel continued his work towards full scanning support. For this,
we now allow the creation of dedicated coordinator interfaces
to allow a PAN coordinator to serve in the network and set
the needed address filters with the hardware.
On top of this we have the first part to allow scanning for available
15.4 networks. A new netlink scan group, within the existing nl802154
API, was added.
In addition Miquel fixed two issues that have been introduced in the former
patches to free an skb correctly and clarifying an expression in the stack.
From David Girault we got tracing support when registering new PANs.
* tag 'ieee802154-for-net-next-2022-12-05' of git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan-next:
mac802154: Trace the registration of new PANs
ieee802154: Advertize coordinators discovery
mac802154: Allow the creation of coordinator interfaces
mac802154: Clarify an expression
mac802154: Move an skb free within the rx path
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Merge commit 5b481acab4ce ("bpf: do not rely on ALLOW_ERROR_INJECTION for fmod_ret")
from hid tree into bpf-next.
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The current way of expressing that a non-bpf kernel component is willing
to accept that bpf programs can be attached to it and that they can change
the return value is to abuse ALLOW_ERROR_INJECTION.
This is debated in the link below, and the result is that it is not a
reasonable thing to do.
Reuse the kfunc declaration structure to also tag the kernel functions
we want to be fmodret. This way we can control from any subsystem which
functions are being modified by bpf without touching the verifier.
Link: https://lore.kernel.org/all/[email protected]/
Suggested-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Benjamin Tissoires <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/sschmidt/wpan
Stefan Schmidt says:
====================
pull-request: ieee802154 for net 2022-12-05
An update from ieee802154 for your *net* tree:
Three small fixes this time around.
Ziyang Xuan fixed an error code for a timeout during initialization of the
cc2520 driver.
Hauke Mehrtens fixed a crash in the ca8210 driver SPI communication due
uninitialized SPI structures.
Wei Yongjun added INIT_LIST_HEAD ieee802154_if_add() to avoid a potential
null pointer dereference.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When sending packets between nodes in netns, it calls tipc_lxc_xmit() for
peer node to receive the packets where tipc_sk_mcast_rcv()/tipc_sk_rcv()
might be called, and it's pretty much like in tipc_rcv().
Currently the local 'node rw lock' is held during calling tipc_lxc_xmit()
to protect the peer_net not being freed by another thread. However, when
receiving these packets, tipc_node_add_conn() might be called where the
peer 'node rw lock' is acquired. Then a dead lock warning is triggered by
lockdep detector, although it is not a real dead lock:
WARNING: possible recursive locking detected
--------------------------------------------
conn_server/1086 is trying to acquire lock:
ffff8880065cb020 (&n->lock#2){++--}-{2:2}, \
at: tipc_node_add_conn.cold.76+0xaa/0x211 [tipc]
but task is already holding lock:
ffff8880065cd020 (&n->lock#2){++--}-{2:2}, \
at: tipc_node_xmit+0x285/0xb30 [tipc]
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&n->lock#2);
lock(&n->lock#2);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by conn_server/1086:
#0: ffff8880036d1e40 (sk_lock-AF_TIPC){+.+.}-{0:0}, \
at: tipc_accept+0x9c0/0x10b0 [tipc]
#1: ffff8880036d5f80 (sk_lock-AF_TIPC/1){+.+.}-{0:0}, \
at: tipc_accept+0x363/0x10b0 [tipc]
#2: ffff8880065cd020 (&n->lock#2){++--}-{2:2}, \
at: tipc_node_xmit+0x285/0xb30 [tipc]
#3: ffff888012e13370 (slock-AF_TIPC){+...}-{2:2}, \
at: tipc_sk_rcv+0x2da/0x1b40 [tipc]
Call Trace:
<TASK>
dump_stack_lvl+0x44/0x5b
__lock_acquire.cold.77+0x1f2/0x3d7
lock_acquire+0x1d2/0x610
_raw_write_lock_bh+0x38/0x80
tipc_node_add_conn.cold.76+0xaa/0x211 [tipc]
tipc_sk_finish_conn+0x21e/0x640 [tipc]
tipc_sk_filter_rcv+0x147b/0x3030 [tipc]
tipc_sk_rcv+0xbb4/0x1b40 [tipc]
tipc_lxc_xmit+0x225/0x26b [tipc]
tipc_node_xmit.cold.82+0x4a/0x102 [tipc]
__tipc_sendstream+0x879/0xff0 [tipc]
tipc_accept+0x966/0x10b0 [tipc]
do_accept+0x37d/0x590
This patch avoids this warning by not holding the 'node rw lock' before
calling tipc_lxc_xmit(). As to protect the 'peer_net', rcu_read_lock()
should be enough, as in cleanup_net() when freeing the netns, it calls
synchronize_rcu() before the free is continued.
Also since tipc_lxc_xmit() is like the RX path in tipc_rcv(), it makes
sense to call it under rcu_read_lock(). Note that the right lock order
must be:
rcu_read_lock();
tipc_node_read_lock(n);
tipc_node_read_unlock(n);
tipc_lxc_xmit();
rcu_read_unlock();
instead of:
tipc_node_read_lock(n);
rcu_read_lock();
tipc_node_read_unlock(n);
tipc_lxc_xmit();
rcu_read_unlock();
and we have to call tipc_node_read_lock/unlock() twice in
tipc_node_xmit().
Fixes: f73b12812a3d ("tipc: improve throughput between nodes in netns")
Reported-by: Shuang Li <[email protected]>
Signed-off-by: Xin Long <[email protected]>
Link: https://lore.kernel.org/r/5bdd1f8fee9db695cfff4528a48c9b9d0523fb00.1670110641.git.lucien.xin@gmail.com
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Analogue to commit 8aa59e355949 ("can: af_can: fix NULL pointer
dereference in can_rx_register()") we need to check for a missing
initialization of ml_priv in the receive path of CAN frames.
Since commit 4e096a18867a ("net: introduce CAN specific pointer in the
struct net_device") the check for dev->type to be ARPHRD_CAN is not
sufficient anymore since bonding or tun netdevices claim to be CAN
devices but do not initialize ml_priv accordingly.
Fixes: 4e096a18867a ("net: introduce CAN specific pointer in the struct net_device")
Reported-by: [email protected]
Reported-by: Wei Chen <[email protected]>
Signed-off-by: Oliver Hartkopp <[email protected]>
Link: https://lore.kernel.org/all/[email protected]
Cc: [email protected]
Signed-off-by: Marc Kleine-Budde <[email protected]>
|
|
Cited commit added the table ID to the FIB info structure, but did not
properly initialize it when table ID 0 is used. This can lead to a route
in the default VRF with a preferred source address not being flushed
when the address is deleted.
Consider the following example:
# ip address add dev dummy1 192.0.2.1/28
# ip address add dev dummy1 192.0.2.17/28
# ip route add 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 100
# ip route add table 0 198.51.100.0/24 via 192.0.2.2 src 192.0.2.17 metric 200
# ip route show 198.51.100.0/24
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 100
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Both routes are installed in the default VRF, but they are using two
different FIB info structures. One with a metric of 100 and table ID of
254 (main) and one with a metric of 200 and table ID of 0. Therefore,
when the preferred source address is deleted from the default VRF,
the second route is not flushed:
# ip address del dev dummy1 192.0.2.17/28
# ip route show 198.51.100.0/24
198.51.100.0/24 via 192.0.2.2 dev dummy1 src 192.0.2.17 metric 200
Fix by storing a table ID of 254 instead of 0 in the route configuration
structure.
Add a test case that fails before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Table ID 0
TEST: Route removed in default VRF when source address deleted [FAIL]
Tests passed: 8
Tests failed: 1
And passes after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Table ID 0
TEST: Route removed in default VRF when source address deleted [ OK ]
Tests passed: 9
Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs")
Reported-by: Donald Sharp <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Cited commit added the table ID to the FIB info structure, but did not
prevent structures with different table IDs from being consolidated.
This can lead to routes being flushed from a VRF when an address is
deleted from a different VRF.
Fix by taking the table ID into account when looking for a matching FIB
info. This is already done for FIB info structures backed by a nexthop
object in fib_find_info_nh().
Add test cases that fail before the fix:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [FAIL]
TEST: Route in default VRF not removed [ OK ]
RTNETLINK answers: File exists
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [FAIL]
Tests passed: 6
Tests failed: 2
And pass after:
# ./fib_tests.sh -t ipv4_del_addr
IPv4 delete address route tests
Regular FIB info
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Identical FIB info with different table ID
TEST: Route removed from VRF when source address deleted [ OK ]
TEST: Route in default VRF not removed [ OK ]
TEST: Route removed in default VRF when source address deleted [ OK ]
TEST: Route in VRF is not removed by address delete [ OK ]
Tests passed: 8
Tests failed: 0
Fixes: 5a56a0b3a45d ("net: Don't delete routes in different VRFs")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
The memcpy() in ncsi_cmd_handler_oem deserializes nca->data into a
flexible array structure that overlapping with non-flex-array members
(mfr_id) intentionally. Since the mem_to_flex() API is not finished,
temporarily silence this warning, since it is a false positive, using
unsafe_memcpy().
Reported-by: Joel Stanley <[email protected]>
Link: https://lore.kernel.org/netdev/CACPK8Xdfi=OJKP0x0D1w87fQeFZ4A2DP2qzGCRcuVbpU-9=4sQ@mail.gmail.com/
Cc: Samuel Mendoza-Jonas <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|