Age | Commit message (Collapse) | Author | Files | Lines |
|
Let 'bpf_flow_dissect' callers know the BPF program's retcode and act
accordingly.
Signed-off-by: Shmulik Ladkani <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Reviewed-by: Stanislav Fomichev <[email protected]>
Acked-by: John Fastabend <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
TPROXY is only allowed from prerouting, but nft_tproxy doesn't check this.
This fixes a crash (null dereference) when using tproxy from e.g. output.
Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support")
Reported-by: Shell Chen <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
|
|
When a TCP sends more bytes than allowed by the receive window, all future
packets can be marked as invalid.
This can clog up the conntrack table because of 5-day default timeout.
Sequence of packets:
01 initiator > responder: [S], seq 171, win 5840, options [mss 1330,sackOK,TS val 63 ecr 0,nop,wscale 1]
02 responder > initiator: [S.], seq 33211, ack 172, win 65535, options [mss 1460,sackOK,TS val 010 ecr 63,nop,wscale 8]
03 initiator > responder: [.], ack 33212, win 2920, options [nop,nop,TS val 068 ecr 010], length 0
04 initiator > responder: [P.], seq 172:240, ack 33212, win 2920, options [nop,nop,TS val 279 ecr 010], length 68
Window is 5840 starting from 33212 -> 39052.
05 responder > initiator: [.], ack 240, win 256, options [nop,nop,TS val 872 ecr 279], length 0
06 responder > initiator: [.], seq 33212:34530, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
This is fine, conntrack will flag the connection as having outstanding
data (UNACKED), which lowers the conntrack timeout to 300s.
07 responder > initiator: [.], seq 34530:35848, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
08 responder > initiator: [.], seq 35848:37166, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
09 responder > initiator: [.], seq 37166:38484, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
10 responder > initiator: [.], seq 38484:39802, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 1318
Packet 10 is already sending more than permitted, but conntrack doesn't
validate this (only seq is tested vs. maxend, not 'seq+len').
38484 is acceptable, but only up to 39052, so this packet should
not have been sent (or only 568 bytes, not 1318).
At this point, connection is still in '300s' mode.
Next packet however will get flagged:
11 responder > initiator: [P.], seq 39802:40128, ack 240, win 256, options [nop,nop,TS val 892 ecr 279], length 326
nf_ct_proto_6: SEQ is over the upper bound (over the window of the receiver) .. LEN=378 .. SEQ=39802 ACK=240 ACK PSH ..
Now, a couple of replies/acks comes in:
12 initiator > responder: [.], ack 34530, win 4368,
[.. irrelevant acks removed ]
16 initiator > responder: [.], ack 39802, win 8712, options [nop,nop,TS val 296201291 ecr 2982371892], length 0
This ack is significant -- this acks the last packet send by the
responder that conntrack considered valid.
This means that ack == td_end. This will withdraw the
'unacked data' flag, the connection moves back to the 5-day timeout
of established conntracks.
17 initiator > responder: ack 40128, win 10030, ...
This packet is also flagged as invalid.
Because conntrack only updates state based on packets that are
considered valid, packet 11 'did not exist' and that gets us:
nf_ct_proto_6: ACK is over upper bound 39803 (ACKed data not seen yet) .. SEQ=240 ACK=40128 WINDOW=10030 RES=0x00 ACK URG
Because this received and processed by the endpoints, the conntrack entry
remains in a bad state, no packets will ever be considered valid again:
30 responder > initiator: [F.], seq 40432, ack 2045, win 391, ..
31 initiator > responder: [.], ack 40433, win 11348, ..
32 initiator > responder: [F.], seq 2045, ack 40433, win 11348 ..
... all trigger 'ACK is over bound' test and we end up with
non-early-evictable 5-day default timeout.
NB: This patch triggers a bunch of checkpatch warnings because of silly
indent. I will resend the cleanup series linked below to reduce the
indent level once this change has propagated to net-next.
I could route the cleanup via nf but that causes extra backport work for
stable maintainers.
Link: https://lore.kernel.org/netfilter-devel/[email protected]/T/#mb1d7147d36294573cc4f81d00f9f8dadfdd06cd8
Signed-off-by: Florian Westphal <[email protected]>
|
|
Harshit Mogalapalli says:
In ebt_do_table() function dereferencing 'private->hook_entry[hook]'
can lead to NULL pointer dereference. [..] Kernel panic:
general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f]
[..]
RIP: 0010:ebt_do_table+0x1dc/0x1ce0
Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88
[..]
Call Trace:
nf_hook_slow+0xb1/0x170
__br_forward+0x289/0x730
maybe_deliver+0x24b/0x380
br_flood+0xc6/0x390
br_dev_xmit+0xa2e/0x12c0
For some reason ebtables rejects blobs that provide entry points that are
not supported by the table, but what it should instead reject is the
opposite: blobs that DO NOT provide an entry point supported by the table.
t->valid_hooks is the bitmask of hooks (input, forward ...) that will see
packets. Providing an entry point that is not support is harmless
(never called/used), but the inverse isn't: it results in a crash
because the ebtables traverser doesn't expect a NULL blob for a location
its receiving packets for.
Instead of fixing all the individual checks, do what iptables is doing and
reject all blobs that differ from the expected hooks.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Harshit Mogalapalli <[email protected]>
Reported-by: syzkaller <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
|
|
When a driver returns -EOPNOTSUPP in dsa_port_bridge_join() but failed
to provide a reason for it, DSA attempts to set the extack to say that
software fallback will kick in.
The problem is, when we use brctl and the legacy bridge ioctls, the
extack will be NULL, and DSA dereferences it in the process of setting
it.
Sergei Antonov proves this using the following stack trace:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
PC is at dsa_slave_changeupper+0x5c/0x158
dsa_slave_changeupper from raw_notifier_call_chain+0x38/0x6c
raw_notifier_call_chain from __netdev_upper_dev_link+0x198/0x3b4
__netdev_upper_dev_link from netdev_master_upper_dev_link+0x50/0x78
netdev_master_upper_dev_link from br_add_if+0x430/0x7f4
br_add_if from br_ioctl_stub+0x170/0x530
br_ioctl_stub from br_ioctl_call+0x54/0x7c
br_ioctl_call from dev_ifsioc+0x4e0/0x6bc
dev_ifsioc from dev_ioctl+0x2f8/0x758
dev_ioctl from sock_ioctl+0x5f0/0x674
sock_ioctl from sys_ioctl+0x518/0xe40
sys_ioctl from ret_fast_syscall+0x0/0x1c
Fix the problem by only overriding the extack if non-NULL.
Fixes: 1c6e8088d9a7 ("net: dsa: allow port_bridge_join() to override extack message")
Link: https://lore.kernel.org/netdev/CABikg9wx7vB5eRDAYtvAm7fprJ09Ta27a4ZazC=NX5K4wn6pWA@mail.gmail.com/
Reported-by: Sergei Antonov <[email protected]>
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Tested-by: Sergei Antonov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
More logic will be added to dsa_tree_setup_master() and
dsa_tree_teardown_master() in upcoming changes.
Reduce the indentation by one level in these functions by introducing
and using a dedicated iterator for CPU ports of a tree.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The fact that the tagging protocol is set and queried from the
/sys/class/net/<dsa-master>/dsa/tagging file is a bit of a quirk from
the single CPU port days which isn't aging very well now that DSA can
have more than a single CPU port. This is because the tagging protocol
is a switch property, yet in the presence of multiple CPU ports it can
be queried and set from multiple sysfs files, all of which are handled
by the same implementation.
The current logic ensures that the net device whose sysfs file we're
changing the tagging protocol through must be down. That net device is
the DSA master, and this is fine for single DSA master / CPU port setups.
But exactly because the tagging protocol is per switch [ tree, in fact ]
and not per DSA master, this isn't fine any longer with multiple CPU
ports, and we must iterate through the tree and find all DSA masters,
and make sure that all of them are down.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This is an adaptation of commit c0a8a9c27493 ("net: dsa: automatically
bring user ports down when master goes down") for multiple DSA masters.
When a DSA master goes down, only the user ports under its control
should go down too, the others can still send/receive traffic.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
All the traffic to/from a DSA master is supposed to be distributed among
its DSA switch upper interfaces, so we should not allow other upper
device kinds.
An exception to this is DSA_TAG_PROTO_NONE (switches with no DSA tags),
and in that case it is actually expected to create e.g. VLAN interfaces
on the master. But for those, netdev_uses_dsa(master) returns false, so
the restriction doesn't apply.
The motivation for this change is to allow LAG interfaces of DSA masters
to be DSA masters themselves. We want to restrict the user's degrees of
freedom by 1: the LAG should already have all DSA masters as lowers, and
while lower ports of the LAG can be removed, none can be added after the
fact.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
When DSA gains support for multiple CPU ports in a LAG, it will become
mandatory to monitor the changeupper events for the DSA master.
In fact, there are already some restrictions to be imposed in that area,
namely that a DSA master cannot be a bridge port except in some special
circumstances.
Centralize the restrictions at the level of the DSA layer as a
preliminary step.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
dsa_slave_prechangeupper_sanity_check() is supposed to enforce some
adjacency restrictions, and calls ds->ops->port_prechangeupper if the
driver implements it.
We convert the error code from the port_prechangeupper() call to a
notifier code, and 0 is converted to NOTIFY_OK, but the caller of
dsa_slave_prechangeupper_sanity_check() stops at any notifier code
different from NOTIFY_DONE.
Avoid this by converting back the notifier code to an error code, so
that both NOTIFY_OK and NOTIFY_DONE will be seen as 0. This allows more
parallel sanity check functions to be added.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Traditionally, DSA has had a single netdev notifier handling function
for each device type.
For the sake of code cleanliness, we would like to introduce more
handling functions which do one thing, but the conditions for entering
these functions start to overlap. Example: a handling function which
tracks whether any bridges contain both DSA and non-DSA interfaces.
Either this is placed before dsa_slave_changeupper(), case in which it
will prevent that function from executing, or we place it after
dsa_slave_changeupper(), case in which we will prevent it from
executing. The other alternative is to ignore errors from the new
handling function (not ideal).
To support this usage, we need to change the pattern. In the new model,
we enter all notifier handling sub-functions, and exit with NOTIFY_DONE
if there is nothing to do. This allows the sub-functions to be
relatively free-form and independent from each other.
Signed-off-by: Vladimir Oltean <[email protected]>
Reviewed-by: Florian Fainelli <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This adds extra condition to wake up data reader: do it only when number
of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
user, because it will wait until SO_RCVLOWAT bytes will be dequeued. This
check is performed in vsock_data_ready().
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Vishnu Dasa <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This adds extra condition to wake up data reader: do it only when number
of readable bytes >= SO_RCVLOWAT. Otherwise, there is no sense to kick
user,because it will wait until SO_RCVLOWAT bytes will be dequeued. This
check is performed in vsock_data_ready().
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This adds 'vsock_data_ready()' which must be called by transport to kick
sleeping data readers. It checks for SO_RCVLOWAT value before waking
user, thus preventing spurious wake ups. Based on 'tcp_data_ready()' logic.
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Passing 1 as the target to notify_poll_in(), we don't honor
what the user has set via SO_RCVLOWAT, going to set POLLIN
and POLLRDNORM, even if we don't have the amount of bytes
expected by the user.
Let's use sock_rcvlowat() to get the right target to pass to
notify_poll_in();
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This callback controls setting of POLLIN, POLLRDNORM output bits of poll()
syscall, but in some cases, it is incorrectly to set it, when socket has
at least 1 bytes of available data. Use 'target' which is already exists.
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Reviewed-by: Vishnu Dasa <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This callback controls setting of POLLIN, POLLRDNORM output bits of poll()
syscall, but in some cases, it is incorrectly to set it, when socket has
at least 1 bytes of available data. Use 'target' which is already exists.
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
For Hyper-V it is quiet difficult to support this socket option,due to
transport internals, so disable it.
Signed-off-by: Arseniy Krasnov <[email protected]>
Reviewed-by: Dexuan Cui <[email protected]>
Reviewed-by: Stefano Garzarella <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
This adds transport specific callback for SO_RCVLOWAT, because in some
transports it may be difficult to know current available number of bytes
ready to read. Thus, when SO_RCVLOWAT is set, transport may reject it.
Signed-off-by: Arseniy Krasnov <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
In rtnetlink_rcv_msg function, the permission for all user operations
is checked except the GET operation, which is the same as the checking
in qdisc. Therefore, remove the user rights check in qdisc.
Signed-off-by: Zhengchao Shao <[email protected]>
Message-Id: <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Drop unused argument from xfrm_policy_match,
__xfrm_policy_eval_candidates and xfrm_policy_eval_candidates.
No functional changes intended.
Signed-off-by: Hongbin Wang <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
|
|
Since commit 129bd7ca8ac0 ("net: dsa: Prevent usage of NET_DSA_TAG_8021Q
as tagging protocol"), dsa_8021q_netdev_ops no longer exists, so remove
the comment that talks about it.
Signed-off-by: Vladimir Oltean <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Reviewed-by: Andrew Lunn <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Early DSA drivers were kind of simplistic in that they assumed a fairly
narrow hardware layout. User ports would have integrated PHYs at an
internal MDIO address that is derivable from the port number, and shared
(DSA and CPU) ports would have an MII-style (serial or parallel)
connection to another MAC. Phylib and then phylink were used to drive
the internal PHYs, and this needed little to no description through the
platform data structures. Bringing up the shared ports at the maximum
supported link speed was the responsibility of the drivers.
As a result of this, when these early drivers were converted from
platform data to the new DSA OF bindings, there was no link information
translated into the first DT bindings.
https://lore.kernel.org/all/[email protected]/
Later, phylink was adopted for shared ports as well, and today we have a
workaround in place, introduced by commit a20f997010c4 ("net: dsa: Don't
instantiate phylink for CPU/DSA ports unless needed"). There, DSA checks
for the presence of phy-handle/fixed-link/managed OF properties, and if
missing, phylink registration would be skipped. This is because phylink
is optional for some drivers (the shared ports already work without it),
but the process of starting to register a port with phylink is
irreversible: if phylink_create() fails to find the fwnode properties it
needs, it bails out and it leaves the ports inoperational (because
phylink expects ports to be initially down, so DSA necessarily takes
them down, and doesn't know how to put them back up again).
DSA being a common framework, new drivers opt into this workaround
willy-nilly, but the ideal behavior from the DSA core's side would have
been to not interfere with phylink's process of failing at all. This
isn't possible because of regression concerns with pre-phylink DT blobs,
but at least DSA should put a stop to the proliferation of more of such
cases that rely on the workaround to skip phylink registration, and
sanitize the environment that new drivers work in.
To that end, create a list of compatible strings for which the
workaround is preserved, and don't apply the workaround for any drivers
outside that list (this includes new drivers).
In some cases, we make the assumption that even existing drivers don't
rely on DSA's workaround, and we do this by looking at the device trees
in which they appear. We can't fully know what is the situation with
downstream DT blobs, but we can guess the overall trend by studying the
DT blobs that were submitted upstream. If there are upstream blobs that
have lacking descriptions, we take it as very likely that there are many
more downstream blobs that do so too. If all upstream blobs have
complete descriptions, we take that as a hint that the driver is a
candidate for enforcing strict DT bindings (considering that most
bindings are copy-pasted). If there are no upstream DT blobs, we take
the conservative route of allowing the workaround, unless the driver
maintainer instructs us otherwise.
The driver situation is as follows:
ar9331
~~~~~~
compatible strings:
- qca,ar9331-switch
1 occurrence in mainline device trees, part of SoC dtsi
(arch/mips/boot/dts/qca/ar9331.dtsi), description is not problematic.
Verdict: opt into strict DT bindings and out of workarounds.
b53
~~~
compatible strings:
- brcm,bcm5325
- brcm,bcm53115
- brcm,bcm53125
- brcm,bcm53128
- brcm,bcm5365
- brcm,bcm5389
- brcm,bcm5395
- brcm,bcm5397
- brcm,bcm5398
- brcm,bcm53010-srab
- brcm,bcm53011-srab
- brcm,bcm53012-srab
- brcm,bcm53018-srab
- brcm,bcm53019-srab
- brcm,bcm5301x-srab
- brcm,bcm11360-srab
- brcm,bcm58522-srab
- brcm,bcm58525-srab
- brcm,bcm58535-srab
- brcm,bcm58622-srab
- brcm,bcm58623-srab
- brcm,bcm58625-srab
- brcm,bcm88312-srab
- brcm,cygnus-srab
- brcm,nsp-srab
- brcm,omega-srab
- brcm,bcm3384-switch
- brcm,bcm6328-switch
- brcm,bcm6368-switch
- brcm,bcm63xx-switch
I've found at least these mainline DT blobs with problems:
arch/arm/boot/dts/bcm47094-linksys-panamera.dts
- lacks phy-mode
arch/arm/boot/dts/bcm47189-tenda-ac9.dts
- lacks phy-mode and fixed-link
arch/arm/boot/dts/bcm47081-luxul-xap-1410.dts
arch/arm/boot/dts/bcm47081-luxul-xwr-1200.dts
arch/arm/boot/dts/bcm47081-buffalo-wzr-600dhp2.dts
- lacks phy-mode and fixed-link
arch/arm/boot/dts/bcm47094-luxul-xbr-4500.dts
arch/arm/boot/dts/bcm4708-smartrg-sr400ac.dts
arch/arm/boot/dts/bcm4708-luxul-xap-1510.dts
arch/arm/boot/dts/bcm953012er.dts
arch/arm/boot/dts/bcm4708-netgear-r6250.dts
arch/arm/boot/dts/bcm4708-buffalo-wzr-1166dhp-common.dtsi
arch/arm/boot/dts/bcm4708-luxul-xwc-1000.dts
arch/arm/boot/dts/bcm47094-luxul-abr-4500.dts
- lacks phy-mode and fixed-link
arch/arm/boot/dts/bcm53016-meraki-mr32.dts
- lacks phy-mode
Verdict: opt into DSA workarounds.
bcm_sf2
~~~~~~~
compatible strings:
- brcm,bcm4908-switch
- brcm,bcm7445-switch-v4.0
- brcm,bcm7278-switch-v4.0
- brcm,bcm7278-switch-v4.8
A single occurrence in mainline
(arch/arm64/boot/dts/broadcom/bcm4908/bcm4908.dtsi), part of a SoC
dtsi, valid description. Florian Fainelli explains that most of the
bcm_sf2 device trees lack a full description for the internal IMP
ports.
Verdict: opt the BCM4908 into strict DT bindings, and opt the rest
into the workarounds. Note that even though BCM4908 has strict DT
bindings, it still does not register with phylink on the IMP port
due to it implementing ->adjust_link().
hellcreek
~~~~~~~~~
compatible strings:
- hirschmann,hellcreek-de1soc-r1
No occurrence in mainline device trees. Kurt Kanzenbach explains
that the downstream device trees lacked phy-mode and fixed link, and
needed work, but were fixed in the meantime.
Verdict: opt into strict DT bindings and out of workarounds.
lan9303
~~~~~~~
compatible strings:
- smsc,lan9303-mdio
- smsc,lan9303-i2c
1 occurrence in mainline device trees:
arch/arm/boot/dts/imx53-kp-hsc.dts
- no phy-mode, no fixed-link
Verdict: opt out of strict DT bindings and into workarounds.
lantiq_gswip
~~~~~~~~~~~~
compatible strings:
- lantiq,xrx200-gswip
- lantiq,xrx300-gswip
- lantiq,xrx330-gswip
No occurrences in mainline device trees. Martin Blumenstingl
confirms that the downstream OpenWrt device trees lack a proper
fixed-link and need work, and that the incomplete description can
even be seen in the example from
Documentation/devicetree/bindings/net/dsa/lantiq-gswip.txt.
Verdict: opt out of strict DT bindings and into workarounds.
microchip ksz
~~~~~~~~~~~~~
compatible strings:
- microchip,ksz8765
- microchip,ksz8794
- microchip,ksz8795
- microchip,ksz8863
- microchip,ksz8873
- microchip,ksz9477
- microchip,ksz9897
- microchip,ksz9893
- microchip,ksz9563
- microchip,ksz8563
- microchip,ksz9567
- microchip,lan9370
- microchip,lan9371
- microchip,lan9372
- microchip,lan9373
- microchip,lan9374
5 occurrences in mainline device trees, all descriptions are valid.
But we had a snafu for the ksz8795 and ksz9477 drivers where the
phy-mode property would be expected to be located directly under the
'switch' node rather than under a port OF node. It was fixed by
commit edecfa98f602 ("net: dsa: microchip: look for phy-mode in port
nodes"). The driver still has compatibility with the old DT blobs.
The lan937x support was added later than the above snafu was fixed,
and even though it has support for the broken DT blobs by virtue of
sharing a common probing function, I'll take it that its DT blobs
are correct.
Verdict: opt lan937x into strict DT bindings, and the others out.
mt7530
~~~~~~
compatible strings
- mediatek,mt7621
- mediatek,mt7530
- mediatek,mt7531
Multiple occurrences in mainline device trees, one is part of an SoC
dtsi (arch/mips/boot/dts/ralink/mt7621.dtsi), all descriptions are fine.
Verdict: opt into strict DT bindings and out of workarounds.
mv88e6060
~~~~~~~~~
compatible string:
- marvell,mv88e6060
no occurrences in mainline, nobody knows anybody who uses it.
Verdict: opt out of strict DT bindings and into workarounds.
mv88e6xxx
~~~~~~~~~
compatible strings:
- marvell,mv88e6085
- marvell,mv88e6190
- marvell,mv88e6250
Device trees that have incomplete descriptions of CPU or DSA ports:
arch/arm64/boot/dts/freescale/imx8mq-zii-ultra.dtsi
- lacks phy-mode
arch/arm64/boot/dts/marvell/cn9130-crb.dtsi
- lacks phy-mode and fixed-link
arch/arm/boot/dts/vf610-zii-ssmb-spu3.dts
- lacks phy-mode
arch/arm/boot/dts/kirkwood-mv88f6281gtw-ge.dts
- lacks phy-mode
arch/arm/boot/dts/vf610-zii-spb4.dts
- lacks phy-mode
arch/arm/boot/dts/vf610-zii-cfu1.dts
- lacks phy-mode
arch/arm/boot/dts/vf610-zii-dev-rev-c.dts
- lacks phy-mode on CPU port, fixed-link on DSA ports
arch/arm/boot/dts/vf610-zii-dev-rev-b.dts
- lacks phy-mode on CPU port
arch/arm/boot/dts/armada-381-netgear-gs110emx.dts
- lacks phy-mode
arch/arm/boot/dts/vf610-zii-scu4-aib.dts
- lacks fixed-link on xgmii DSA ports and/or in-band-status on
2500base-x DSA ports, and phy-mode on CPU port
arch/arm/boot/dts/imx6qdl-gw5904.dtsi
- lacks phy-mode and fixed-link
arch/arm/boot/dts/armada-385-clearfog-gtr-l8.dts
- lacks phy-mode and fixed-link
arch/arm/boot/dts/vf610-zii-ssmb-dtu.dts
- lacks phy-mode
arch/arm/boot/dts/kirkwood-dir665.dts
- lacks phy-mode
arch/arm/boot/dts/kirkwood-rd88f6281.dtsi
- lacks phy-mode
arch/arm/boot/dts/orion5x-netgear-wnr854t.dts
- lacks phy-mode and fixed-link
arch/arm/boot/dts/armada-388-clearfog.dts
- lacks phy-mode
arch/arm/boot/dts/armada-xp-linksys-mamba.dts
- lacks phy-mode
arch/arm/boot/dts/armada-385-linksys.dtsi
- lacks phy-mode
arch/arm/boot/dts/imx6q-b450v3.dts
arch/arm/boot/dts/imx6q-b850v3.dts
- has a phy-handle but not a phy-mode?
arch/arm/boot/dts/armada-370-rd.dts
- lacks phy-mode
arch/arm/boot/dts/kirkwood-linksys-viper.dts
- lacks phy-mode
arch/arm/boot/dts/imx51-zii-rdu1.dts
- lacks phy-mode
arch/arm/boot/dts/imx51-zii-scu2-mezz.dts
- lacks phy-mode
arch/arm/boot/dts/imx6qdl-zii-rdu2.dtsi
- lacks phy-mode
arch/arm/boot/dts/armada-385-clearfog-gtr-s4.dts
- lacks phy-mode and fixed-link
Verdict: opt out of strict DT bindings and into workarounds.
ocelot
~~~~~~
compatible strings:
- mscc,vsc9953-switch
- felix (arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi) is a PCI
device, has no compatible string
2 occurrences in mainline, both are part of SoC dtsi and complete.
Verdict: opt into strict DT bindings and out of workarounds.
qca8k
~~~~~
compatible strings:
- qca,qca8327
- qca,qca8328
- qca,qca8334
- qca,qca8337
5 occurrences in mainline device trees, none of the descriptions are
problematic.
Verdict: opt into strict DT bindings and out of workarounds.
realtek
~~~~~~~
compatible strings:
- realtek,rtl8366rb
- realtek,rtl8365mb
2 occurrences in mainline, both descriptions are fine, additionally
rtl8365mb.c has a comment "The device tree firmware should also
specify the link partner of the extension port - either via a
fixed-link or other phy-handle."
Verdict: opt into strict DT bindings and out of workarounds.
rzn1_a5psw
~~~~~~~~~~
compatible strings:
- renesas,rzn1-a5psw
One single occurrence, part of SoC dtsi
(arch/arm/boot/dts/r9a06g032.dtsi), description is fine.
Verdict: opt into strict DT bindings and out of workarounds.
sja1105
~~~~~~~
Driver already validates its port OF nodes in
sja1105_parse_ports_node().
Verdict: opt into strict DT bindings and out of workarounds.
vsc73xx
~~~~~~~
compatible strings:
- vitesse,vsc7385
- vitesse,vsc7388
- vitesse,vsc7395
- vitesse,vsc7398
2 occurrences in mainline device trees, both descriptions are fine.
Verdict: opt into strict DT bindings and out of workarounds.
xrs700x
~~~~~~~
compatible strings:
- arrow,xrs7003e
- arrow,xrs7003f
- arrow,xrs7004e
- arrow,xrs7004f
no occurrences in mainline, we don't know.
Verdict: opt out of strict DT bindings and into workarounds.
Because there is a pattern where newly added switches reuse existing
drivers more often than introducing new ones, I've opted for deciding
who gets to opt into the workaround based on an OF compatible match
table in the DSA core. The alternative would have been to add another
boolean property to struct dsa_switch, like configure_vlan_while_not_filtering.
But this avoids situations where sometimes driver maintainers obfuscate
what goes on by sharing a common probing function, and therefore making
new switches inherit old quirks.
Side note, we also warn about missing properties for drivers that rely
on the workaround. This isn't an indication that we'll break
compatibility with those DT blobs any time soon, but is rather done to
raise awareness about the change, for future DT blob authors.
Cc: Rob Herring <[email protected]>
Cc: Frank Rowand <[email protected]>
Acked-by: Alvin Šipraga <[email protected]> # realtek
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
There is a subset of functions that applies only to shared (DSA and CPU)
ports, yet this is difficult to comprehend by looking at their code alone.
These are dsa_port_link_register_of(), dsa_port_link_unregister_of(),
and the functions that only these 2 call.
Rename this class of functions to dsa_shared_port_* to make this fact
more evident, even if this goes against the apparent convention that
function names in port.c must start with dsa_port_.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
dsa_port_link_register_of() and dsa_port_link_unregister_of() are not
written with the fact in mind that they can be called with a dp->dn that
is NULL (as evidenced even by the _of suffix in their name), but this is
exactly what happens.
How this behaves will differ depending on whether the backing driver
implements ->adjust_link() or not.
If it doesn't, the "if (of_phy_is_fixed_link(dp->dn) || phy_np)"
condition will return false, and dsa_port_link_register_of() will do
nothing and return 0.
If the driver does implement ->adjust_link(), the
"if (of_phy_is_fixed_link(dp->dn))" condition will return false
(dp->dn is NULL) and we will call dsa_port_setup_phy_of(). This will
call dsa_port_get_phy_device(), which will also return NULL, and we will
also do nothing and return 0.
It is hard to maintain this code and make future changes to it in this
state, so just suppress calls to these 2 functions if dp->dn is NULL.
The only functional effect is that if the driver does implement
->adjust_link(), we'll stop printing this to the console:
Using legacy PHYLIB callbacks. Please migrate to PHYLINK!
but instead we'll always print:
[ 8.539848] dsa-loop fixed-0:1f: skipping link registration for CPU port 5
This is for the better anyway, since "using legacy phylib callbacks"
was misleading information - we weren't issuing _any_ callbacks due to
dsa_port_get_phy_device() returning NULL.
Signed-off-by: Vladimir Oltean <[email protected]>
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pull NFS client fixes from Trond Myklebust:
"Stable fixes:
- NFS: Fix another fsync() issue after a server reboot
Bugfixes:
- NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT
- NFS: Fix missing unlock in nfs_unlink()
- Add sanity checking of the file type used by __nfs42_ssc_open
- Fix a case where we're failing to set task->tk_rpc_status
Cleanups:
- Remove the NFS_CONTEXT_RESEND_WRITES flag that got obsoleted by the
fsync() fix"
* tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: RPC level errors should set task->tk_rpc_status
NFSv4.2 fix problems with __nfs42_ssc_open
NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT
NFS: Cleanup to remove unused flag NFS_CONTEXT_RESEND_WRITES
NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mds
NFS: Fix another fsync() issue after a server reboot
NFS: Fix missing unlock in nfs_unlink()
|
|
DECnet is an obsolete network protocol that receives more attention
from kernel janitors than users. It belongs in computer protocol
history museum not in Linux kernel.
It has been "Orphaned" in kernel since 2010. The iproute2 support
for DECnet was dropped in 5.0 release. The documentation link on
Sourceforge says it is abandoned there as well.
Leave the UAPI alone to keep userspace programs compiling.
This means that there is still an empty neighbour table
for AF_DECNET.
The table of /proc/sys/net entries was updated to match
current directories and reformatted to be alphabetical.
Signed-off-by: Stephen Hemminger <[email protected]>
Acked-by: David Ahern <[email protected]>
Acked-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Commit 3b3fd068c56e3fbea30090859216a368398e39bf added NULL check for
`rose_loopback_neigh->dev` in rose_loopback_timer() but omitted to
check rose_loopback_neigh->loopback.
It thus prevents *all* rose connect.
The reason is that a special rose_neigh loopback has a NULL device.
/proc/net/rose_neigh illustrates it via rose_neigh_show() function :
[...]
seq_printf(seq, "%05d %-9s %-4s %3d %3d %3s %3s %3lu %3lu",
rose_neigh->number,
(rose_neigh->loopback) ? "RSLOOP-0" : ax2asc(buf, &rose_neigh->callsign),
rose_neigh->dev ? rose_neigh->dev->name : "???",
rose_neigh->count,
/proc/net/rose_neigh displays special rose_loopback_neigh->loopback as
callsign RSLOOP-0:
addr callsign dev count use mode restart t0 tf digipeaters
00001 RSLOOP-0 ??? 1 2 DCE yes 0 0
By checking rose_loopback_neigh->loopback, rose_rx_call_request() is called
even in case rose_loopback_neigh->dev is NULL. This repairs rose connections.
Verification with rose client application FPAC:
FPAC-Node v 4.1.3 (built Aug 5 2022) for LINUX (help = h)
F6BVP-4 (Commands = ?) : u
Users - AX.25 Level 2 sessions :
Port Callsign Callsign AX.25 state ROSE state NetRom status
axudp F6BVP-5 -> F6BVP-9 Connected Connected ---------
Fixes: 3b3fd068c56e ("rose: Fix Null pointer dereference in rose_send_frame()")
Signed-off-by: Bernard Pidoux <[email protected]>
Suggested-by: Francois Romieu <[email protected]>
Cc: Thomas DL9SAU Osterried <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently queue_userspace_packet will call kfree_skb for all frames,
whether or not an error occurred. This can result in a single dropped
frame being reported as multiple drops in dropwatch. This functions
caller may also call kfree_skb in case of an error. This patch will
consume the skbs instead and allow caller's to use kfree_skb.
Signed-off-by: Mike Pattrick <[email protected]>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109957
Signed-off-by: David S. Miller <[email protected]>
|
|
Frames sent to userspace can be reported as dropped in
ovs_dp_process_packet, however, if they are dropped in the netlink code
then netlink_attachskb will report the same frame as dropped.
This patch checks for error codes which indicate that the frame has
already been freed.
Signed-off-by: Mike Pattrick <[email protected]>
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2109946
Signed-off-by: David S. Miller <[email protected]>
|
|
TCP_LISTEN sockets is a special case. They preserve skb with a newly
connected sock till accept() makes it fully functional socket.
Receive queue of such socket may grow after connected peer
send messages there. Since these messages may contain scm_fds,
we should expose correct fdinfo::scm_fds for listening socket too.
Signed-off-by: Kirill Tkhai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Signed-off-by: Al Viro <[email protected]>
|
|
The system hangs up when batman-adv soft-interface is created on
hard-interface with small MTU. For example, the following commands
create batman-adv soft-interface on dummy interface with zero MTU:
# ip link add name dummy0 type dummy
# ip link set mtu 0 dev dummy0
# ip link set up dev dummy0
# ip link add name bat0 type batadv
# ip link set dev dummy0 master bat0
These commands cause the system hang up with the following messages:
[ 90.578925][ T6689] batman_adv: bat0: Adding interface: dummy0
[ 90.580884][ T6689] batman_adv: bat0: The MTU of interface dummy0 is too small (0) to handle the transport of batman-adv packets. Packets going over this interface will be fragmented on layer2 which could impact the performance. Setting the MTU to 1560 would solve the problem.
[ 90.586264][ T6689] batman_adv: bat0: Interface activated: dummy0
[ 90.590061][ T6689] batman_adv: bat0: Forced to purge local tt entries to fit new maximum fragment MTU (-320)
[ 90.595517][ T6689] batman_adv: bat0: Forced to purge local tt entries to fit new maximum fragment MTU (-320)
[ 90.598499][ T6689] batman_adv: bat0: Forced to purge local tt entries to fit new maximum fragment MTU (-320)
This patch fixes this issue by returning error when enabling
hard-interface with small MTU size.
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Signed-off-by: Shigeru Yoshida <[email protected]>
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Simon Wunderlich <[email protected]>
|
|
The commit 94dfc73e7cf4 ("treewide: uapi: Replace zero-length arrays with
flexible-array members") changed various structures from using 0-length
arrays to flexible arrays
net/batman-adv/bat_v_elp.c: note: in included file:
./include/linux/ethtool.h:148:38: warning: nested flexible array
net/batman-adv/bat_v_elp.c:128:9: warning: using sizeof on a flexible structure
In theory, this could be worked around by using {} as initializer for the
variable on the stack. But this variable doesn't has to be initialized at
all by the caller of __ethtool_get_link_ksettings - everything will be
initialized by the callee when no error occurs.
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Simon Wunderlich <[email protected]>
|
|
Fix up a case in call_encode() where we're failing to set
task->tk_rpc_status when an RPC level error occurred.
Fixes: 9c5948c24869 ("SUNRPC: task should be exit if encode return EKEYEXPIRED more times")
Signed-off-by: Trond Myklebust <[email protected]>
|
|
No conflicts.
Signed-off-by: Jakub Kicinski <[email protected]>
|