Age | Commit message (Collapse) | Author | Files | Lines |
|
Add FEC API to netlink.
This is not a 1-to-1 conversion.
FEC settings already depend on link modes to tell user which
modes are supported. Take this further and use link modes for
manual configuration. Old struct ethtool_fecparam is still
used to talk to the drivers, so we need to translate back
and forth. We can revisit the internal API if number of FEC
encodings starts to grow.
Enforce only one active FEC bit (by using a bit position
rather than another mask).
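
The "bit position rather than another mask" idea can be sketched in user space as follows. This is only an illustration: the enum values and helper names are hypothetical and are not the kernel's ethtool definitions. An index can name at most one encoding, while a mask coming back from a driver has to be checked for exactly one set bit before it can be translated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding indices, for illustration only. */
enum fec_bit {
	FEC_BIT_NONE = 0,
	FEC_BIT_RS = 1,
	FEC_BIT_BASER = 2,
	FEC_BIT_COUNT
};

/* Translate a single bit position into a one-hot mask.
 * Returns false if the position is not a known encoding.
 */
static bool fec_bit_to_mask(unsigned int bit, uint32_t *mask)
{
	if (bit >= FEC_BIT_COUNT)
		return false;
	*mask = UINT32_C(1) << bit;
	return true;
}

/* Translate a mask back into a bit position.
 * Succeeds only if exactly one known bit is set.
 */
static bool fec_mask_to_bit(uint32_t mask, unsigned int *bit)
{
	unsigned int pos = 0;

	/* Reject an empty mask and any mask with more than one bit set. */
	if (mask == 0 || (mask & (mask - 1)) != 0)
		return false;
	while (!(mask & 1)) {
		mask >>= 1;
		pos++;
	}
	if (pos >= FEC_BIT_COUNT)
		return false;
	*bit = pos;
	return true;
}
```

With this shape, a request carrying two encodings cannot be expressed at all on the index side, and a multi-bit mask is rejected on the way back.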
Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After a short network outage, the dst_entry is timed out and put
in DST_OBSOLETE_DEAD. We are in this code because arp reply comes
from this neighbour after network recovers. There is a potential
race condition that dst_entry is still in DST_OBSOLETE_DEAD.
With that, another neighbour lookup causes more harm than good.
In best case all packets in arp_queue are lost. This is
counterproductive to the original goal of finding a better path
for those packets.
I observed a worst case with 4.x kernel where a dst_entry in
DST_OBSOLETE_DEAD state is associated with loopback net_device.
It leads to an ethernet header with all zero addresses.
A packet with all zero source MAC address is quite deadly with
mac80211, ath9k and 802.11 block ack. It fails
ieee80211_find_sta_by_ifaddr in ath9k (xmit.c). Ath9k flushes tx
queue (ath_tx_complete_aggr). BAW (block ack window) is not
updated. BAW logic is damaged and ath9k transmission is disabled.
Signed-off-by: Tong Zhu <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
This patch adds a helper function to set up the netlink and nfnetlink headers.
Update existing codebase to use it.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
This patch adds a helper function to calculate the base sequence number
field that is stored in the nfnetlink header. Use the helper function
whenever possible.
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
The spinlock nf_tables_destroy_list_lock is initialized statically.
It is unnecessary to initialize by spin_lock_init().
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Yang Yingliang <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Move dst_check() to the garbage collector path. Stale routes trigger the
flow entry teardown state which makes affected flows go back to the
classic forwarding path to re-evaluate flow offloading.
IPv6 requires the dst cookie to work, store it in the flow_tuple,
otherwise dst_check() always fails.
Fixes: e5075c0badaa ("netfilter: flowtable: call dst_check() to fall back to classic forwarding")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Reduce logging of nftables events to a level similar to iptables.
Restore the table field to list the table, adding the generation.
Indicate the op as the most significant operation in the event.
A couple of sample events:
type=PROCTITLE msg=audit(2021-03-18 09:30:49.801:143) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid
type=SYSCALL msg=audit(2021-03-18 09:30:49.801:143) : arch=x86_64 syscall=sendmsg success=yes exit=172 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null)
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv6 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv4 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=inet entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=PROCTITLE msg=audit(2021-03-18 09:30:49.839:144) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid
type=SYSCALL msg=audit(2021-03-18 09:30:49.839:144) : arch=x86_64 syscall=sendmsg success=yes exit=22792 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=root sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null)
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv6 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv4 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=inet entries=165 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld
The issue was originally documented in
https://github.com/linux-audit/audit-kernel/issues/124
Signed-off-by: Richard Guy Briggs <[email protected]>
Acked-by: Paul Moore <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
modprobe calls from the nf_logger_find_get() API cause a deadlock in very
special cases because they occur with the nf_tables transaction mutex held.
In the specific case of nf_log, deadlock is via:
A nf_tables -> transaction mutex -> nft_log -> modprobe -> nf_log_syslog \
-> pernet_ops rwsem -> wait for C
B netlink event -> rtnl_mutex -> nf_tables transaction mutex -> wait for A
C close() -> ip6mr_sk_done -> rtnl_mutex -> wait for B
An earlier patch added NFLOG/xt_LOG module softdeps to avoid the need to load
the backend module during a transaction.
For nft_log we would have to add a softdep for both nfnetlink_log and
nf_log_syslog, since we do not know in advance which of the two backends
is going to be configured.
This defers the modprobe op until after the transaction mutex is released.
Tested-by: Phil Sutter <[email protected]>
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
xt_LOG has no direct dependency on the syslog-based logger, it relies
on the nf_log core to probe the requested backend.
Now that all syslog-based loggers reside in the same module, we can
just add a soft dependency on nf_log_syslog and let modprobe take
care of it.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Remove nf_log_common. Now that all per-af modules have been merged
there is no longer a need to provide a helper module.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Provide bridge log support from nf_log_syslog.
After the merge there is no need to load the "real packet loggers",
all of them now reside in the same module.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
This, to me, seems less cluttered and less redundant. I was hoping
it could help reduce lock contention on the dto_q lock by reducing
the size of the critical section, but alas, the only improvement is
readability.
Signed-off-by: Chuck Lever <[email protected]>
|
|
These fields are no longer used.
The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on
x86_64, down from 2440 bytes.
Signed-off-by: Chuck Lever <[email protected]>
|
|
Now that svc_rdma_recvfrom() waits for Read completion,
sc_read_complete_q is no longer used.
Signed-off-by: Chuck Lever <[email protected]>
|
|
Currently the generic RPC server layer calls svc_rdma_recvfrom()
twice to retrieve an RPC message that uses Read chunks. I'm not
exactly sure why this design was chosen originally.
Instead, let's wait for the Read chunk completion inline in the
first call to svc_rdma_recvfrom().
The goal is to eliminate some page allocator churn.
rdma_read_complete() replaces pages in the second svc_rqst by
calling put_page() repeatedly while the upper layer waits for the
request to be constructed, which adds unnecessary NFS WRITE round-
trip latency.
Signed-off-by: Chuck Lever <[email protected]>
Reviewed-by: Tom Talpey <[email protected]>
|
|
This patch added a new function mptcp_nl_remove_id_zero_address to
remove the id 0 address.
This function traverses all existing msk sockets to find the msk that
matches the input IP address. It then fills the removal list with
id 0 and passes it to mptcp_pm_remove_addr and mptcp_pm_remove_subflow.
Suggested-by: Paolo Abeni <[email protected]>
Suggested-by: Matthieu Baerts <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There is some duplicate code in mptcp_pm_nl_rm_addr_received and
mptcp_pm_nl_rm_subflow_received. This patch unifies them into a new
function named mptcp_pm_nl_rm_addr_or_subflow, which uses the input
parameter rm_type to tell whether it is removing an address or a subflow.
Suggested-by: Paolo Abeni <[email protected]>
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
There's only one subflow involving the non-zero id address, but there
may be multiple subflows involving the id 0 address.
Here's an example:
local_id=0, remote_id=0
local_id=1, remote_id=0
local_id=0, remote_id=1
If the removing address id is 0, all the subflows involving the id 0
address need to be removed.
In mptcp_pm_nl_rm_addr_received/mptcp_pm_nl_rm_subflow_received, the
"break" statements stop the iteration before it reaches the next
subflow, so this patch drops them.
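
The effect of dropping the "break" can be sketched with a small user-space model. The struct and function below are hypothetical stand-ins, not the MPTCP path manager's types; the loop simply keeps going after a match so that every subflow tied to the removed id is closed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a subflow; not the kernel's structure. */
struct toy_subflow {
	unsigned char local_id;
	unsigned char remote_id;
	bool closed;
};

/* Close every open subflow whose remote id matches rm_id.
 * Returns how many subflows were closed.
 */
static size_t close_subflows_by_remote_id(struct toy_subflow *flows,
					  size_t n, unsigned char rm_id)
{
	size_t closed = 0;

	for (size_t i = 0; i < n; i++) {
		if (flows[i].closed || flows[i].remote_id != rm_id)
			continue;
		flows[i].closed = true;
		closed++;
		/* No break here: id 0 may match several subflows. */
	}
	return closed;
}
```

Run against the three subflows from the example above with rm_id 0, the model closes the first two entries and leaves the third open.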
Reviewed-by: Mat Martineau <[email protected]>
Signed-off-by: Geliang Tang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
sysctl_icmp_echo_enable_probe is a u8.
The ipv4_net_table entry should use
.maxlen = sizeof(u8),
.proc_handler = proc_dou8vec_minmax,
Fixes: f1b8fa9fa586 ("net: add sysctl for enabling RFC 8335 PROBE messages")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Andreas Roeseler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently the UDP protocol delivers GSO_FRAGLIST packets to
the sockets without the expected segmentation.
This change addresses the issue by introducing and maintaining
a couple of new fields to explicitly accept SKB_GSO_UDP_L4
or GSO_FRAGLIST packets. It also updates udp_unexpected_gso()
accordingly.
UDP sockets enabling UDP_GRO still keep accept_udp_fraglist
zeroed.
v1 -> v2:
- use 2 bits instead of a whole GSO bitmask (Willem)
Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After the previous patch, the stack can do L4 UDP aggregation
on top of a UDP tunnel.
In such scenario, udp{4,6}_gro_complete will be called twice. This function
will enter its is_flist branch immediately, even though that is only
correct on the second call, as GSO_FRAGLIST is only relevant for the
inner packet.
Instead, we need to try UDP tunnel-based aggregation first, if the GRO
packet requires that.
This patch changes udp{4,6}_gro_complete to skip the frag list processing
while encap_mark == 1, which identifies processing of the outer tunnel
header.
Additionally, clear the field in udp_gro_complete() so that we can enter
the frag list path on the next round, for the inner header.
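
The two-pass behaviour described above can be modelled in user space. This is a simplified sketch: the struct, enum, and function names are hypothetical, and the real code also depends on a socket lookup that is left out here. It only shows how clearing the mark on the first call lets the second call take the frag list path.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-packet GRO control block. */
struct toy_gro_cb {
	bool encap_mark;	/* set while the outer tunnel header is pending */
	bool is_flist;		/* packet was aggregated as a frag list */
};

enum toy_path {
	PATH_TUNNEL,	/* outer UDP tunnel header processed */
	PATH_FRAGLIST,	/* inner frag list completion */
	PATH_PLAIN	/* neither of the above */
};

/* Model of one gro_complete round for a UDP header. */
static enum toy_path toy_udp_gro_complete(struct toy_gro_cb *cb)
{
	/* Frag list handling is skipped while the outer header is pending. */
	if (cb->is_flist && !cb->encap_mark)
		return PATH_FRAGLIST;

	if (cb->encap_mark) {
		/* Clear the mark so the next round can reach the frag list path. */
		cb->encap_mark = false;
		return PATH_TUNNEL;
	}
	return PATH_PLAIN;
}
```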
v1 -> v2:
- hopefully clarified the commit message
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If NETIF_F_GRO_FRAGLIST or NETIF_F_GRO_UDP_FWD are enabled, and there
are UDP tunnels available in the system, udp_gro_receive() could end up
doing L4 aggregation (either SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST) at
the outer UDP tunnel level for packets effectively carrying a UDP
tunnel header.
That could cause inner protocol corruption. If e.g. the relevant
packets carry a vxlan header, different vxlan ids will be ignored/
aggregated to the same GSO packet. Inner headers will be ignored, too,
so that e.g. TCP over vxlan push packets will be held in the GRO
engine till the next flush, etc.
Just skip the SKB_GSO_UDP_L4 and SKB_GSO_FRAGLIST code path if the
current packet could land in a UDP tunnel, and let udp_gro_receive()
do GRO via udp_sk(sk)->gro_receive.
The check implemented in this patch is broader than what is strictly
needed, as the existing UDP tunnel could be e.g. configured on top of
a different device: we could end up skipping GRO altogether for some packets.
Anyhow, that is a very thin corner case and covering it will add quite
a bit of complexity.
v1 -> v2:
- hopefully clarify the commit message
Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.")
Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets")
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When UDP packets generated locally by a socket with UDP_SEGMENT
traverse the following path:
UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) ->
UDP tunnel (rx) -> UDP socket (no UDP_GRO)
ip_summed will be set to CHECKSUM_PARTIAL at creation time and
such checksum mode will be preserved in the above path up to the
UDP tunnel receive code where we have:
__iptunnel_pull_header() -> skb_pull_rcsum() ->
skb_postpull_rcsum() -> __skb_postpull_rcsum()
The latter will convert the skb to CHECKSUM_NONE.
The UDP GSO packet will be later segmented as part of the rx socket
receive operation, and will present a CHECKSUM_NONE after segmentation.
Additionally, the segmented packets' UDP CB still refers to the original
GSO packet len. Overall that causes unexpected/wrong csum validation
errors later in the UDP receive path.
We could possibly address the issue with some additional checks and
csum mangling in the UDP tunnel code. Since the issue affects only
this UDP receive slow path, let's set a suitable csum status there.
Note that SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets lacking a UDP
encapsulation present a valid checksum when landing in udp_queue_rcv_skb(),
as the UDP checksum has been validated by the GRO engine.
v2 -> v3:
- even more verbose commit message and comments
v1 -> v2:
- restrict the csum update to the packets strictly needing them
- hopefully clarify the commit message and code comments
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The TODO file here has not been updated since 2005, and the features
described in the file have been implemented or abandoned.
Its existence will mislead developers by presenting outdated information.
Signed-off-by: Wang Qing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
The TODO file here has not been updated for 13 years, and the features
described in the file have been implemented or abandoned.
Its existence will mislead developers by presenting outdated information.
Signed-off-by: Wang Qing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
/proc/net/nf_conntrack shows icmpv6 as unknown.
Fixes: 09ec82f5af99 ("netfilter: conntrack: remove protocol name from l4proto struct")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Fix out-of-bound access in the address array.
Fixes: 5c27d8d76ce8 ("netfilter: nf_flow_table_offload: add IPv6 support")
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Provide netdev family support from the nf_log_syslog module.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
This removes the nf_log_ipv6 module, the functionality is now
provided by nf_log_syslog.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Similar to the previous change: nf_log_syslog now covers ARP logging
as well, and the old nf_log_arp module is removed.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Netfilter has multiple log modules:
nf_log_arp
nf_log_bridge
nf_log_ipv4
nf_log_ipv6
nf_log_netdev
nfnetlink_log
nf_log_common
With the exception of nfnetlink_log (packet is sent to userspace for
dissection/logging), all of them log to the kernel ringbuffer.
This is the first part of a series to merge all modules except
nfnetlink_log into a single module: nf_log_syslog.
This reduces the amount of code. After the series, only two log modules remain:
nfnetlink_log and nf_log_syslog. The latter provides the same
functionality as the old per-af log modules.
This renames nf_log_ipv4 to nf_log_syslog.
Signed-off-by: Florian Westphal <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
|
|
Currently the mentioned helper can end-up freeing the socket wmem
without waking-up any processes waiting for more write memory.
If the partially orphaned skb is attached to a UDP (or raw) socket,
the lack of wake-up can hang the user-space.
Even for TCP sockets, not calling the sk destructor could have bad
effects on TSQ.
Address the issue using skb_orphan to release the sk wmem before
setting the new sock_efree destructor. Additionally bundle the
whole ownership update in a new helper, so that later other
potential users could avoid duplicate code.
v1 -> v2:
- use skb_orphan() instead of sort of open coding it (Eric)
- provide a helper for the ownership change (Eric)
Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
Suggested-by: Eric Dumazet <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
Reviewed-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
sch_htb: fix null pointer dereference on a null new_q
Currently if new_q is null, the null new_q pointer will be
dereferenced when 'q->offload' is true. Fix this by adding
braces around htb_parent_to_leaf_offload() to avoid it.
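
The class of bug fixed here can be shown with a small user-space sketch. The types and function names below are hypothetical and do not reproduce the sch_htb code; the sketch only shows the corrected shape, where the call that dereferences the pointer sits inside the braces of the NULL check.

```c
#include <stddef.h>

/* Hypothetical stand-in for a qdisc; not the kernel's struct Qdisc. */
struct toy_qdisc {
	int offloaded;
};

/* Counts how often the offload helper was reached. */
static int toy_offload_calls;

/* Dereferences new_q, so it must never be called with NULL. */
static void toy_parent_to_leaf_offload(struct toy_qdisc *new_q)
{
	toy_offload_calls++;
	new_q->offloaded = 1;
}

/* Corrected shape: the helper call is guarded by the NULL check. */
static void toy_delete_leaf(struct toy_qdisc *new_q, int offload)
{
	if (offload) {
		if (new_q) {
			/* Without the braces, only the first statement after
			 * "if (new_q)" would be guarded.
			 */
			toy_parent_to_leaf_offload(new_q);
		}
	}
}
```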
Addresses-Coverity: ("Dereference after null check")
Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload")
Signed-off-by: Yunjian Wang <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
qrtr_tx_wait does not check for radix_tree_insert failure, causing
the 'flow' object to be unreferenced after qrtr_tx_wait return. Fix
that by releasing flow on radix_tree_insert failure.
Fixes: 5fdeb0d372ab ("net: qrtr: Implement outgoing flow control")
Reported-by: [email protected]
Signed-off-by: Loic Poulain <[email protected]>
Reviewed-by: Bjorn Andersson <[email protected]>
Reviewed-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Modify the icmp_rcv function to check PROBE messages and call icmp_echo
if a PROBE request is detected.
Modify the existing icmp_echo function to respond to both ping and PROBE
requests.
This was tested using a custom modification to the iputils package and
wireshark. It supports IPV4 probing by name, ifindex, and probing by
both IPV4 and IPV6 addresses. It currently does not support responding
to probes off the proxy node (see RFC 8335 Section 2).
The modification to the iputils package is still in development and can
be found here: https://github.com/Juniper-Clinic-2020/iputils.git. It
supports full sending functionality of PROBE requests, but currently
does not parse the response messages, which is why Wireshark is required
to verify the sent and received PROBE messages. The modification adds
the ``-e'' flag to the command which allows the user to specify the
interface identifier to query the probed host. An example usage would be
<./ping -4 -e 1 [destination]> to send a PROBE request of ifindex 1 to the
destination node.
Signed-off-by: Andreas Roeseler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add ipv6_dev_find to ipv6_stub to allow lookup of net_devices by IPV6
address in net/ipv4/icmp.c.
Signed-off-by: Andreas Roeseler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Modify the ping_supported function to support PROBE message types. This
allows tools such as the ping command in the iputils package to be
modified to send PROBE requests through the existing framework for
sending ping requests.
Signed-off-by: Andreas Roeseler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Section 8 of RFC 8335 specifies potential security concerns of
responding to PROBE requests, and states that nodes that support PROBE
functionality MUST be able to enable/disable responses and that
responses MUST be disabled by default.
Signed-off-by: Andreas Roeseler <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Currently, action creation using ACT API in replace mode is buggy.
When invoking for non-existent action index 42,
tc action replace action bpf obj foo.o sec <xyz> index 42
kernel creates the action, fills up the netlink response, and then just
deletes the action after notifying userspace.
tc action show action bpf
doesn't list the action.
This happens due to the following sequence when ovr = 1 (replace mode)
is enabled:
tcf_idr_check_alloc is used to atomically check and either obtain
reference for existing action at index, or reserve the index slot using
a dummy entry (ERR_PTR(-EBUSY)).
This is necessary as pointers to these actions will be held after
dropping the idrinfo lock, so bumping the reference count is necessary
as we need to insert the actions, and notify userspace by dumping their
attributes. Finally, we drop the reference we took using the
tcf_action_put_many call in tcf_action_add. However, for the case where
a new action is created due to free index, its refcount remains one.
This when paired with the put_many call leads to the kernel setting up
the action, notifying userspace of its creation, and then tearing it
down. For existing actions, the refcount is still held so they remain
unaffected.
Fortunately due to rtnl_lock serialization requirement, such an action
with refcount == 1 will not be concurrently deleted by anything else, at
best CLS API can move its refcount up and down by binding to it after it
has been published from tcf_idr_insert_many. Since refcount is at least
one until put_many call, CLS API cannot delete it. Also __tcf_action_put
release path already ensures deterministic outcome (either new action
will be created or existing action will be reused in case CLS API tries
to bind to action concurrently) due to idr lock serialization.
We fix this by setting the refcount of newly created actions to 2 in ACT API
replace mode. A relaxed store will suffice as visibility is ensured only
after the tcf_idr_insert_many call.
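
The reference counting problem and its fix can be modelled in a few lines of user-space C. The struct and helpers are hypothetical and leave out locking, the idr, and netlink; the model only tracks the count across "create or look up" followed by the final put.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an action; not the kernel's struct tc_action. */
struct toy_action {
	int refcnt;
	bool alive;
};

/* Drop one reference; the action is torn down when the count hits zero. */
static void toy_put(struct toy_action *a)
{
	if (--a->refcnt == 0)
		a->alive = false;
}

/* Model of one ACT API replace request.
 * existed: the index already held an action before the request.
 * fixed:   apply the fix, i.e. start new actions with a refcount of 2.
 */
static void toy_replace(struct toy_action *a, bool existed, bool fixed)
{
	if (existed) {
		/* Lookup takes an extra reference on the existing action. */
		a->refcnt++;
	} else {
		/* Newly created action in a free index slot. */
		a->refcnt = fixed ? 2 : 1;
		a->alive = true;
	}

	/* ... insert the action and notify userspace ... */

	/* Final put, pairing with the reference taken above. */
	toy_put(a);
}
```

In the unfixed model a newly created action ends with a count of zero and is torn down right after userspace was told it exists; with the initial count of 2 it survives with one reference, the same state an existing action ends in.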
Note that in case of creation or overwriting using CLS API only (i.e.
bind = 1), overwriting existing action object is not allowed, and any
such request is silently ignored (without error).
The refcount bump that occurs in tcf_idr_check_alloc call there for
existing action will pair with tcf_exts_destroy call made from the
owner module for the same action. In case of action creation, there
is no existing action, so no tcf_exts_destroy callback happens.
This means no code changes for CLS API.
Fixes: cae422f379f3 ("net: sched: use reference counting action init")
Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Calling ncsi_stop_channel_monitor from channel_monitor is a guaranteed
deadlock on SMP because stop calls del_timer_sync on the timer that
invoked channel_monitor as its timer function.
Recognise the inherent race of marking the monitor disabled before
deleting the timer by just returning if enable was cleared. After
a timeout (the default case -- reset to START when response received)
just mark the monitor.enabled false.
If the channel has an entry on the channel_queue list, or if the
state is not ACTIVE or INACTIVE, then warn and mark the timer stopped
and don't restart, as the locking is broken somehow.
Fixes: 0795fb2021f0 ("net/ncsi: Stop monitor if channel times out or is inactive")
Signed-off-by: Milton Miller <[email protected]>
Signed-off-by: Eddie James <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
checkpatch started to complain about the misspelling of:
CHECK: 'wont' may be misspelled - perhaps 'won't'?
#459: FILE: ./net/batman-adv/bat_iv_ogm.c:459:
+ * - the resulting packet wont be bigger than
Signed-off-by: Sven Eckelmann <[email protected]>
Signed-off-by: Simon Wunderlich <[email protected]>
|
|
Replace WARN_ONCE() calls that can be triggered from userspace with
pr_warn_once(). These still give the user a hint about what the issue is.
I've left WARN()s that are not possible to trigger with current
code-base and that would mean that the code has issues:
- relying on current compat_msg_min[type] <= xfrm_msg_min[type]
- expected 4-byte padding size difference between
compat_msg_min[type] and xfrm_msg_min[type]
- compat_policy[type].len <= xfrma_policy[type].len
(for every type)
Reported-by: [email protected]
Fixes: 5f3eea6b7e8f ("xfrm/compat: Attach xfrm dumps to 64=>32 bit translator")
Cc: "David S. Miller" <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Herbert Xu <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Steffen Klassert <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Dmitry Safonov <[email protected]>
Signed-off-by: Steffen Klassert <[email protected]>
|
|
pahole currently only generates the btf_id for external functions and
ftrace-able functions. Some functions in the bpf_tcp_ca_kfunc_ids
are static (e.g. cubictcp_init). Thus, unless CONFIG_DYNAMIC_FTRACE
is set, btf_ids for those functions will not be generated and the
compilation fails during resolve_btfids.
This patch limits those functions to CONFIG_DYNAMIC_FTRACE. I will
address the pahole generation in a followup and then remove the
CONFIG_DYNAMIC_FTRACE limitation.
Fixes: e78aea8b2170 ("bpf: tcp: Put some tcp cong functions in allowlist for bpf-tcp-cc")
Reported-by: Cong Wang <[email protected]>
Reported-by: Lorenz Bauer <[email protected]>
Signed-off-by: Martin KaFai Lau <[email protected]>
Signed-off-by: Alexei Starovoitov <[email protected]>
Link: https://lore.kernel.org/bpf/[email protected]
|
|
tcp_min_tso_segs is now stored in u8, so max value is 255.
255 limit is enforced by proc_dou8vec_minmax().
We can therefore remove the gso_max_segs variable.
Fixes: 47996b489bdc ("tcp: convert elligible sysctls to u8")
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger
a warning [1]
Issue here is that:
- all dev_put() should be paired with a corresponding prior dev_hold().
- A driver doing a dev_put() in its ndo_uninit() MUST also
do a dev_hold() in its ndo_init(), only when ndo_init()
is returning 0.
Otherwise, register_netdevice() would call ndo_uninit()
in its error path and release a refcount too soon.
Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger
a warning [1]
Issue here is that:
- all dev_put() should be paired with a corresponding prior dev_hold().
- A driver doing a dev_put() in its ndo_uninit() MUST also
do a dev_hold() in its ndo_init(), only when ndo_init()
is returning 0.
Otherwise, register_netdevice() would call ndo_uninit()
in its error path and release a refcount too soon.
Therefore, we need to move dev_hold() call from
vti6_tnl_create2() to vti6_dev_init_gen()
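
The pairing rule can be checked with a small user-space model. Everything below is a hypothetical stand-in (no real net_device, no locking); it only mirrors the order of calls: an init hook, a registration step that may fail afterwards and then runs the uninit hook, and a caller that used to take the reference after registration.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a device with a plain integer refcount. */
struct toy_dev {
	int refcnt;
	bool hold_in_init;	/* true: new placement of the hold */
};

/* Init hook: with the fix, the reference is taken here. */
static int toy_init(struct toy_dev *d)
{
	if (d->hold_in_init)
		d->refcnt++;
	return 0;
}

/* Uninit hook: always drops one reference. */
static void toy_uninit(struct toy_dev *d)
{
	d->refcnt--;
}

/* Registration: runs init, then a later step that may fail.
 * On that failure the error path calls the uninit hook.
 */
static int toy_register(struct toy_dev *d, bool later_step_fails)
{
	int err = toy_init(d);

	if (err)
		return err;
	if (later_step_fails) {
		toy_uninit(d);
		return -1;
	}
	return 0;
}

/* Caller: the old placement takes the reference after registration. */
static int toy_create(struct toy_dev *d, bool later_step_fails)
{
	int err = toy_register(d, later_step_fails);

	if (err)
		return err;
	if (!d->hold_in_init)
		d->refcnt++;
	return 0;
}
```

Starting from a count of 1, a failed registration leaves the old placement at 0 (a reference was dropped that was never taken), while the new placement stays at 1. On success both placements end at 2.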
[1]
WARNING: CPU: 0 PID: 15951 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Modules linked in:
CPU: 0 PID: 15951 Comm: syz-executor.3 Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58
RSP: 0018:ffffc90001eaef28 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000040000 RSI: ffffffff815c51f5 RDI: fffff520003d5dd7
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff88801bb1c568
R13: ffff88801f69e800 R14: 00000000ffffffff R15: ffff888050889d40
FS: 00007fc79314e700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1c1ff47108 CR3: 0000000020fd5000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__refcount_dec include/linux/refcount.h:344 [inline]
refcount_dec include/linux/refcount.h:359 [inline]
dev_put include/linux/netdevice.h:4135 [inline]
vti6_dev_uninit+0x31a/0x360 net/ipv6/ip6_vti.c:297
register_netdevice+0xadf/0x1500 net/core/dev.c:10308
vti6_tnl_create2+0x1b5/0x400 net/ipv6/ip6_vti.c:190
vti6_newlink+0x9d/0xd0 net/ipv6/ip6_vti.c:1020
__rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491
rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:674
____sys_sendmsg+0x331/0x810 net/socket.c:2350
___sys_sendmsg+0xf3/0x170 net/socket.c:2404
__sys_sendmmsg+0x195/0x470 net/socket.c:2490
__do_sys_sendmmsg net/socket.c:2519 [inline]
__se_sys_sendmmsg net/socket.c:2516 [inline]
__x64_sys_sendmmsg+0x99/0x100 net/socket.c:2516
Signed-off-by: Eric Dumazet <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger
a warning [1]
Issue here is that:
- all dev_put() should be paired with a corresponding dev_hold(),
and vice versa.
- A driver doing a dev_put() in its ndo_uninit() MUST also
do a dev_hold() in its ndo_init(), only when ndo_init()
is returning 0.
Otherwise, register_netdevice() would call ndo_uninit()
in its error path and release a refcount too soon.
ip6_gre for example (among other problematic drivers)
has to use dev_hold() in ip6gre_tunnel_init_common()
instead of from ip6gre_newlink_common(), covering
both ip6gre_tunnel_init() and ip6gre_tap_init().
Note that ip6gre_tunnel_init_common() is not called from
ip6erspan_tap_init() thus we also need to add a dev_hold() there,
as ip6erspan_tunnel_uninit() does call dev_put().
[1]
refcount_t: decrement hit 0; leaking memory.
WARNING: CPU: 0 PID: 8422 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Modules linked in:
CPU: 1 PID: 8422 Comm: syz-executor854 Not tainted 5.12.0-rc4-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31
Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58
RSP: 0018:ffffc900018befd0 EFLAGS: 00010282
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffff88801ef19c40 RSI: ffffffff815c51f5 RDI: fffff52000317dec
RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000
R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff888018cf4568
R13: ffff888018cf4c00 R14: ffff8880228f2000 R15: ffffffff8d659b80
FS: 00000000014eb300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055d7bf2b3138 CR3: 0000000014933000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__refcount_dec include/linux/refcount.h:344 [inline]
refcount_dec include/linux/refcount.h:359 [inline]
dev_put include/linux/netdevice.h:4135 [inline]
ip6gre_tunnel_uninit+0x3d7/0x440 net/ipv6/ip6_gre.c:420
register_netdevice+0xadf/0x1500 net/core/dev.c:10308
ip6gre_newlink_common.constprop.0+0x158/0x410 net/ipv6/ip6_gre.c:1984
ip6gre_newlink+0x275/0x7a0 net/ipv6/ip6_gre.c:2017
__rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443
rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491
rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553
netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502
netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline]
netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338
netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927
sock_sendmsg_nosec net/socket.c:654 [inline]
sock_sendmsg+0xcf/0x120 net/socket.c:674
____sys_sendmsg+0x6e8/0x810 net/socket.c:2350
___sys_sendmsg+0xf3/0x170 net/socket.c:2404
__sys_sendmsg+0xe5/0x1b0 net/socket.c:2433
do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT")
Signed-off-by: Eric Dumazet <[email protected]>
Reported-by: syzbot <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
We fix a warning from the htmldoc tool and an indentation error reported
by smatch. There are no functional changes in this commit.
Signed-off-by: Jon Maloy <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In the if (skb_peek(arrvq) == skb) branch, the code calls
__skb_dequeue(arrvq), which unlinks the skb from arrvq and returns it.
The returned skb is the same one that skb_peek(arrvq) returned, so
kfree_skb(__skb_dequeue(arrvq)) frees it a first time. The same skb is
then freed a second time by kfree_skb(skb) after the branch completes.
This patch removes kfree_skb() from the if (skb_peek(arrvq) == skb)
branch, because the skb is freed by kfree_skb(skb) afterwards.
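
The double free can be reproduced in a user-space model that counts frees instead of releasing memory. The queue and buffer types are hypothetical stand-ins for sk_buff_head and sk_buff, and the buggy flag selects between the old and the patched branch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical buffer that records how often it was "freed". */
struct toy_skb {
	int free_count;
};

/* Hypothetical one-element queue. */
struct toy_queue {
	struct toy_skb *head;
};

static struct toy_skb *toy_peek(struct toy_queue *q)
{
	return q->head;
}

/* Unlink and return the head of the queue. */
static struct toy_skb *toy_dequeue(struct toy_queue *q)
{
	struct toy_skb *skb = q->head;

	q->head = NULL;
	return skb;
}

static void toy_free(struct toy_skb *skb)
{
	if (skb)
		skb->free_count++;
}

/* Model of the receive path around the branch in question. */
static void toy_consume(struct toy_queue *arrvq, struct toy_skb *skb,
			bool buggy)
{
	if (toy_peek(arrvq) == skb) {
		if (buggy)
			toy_free(toy_dequeue(arrvq));	/* first free */
		else
			toy_dequeue(arrvq);		/* only unlink */
	}
	toy_free(skb);	/* the free that always runs after the branch */
}
```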
Fixes: cb1b728096f54 ("tipc: eliminate race condition at multicast reception")
Signed-off-by: Lv Yunlong <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
If a PHY is not available on a DSA port (described in the devicetree but
absent or not detected), the kernel prints a warning after 3700 secs:
[ 3707.948771] ------------[ cut here ]------------
[ 3707.948784] Type was not set for devlink port.
[ 3707.948894] WARNING: CPU: 1 PID: 17 at net/core/devlink.c:8097 0xc083f9d8
We should unregister the devlink port as a user port and
re-register it as an unused port before executing "continue" in case of
dsa_port_setup error.
Fixes: 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal")
Signed-off-by: Maxim Kochetkov <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|