aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2021-03-31ethtool: support FEC settings over netlinkJakub Kicinski4-1/+262
Add FEC API to netlink. This is not a 1-to-1 conversion. FEC settings already depend on link modes to tell user which modes are supported. Take this further an use link modes for manual configuration. Old struct ethtool_fecparam is still used to talk to the drivers, so we need to translate back and forth. We can revisit the internal API if number of FEC encodings starts to grow. Enforce only one active FEC bit (by using a bit position rather than another mask). Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31neighbour: Disregard DEAD dst in neigh_updateTong Zhu1-1/+1
After a short network outage, the dst_entry is timed out and put in DST_OBSOLETE_DEAD. We are in this code because arp reply comes from this neighbour after network recovers. There is a potential race condition that dst_entry is still in DST_OBSOLETE_DEAD. With that, another neighbour lookup causes more harm than good. In best case all packets in arp_queue are lost. This is counterproductive to the original goal of finding a better path for those packets. I observed a worst case with 4.x kernel where a dst_entry in DST_OBSOLETE_DEAD state is associated with loopback net_device. It leads to an ethernet header with all zero addresses. A packet with all zero source MAC address is quite deadly with mac80211, ath9k and 802.11 block ack. It fails ieee80211_find_sta_by_ifaddr in ath9k (xmit.c). Ath9k flushes tx queue (ath_tx_complete_aggr). BAW (block ack window) is not updated. BAW logic is damaged and ath9k transmission is disabled. Signed-off-by: Tong Zhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31netfilter: add helper function to set up the nfnetlink header and use itPablo Neira Ayuso10-208/+75
This patch adds a helper function to set up the netlink and nfnetlink headers. Update existing codebase to use it. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nftables: add helper function to set the base sequence numberPablo Neira Ayuso1-9/+14
This patch adds a helper function to calculate the base sequence number field that is stored in the nfnetlink header. Use the helper function whenever possible. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nftables: remove unnecessary spin_lock_init()Yang Yingliang1-1/+0
The spinlock nf_tables_destroy_list_lock is initialized statically. It is unnecessary to initialize by spin_lock_init(). Reported-by: Hulk Robot <[email protected]> Signed-off-by: Yang Yingliang <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: flowtable: dst_check() from garbage collector pathPablo Neira Ayuso2-19/+40
Move dst_check() to the garbage collector path. Stale routes trigger the flow entry teardown state which makes affected flows go back to the classic forwarding path to re-evaluate flow offloading. IPv6 requires the dst cookie to work, store it in the flow_tuple, otherwise dst_check() always fails. Fixes: e5075c0badaa ("netfilter: flowtable: call dst_check() to fall back to classic forwarding") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31audit: log nftables configuration change events once per tableRichard Guy Briggs1-83/+103
Reduce logging of nftables events to a level similar to iptables. Restore the table field to list the table, adding the generation. Indicate the op as the most significant operation in the event. A couple of sample events: type=PROCTITLE msg=audit(2021-03-18 09:30:49.801:143) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid type=SYSCALL msg=audit(2021-03-18 09:30:49.801:143) : arch=x86_64 syscall=sendmsg success=yes exit=172 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=roo t sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null) type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv6 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=ipv4 entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.801:143) : table=firewalld:2 family=inet entries=1 op=nft_register_table pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=PROCTITLE msg=audit(2021-03-18 09:30:49.839:144) : proctitle=/usr/bin/python3 -s /usr/sbin/firewalld --nofork --nopid type=SYSCALL msg=audit(2021-03-18 09:30:49.839:144) : arch=x86_64 syscall=sendmsg success=yes exit=22792 a0=0x6 a1=0x7ffdcfcbe650 a2=0x0 a3=0x7ffdcfcbd52c items=0 ppid=1 pid=367 auid=unset uid=root gid=root euid=root suid=root fsuid=root egid=r oot sgid=root fsgid=root tty=(none) ses=unset comm=firewalld exe=/usr/bin/python3.9 subj=system_u:system_r:firewalld_t:s0 key=(null) type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv6 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=ipv4 entries=30 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld type=NETFILTER_CFG msg=audit(2021-03-18 09:30:49.839:144) : table=firewalld:3 family=inet entries=165 op=nft_register_chain pid=367 subj=system_u:system_r:firewalld_t:s0 comm=firewalld The issue was originally documented in https://github.com/linux-audit/audit-kernel/issues/124 Signed-off-by: Richard Guy Briggs <[email protected]> Acked-by: Paul Moore <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nft_log: perform module load from nf_tablesFlorian Westphal3-6/+22
modprobe calls from the nf_logger_find_get() API causes deadlock in very special cases because they occur with the nf_tables transaction mutex held. In the specific case of nf_log, deadlock is via: A nf_tables -> transaction mutex -> nft_log -> modprobe -> nf_log_syslog \ -> pernet_ops rwsem -> wait for C B netlink event -> rtnl_mutex -> nf_tables transaction mutex -> wait for A C close() -> ip6mr_sk_done -> rtnl_mutex -> wait for B Earlier patch added NFLOG/xt_LOG module softdeps to avoid the need to load the backend module during a transaction. For nft_log we would have to add a softdep for both nfnetlink_log or nf_log_syslog, since we do not know in advance which of the two backends are going to be configured. This defers the modprobe op until after the transaction mutex is released. Tested-by: Phil Sutter <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nf_log: add module softdepsFlorian Westphal3-0/+3
xt_LOG has no direct dependency on the syslog-based logger, it relies on the nf_log core to probe the requested backend. Now that all syslog-based loggers reside in the same module, we can just add a soft dependency on nf_log_syslog and let modprobe take care of it. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nf_log_common: merge with nf_log_syslogFlorian Westphal4-234/+181
Remove nf_log_common. Now that all per-af modules have been merged there is no longer a need to provide a helper module. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nf_log_bridge: merge with nf_log_syslogFlorian Westphal5-93/+22
Provide bridge log support from nf_log_syslog. After the merge there is no need to load the "real packet loggers", all of them now reside in the same module. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31svcrdma: Clean up dto_q critical section in svc_rdma_recvfrom()Chuck Lever1-7/+7
This, to me, seems less cluttered and less redundant. I was hoping it could help reduce lock contention on the dto_q lock by reducing the size of the critical section, but alas, the only improvement is readability. Signed-off-by: Chuck Lever <[email protected]>
2021-03-31svcrdma: Remove svc_rdma_recv_ctxt::rc_pages and ::rc_argChuck Lever2-17/+0
These fields are no longer used. The size of struct svc_rdma_recv_ctxt is now less than 300 bytes on x86_64, down from 2440 bytes. Signed-off-by: Chuck Lever <[email protected]>
2021-03-31svcrdma: Remove sc_read_complete_qChuck Lever2-52/+6
Now that svc_rdma_recvfrom() waits for Read completion, sc_read_complete_q is no longer used. Signed-off-by: Chuck Lever <[email protected]>
2021-03-31svcrdma: Single-stage RDMA ReadChuck Lever2-67/+37
Currently the generic RPC server layer calls svc_rdma_recvfrom() twice to retrieve an RPC message that uses Read chunks. I'm not exactly sure why this design was chosen originally. Instead, let's wait for the Read chunk completion inline in the first call to svc_rdma_recvfrom(). The goal is to eliminate some page allocator churn. rdma_read_complete() replaces pages in the second svc_rqst by calling put_page() repeatedly while the upper layer waits for the request to be constructed, which adds unnecessary NFS WRITE round- trip latency. Signed-off-by: Chuck Lever <[email protected]> Reviewed-by: Tom Talpey <[email protected]>
2021-03-30mptcp: remove id 0 addressGeliang Tang1-0/+43
This patch added a new function mptcp_nl_remove_id_zero_address to remove the id 0 address. In this function, traverse all the existing msk sockets to find the msk matched the input IP address. Then fill the removing list with id 0, and pass it to mptcp_pm_remove_addr and mptcp_pm_remove_subflow. Suggested-by: Paolo Abeni <[email protected]> Suggested-by: Matthieu Baerts <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30mptcp: unify RM_ADDR and RM_SUBFLOW receivingGeliang Tang1-49/+33
There are some duplicate code in mptcp_pm_nl_rm_addr_received and mptcp_pm_nl_rm_subflow_received. This patch unifies them into a new function named mptcp_pm_nl_rm_addr_or_subflow. In it, use the input parameter rm_type to identify it's now removing an address or a subflow. Suggested-by: Paolo Abeni <[email protected]> Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30mptcp: remove all subflows involving id 0 addressGeliang Tang1-4/+0
There's only one subflow involving the non-zero id address, but there may be multi subflows involving the id 0 address. Here's an example: local_id=0, remote_id=0 local_id=1, remote_id=0 local_id=0, remote_id=1 If the removing address id is 0, all the subflows involving the id 0 address need to be removed. In mptcp_pm_nl_rm_addr_received/mptcp_pm_nl_rm_subflow_received, the "break" prevents the iteration to the next subflow, so this patch dropped them. Reviewed-by: Mat Martineau <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30net: fix icmp_echo_enable_probe sysctlEric Dumazet1-2/+2
sysctl_icmp_echo_enable_probe is an u8. ipv4_net_table entry should use .maxlen = sizeof(u8). .proc_handler = proc_dou8vec_minmax, Fixes: f1b8fa9fa586 ("net: add sysctl for enabling RFC 8335 PROBE messages") Signed-off-by: Eric Dumazet <[email protected]> Cc: Andreas Roeseler <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30udp: never accept GSO_FRAGLIST packetsPaolo Abeni1-0/+3
Currently the UDP protocol delivers GSO_FRAGLIST packets to the sockets without the expected segmentation. This change addresses the issue introducing and maintaining a couple of new fields to explicitly accept SKB_GSO_UDP_L4 or GSO_FRAGLIST packets. Additionally updates udp_unexpected_gso() accordingly. UDP sockets enabling UDP_GRO stil keep accept_udp_fraglist zeroed. v1 -> v2: - use 2 bits instead of a whole GSO bitmask (Willem) Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30udp: properly complete L4 GRO over UDP tunnel packetPaolo Abeni2-2/+9
After the previous patch, the stack can do L4 UDP aggregation on top of a UDP tunnel. In such scenario, udp{4,6}_gro_complete will be called twice. This function will enter its is_flist branch immediately, even though that is only correct on the second call, as GSO_FRAGLIST is only relevant for the inner packet. Instead, we need to try first UDP tunnel-based aggregation, if the GRO packet requires that. This patch changes udp{4,6}_gro_complete to skip the frag list processing when while encap_mark == 1, identifying processing of the outer tunnel header. Additionally, clears the field in udp_gro_complete() so that we can enter the frag list path on the next round, for the inner header. v1 -> v2: - hopefully clarified the commit message Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30udp: skip L4 aggregation for UDP tunnel packetsPaolo Abeni1-8/+11
If NETIF_F_GRO_FRAGLIST or NETIF_F_GRO_UDP_FWD are enabled, and there are UDP tunnels available in the system, udp_gro_receive() could end-up doing L4 aggregation (either SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST) at the outer UDP tunnel level for packets effectively carrying and UDP tunnel header. That could cause inner protocol corruption. If e.g. the relevant packets carry a vxlan header, different vxlan ids will be ignored/ aggregated to the same GSO packet. Inner headers will be ignored, too, so that e.g. TCP over vxlan push packets will be held in the GRO engine till the next flush, etc. Just skip the SKB_GSO_UDP_L4 and SKB_GSO_FRAGLIST code path if the current packet could land in a UDP tunnel, and let udp_gro_receive() do GRO via udp_sk(sk)->gro_receive. The check implemented in this patch is broader than what is strictly needed, as the existing UDP tunnel could be e.g. configured on top of a different device: we could end-up skipping GRO at-all for some packets. Anyhow, that is a very thin corner case and covering it will add quite a bit of complexity. v1 -> v2: - hopefully clarify the commit message Fixes: 9fd1ff5d2ac7 ("udp: Support UDP fraglist GRO/GSO.") Fixes: 36707061d6ba ("udp: allow forwarding of plain (non-fraglisted) UDP GRO packets") Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30udp: fixup csum for GSO receive slow pathPaolo Abeni2-0/+3
When UDP packets generated locally by a socket with UDP_SEGMENT traverse the following path: UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx) -> UDP socket (no UDP_GRO) ip_summed will be set to CHECKSUM_PARTIAL at creation time and such checksum mode will be preserved in the above path up to the UDP tunnel receive code where we have: __iptunnel_pull_header() -> skb_pull_rcsum() -> skb_postpull_rcsum() -> __skb_postpull_rcsum() The latter will convert the skb to CHECKSUM_NONE. The UDP GSO packet will be later segmented as part of the rx socket receive operation, and will present a CHECKSUM_NONE after segmentation. Additionally the segmented packets UDP CB still refers to the original GSO packet len. Overall that causes unexpected/wrong csum validation errors later in the UDP receive path. We could possibly address the issue with some additional checks and csum mangling in the UDP tunnel code. Since the issue affects only this UDP receive slow path, let's set a suitable csum status there. Note that SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets lacking an UDP encapsulation present a valid checksum when landing to udp_queue_rcv_skb(), as the UDP checksum has been validated by the GRO engine. v2 -> v3: - even more verbose commit message and comments v1 -> v2: - restrict the csum update to the packets strictly needing them - hopefully clarify the commit message and code comments Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30net/decnet: Delete obsolete TODO fileWang Qing1-40/+0
The TODO file here has not been updated from 2005, and the function development described in the file have been implemented or abandoned. Its existence will mislead developers seeking to view outdated information. Signed-off-by: Wang Qing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30net/ax25: Delete obsolete TODO fileWang Qing1-20/+0
The TODO file here has not been updated for 13 years, and the function development described in the file have been implemented or abandoned. Its existence will mislead developers seeking to view outdated information. Signed-off-by: Wang Qing <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31netfilter: conntrack: do not print icmpv6 as unknown via /procPablo Neira Ayuso1-0/+1
/proc/net/nf_conntrack shows icmpv6 as unknown. Fixes: 09ec82f5af99 ("netfilter: conntrack: remove protocol name from l4proto struct") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: flowtable: fix NAT IPv6 offload manglingPablo Neira Ayuso1-3/+3
Fix out-of-bound access in the address array. Fixes: 5c27d8d76ce8 ("netfilter: nf_flow_table_offload: add IPv6 support") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nf_log_netdev: merge with nf_log_syslogFlorian Westphal4-85/+36
Provide netdev family support from the nf_log_syslog module. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nf_log_ipv6: merge with nf_log_syslogFlorian Westphal4-433/+360
This removes the nf_log_ipv6 module, the functionality is now provided by nf_log_syslog. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nf_log_arp: merge with nf_log_syslogFlorian Westphal4-178/+115
similar to previous change: nf_log_syslog now covers ARP logging as well, the old nf_log_arp module is removed. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-31netfilter: nf_log_ipv4: rename to nf_log_syslogFlorian Westphal5-65/+76
Netfilter has multiple log modules: nf_log_arp nf_log_bridge nf_log_ipv4 nf_log_ipv6 nf_log_netdev nfnetlink_log nf_log_common With the exception of nfnetlink_log (packet is sent to userspace for dissection/logging), all of them log to the kernel ringbuffer. This is the first part of a series to merge all modules except nfnetlink_log into a single module: nf_log_syslog. This allows to reduce code. After the series, only two log modules remain: nfnetlink_log and nf_log_syslog. The latter provides the same functionality as the old per-af log modules. This renames nf_log_ipv4 to nf_log_syslog. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-03-30net: let skb_orphan_partial wake-up waiters.Paolo Abeni1-9/+3
Currently the mentioned helper can end-up freeing the socket wmem without waking-up any processes waiting for more write memory. If the partially orphaned skb is attached to an UDP (or raw) socket, the lack of wake-up can hang the user-space. Even for TCP sockets not calling the sk destructor could have bad effects on TSQ. Address the issue using skb_orphan to release the sk wmem before setting the new sock_efree destructor. Additionally bundle the whole ownership update in a new helper, so that later other potential users could avoid duplicate code. v1 -> v2: - use skb_orphan() instead of sort of open coding it (Eric) - provide an helper for the ownership change (Eric) Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()") Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30sch_htb: fix null pointer dereference on a null new_qYunjian Wang1-2/+3
sch_htb: fix null pointer dereference on a null new_q Currently if new_q is null, the null new_q pointer will be dereference when 'q->offload' is true. Fix this by adding a braces around htb_parent_to_leaf_offload() to avoid it. Addresses-Coverity: ("Dereference after null check") Fixes: d03b195b5aa0 ("sch_htb: Hierarchical QoS hardware offload") Signed-off-by: Yunjian Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30net: qrtr: Fix memory leak on qrtr_tx_wait failureLoic Poulain1-1/+4
qrtr_tx_wait does not check for radix_tree_insert failure, causing the 'flow' object to be unreferenced after qrtr_tx_wait return. Fix that by releasing flow on radix_tree_insert failure. Fixes: 5fdeb0d372ab ("net: qrtr: Implement outgoing flow control") Reported-by: [email protected] Signed-off-by: Loic Poulain <[email protected]> Reviewed-by: Bjorn Andersson <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30icmp: add response to RFC 8335 PROBE messagesAndreas Roeseler1-13/+121
Modify the icmp_rcv function to check PROBE messages and call icmp_echo if a PROBE request is detected. Modify the existing icmp_echo function to respond ot both ping and PROBE requests. This was tested using a custom modification to the iputils package and wireshark. It supports IPV4 probing by name, ifindex, and probing by both IPV4 and IPV6 addresses. It currently does not support responding to probes off the proxy node (see RFC 8335 Section 2). The modification to the iputils package is still in development and can be found here: https://github.com/Juniper-Clinic-2020/iputils.git. It supports full sending functionality of PROBE requests, but currently does not parse the response messages, which is why Wireshark is required to verify the sent and recieved PROBE messages. The modification adds the ``-e'' flag to the command which allows the user to specify the interface identifier to query the probed host. An example usage would be <./ping -4 -e 1 [destination]> to send a PROBE request of ifindex 1 to the destination node. Signed-off-by: Andreas Roeseler <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30ipv6: add ipv6_dev_find to stubsAndreas Roeseler2-0/+8
Add ipv6_dev_find to ipv6_stub to allow lookup of net_devices by IPV6 address in net/ipv4/icmp.c. Signed-off-by: Andreas Roeseler <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30net: add support for sending RFC 8335 PROBE messagesAndreas Roeseler1-1/+3
Modify the ping_supported function to support PROBE message types. This allows tools such as the ping command in the iputils package to be modified to send PROBE requests through the existing framework for sending ping requests. Signed-off-by: Andreas Roeseler <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30net: add sysctl for enabling RFC 8335 PROBE messagesAndreas Roeseler1-0/+9
Section 8 of RFC 8335 specifies potential security concerns of responding to PROBE requests, and states that nodes that support PROBE functionality MUST be able to enable/disable responses and that responses MUST be disabled by default Signed-off-by: Andreas Roeseler <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30net: sched: bump refcount for new action in ACT replace modeKumar Kartikeya Dwivedi1-0/+3
Currently, action creation using ACT API in replace mode is buggy. When invoking for non-existent action index 42, tc action replace action bpf obj foo.o sec <xyz> index 42 kernel creates the action, fills up the netlink response, and then just deletes the action after notifying userspace. tc action show action bpf doesn't list the action. This happens due to the following sequence when ovr = 1 (replace mode) is enabled: tcf_idr_check_alloc is used to atomically check and either obtain reference for existing action at index, or reserve the index slot using a dummy entry (ERR_PTR(-EBUSY)). This is necessary as pointers to these actions will be held after dropping the idrinfo lock, so bumping the reference count is necessary as we need to insert the actions, and notify userspace by dumping their attributes. Finally, we drop the reference we took using the tcf_action_put_many call in tcf_action_add. However, for the case where a new action is created due to free index, its refcount remains one. This when paired with the put_many call leads to the kernel setting up the action, notifying userspace of its creation, and then tearing it down. For existing actions, the refcount is still held so they remain unaffected. Fortunately due to rtnl_lock serialization requirement, such an action with refcount == 1 will not be concurrently deleted by anything else, at best CLS API can move its refcount up and down by binding to it after it has been published from tcf_idr_insert_many. Since refcount is atleast one until put_many call, CLS API cannot delete it. Also __tcf_action_put release path already ensures deterministic outcome (either new action will be created or existing action will be reused in case CLS API tries to bind to action concurrently) due to idr lock serialization. We fix this by making refcount of newly created actions as 2 in ACT API replace mode. A relaxed store will suffice as visibility is ensured only after the tcf_idr_insert_many call. Note that in case of creation or overwriting using CLS API only (i.e. bind = 1), overwriting existing action object is not allowed, and any such request is silently ignored (without error). The refcount bump that occurs in tcf_idr_check_alloc call there for existing action will pair with tcf_exts_destroy call made from the owner module for the same action. In case of action creation, there is no existing action, so no tcf_exts_destroy callback happens. This means no code changes for CLS API. Fixes: cae422f379f3 ("net: sched: use reference counting action init") Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30net/ncsi: Avoid channel_monitor hrtimer deadlockMilton Miller1-7/+13
Calling ncsi_stop_channel_monitor from channel_monitor is a guaranteed deadlock on SMP because stop calls del_timer_sync on the timer that invoked channel_monitor as its timer function. Recognise the inherent race of marking the monitor disabled before deleting the timer by just returning if enable was cleared. After a timeout (the default case -- reset to START when response received) just mark the monitor.enabled false. If the channel has an entry on the channel_queue list, or if the state is not ACTIVE or INACTIVE, then warn and mark the timer stopped and don't restart, as the locking is broken somehow. Fixes: 0795fb2021f0 ("net/ncsi: Stop monitor if channel times out or is inactive") Signed-off-by: Milton Miller <[email protected]> Signed-off-by: Eddie James <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-30batman-adv: Fix misspelled "wont"Sven Eckelmann1-1/+1
checkpatch started to complain about the mispelling of: CHECK: 'wont' may be misspelled - perhaps 'won't'? #459: FILE: ./net/batman-adv/bat_iv_ogm.c:459: + * - the resulting packet wont be bigger than Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2021-03-30xfrm/compat: Cleanup WARN()s that can be user-triggeredDmitry Safonov1-3/+9
Replace WARN_ONCE() that can be triggered from userspace with pr_warn_once(). Those still give user a hint what's the issue. I've left WARN()s that are not possible to trigger with current code-base and that would mean that the code has issues: - relying on current compat_msg_min[type] <= xfrm_msg_min[type] - expected 4-byte padding size difference between compat_msg_min[type] and xfrm_msg_min[type] - compat_policy[type].len <= xfrma_policy[type].len (for every type) Reported-by: [email protected] Fixes: 5f3eea6b7e8f ("xfrm/compat: Attach xfrm dumps to 64=>32 bit translator") Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Herbert Xu <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Steffen Klassert <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Dmitry Safonov <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2021-03-29bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACEMartin KaFai Lau1-0/+2
pahole currently only generates the btf_id for external function and ftrace-able function. Some functions in the bpf_tcp_ca_kfunc_ids are static (e.g. cubictcp_init). Thus, unless CONFIG_DYNAMIC_FTRACE is set, btf_ids for those functions will not be generated and the compilation fails during resolve_btfids. This patch limits those functions to CONFIG_DYNAMIC_FTRACE. I will address the pahole generation in a followup and then remove the CONFIG_DYNAMIC_FTRACE limitation. Fixes: e78aea8b2170 ("bpf: tcp: Put some tcp cong functions in allowlist for bpf-tcp-cc") Reported-by: Cong Wang <[email protected]> Reported-by: Lorenz Bauer <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2021-03-29tcp: fix tcp_min_tso_segs sysctlEric Dumazet1-2/+0
tcp_min_tso_segs is now stored in u8, so max value is 255. 255 limit is enforced by proc_dou8vec_minmax(). We can therefore remove the gso_max_segs variable. Fixes: 47996b489bdc ("tcp: convert elligible sysctls to u8") Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-29sit: proper dev_{hold|put} in ndo_[un]init methodsEric Dumazet1-3/+1
After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding prior dev_hold(). - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-29ip6_vti: proper dev_{hold|put} in ndo_[un]init methodsEric Dumazet1-1/+1
After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding prior dev_hold(). - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. Therefore, we need to move dev_hold() call from vti6_tnl_create2() to vti6_dev_init_gen() [1] WARNING: CPU: 0 PID: 15951 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 0 PID: 15951 Comm: syz-executor.3 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58 RSP: 0018:ffffc90001eaef28 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000040000 RSI: ffffffff815c51f5 RDI: fffff520003d5dd7 RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff88801bb1c568 R13: ffff88801f69e800 R14: 00000000ffffffff R15: ffff888050889d40 FS: 00007fc79314e700(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f1c1ff47108 CR3: 0000000020fd5000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] dev_put include/linux/netdevice.h:4135 [inline] vti6_dev_uninit+0x31a/0x360 net/ipv6/ip6_vti.c:297 register_netdevice+0xadf/0x1500 net/core/dev.c:10308 vti6_tnl_create2+0x1b5/0x400 net/ipv6/ip6_vti.c:190 vti6_newlink+0x9d/0xd0 net/ipv6/ip6_vti.c:1020 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x331/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmmsg+0x195/0x470 net/socket.c:2490 __do_sys_sendmmsg net/socket.c:2519 [inline] __se_sys_sendmmsg net/socket.c:2516 [inline] __x64_sys_sendmmsg+0x99/0x100 net/socket.c:2516 Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-29ip6_gre: proper dev_{hold|put} in ndo_[un]init methodsEric Dumazet1-2/+2
After adopting CONFIG_PCPU_DEV_REFCNT=n option, syzbot was able to trigger a warning [1] Issue here is that: - all dev_put() should be paired with a corresponding dev_hold(), and vice versa. - A driver doing a dev_put() in its ndo_uninit() MUST also do a dev_hold() in its ndo_init(), only when ndo_init() is returning 0. Otherwise, register_netdevice() would call ndo_uninit() in its error path and release a refcount too soon. ip6_gre for example (among others problematic drivers) has to use dev_hold() in ip6gre_tunnel_init_common() instead of from ip6gre_newlink_common(), covering both ip6gre_tunnel_init() and ip6gre_tap_init()/ Note that ip6gre_tunnel_init_common() is not called from ip6erspan_tap_init() thus we also need to add a dev_hold() there, as ip6erspan_tunnel_uninit() does call dev_put() [1] refcount_t: decrement hit 0; leaking memory. WARNING: CPU: 0 PID: 8422 at lib/refcount.c:31 refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Modules linked in: CPU: 1 PID: 8422 Comm: syz-executor854 Not tainted 5.12.0-rc4-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:refcount_warn_saturate+0xbf/0x1e0 lib/refcount.c:31 Code: 1d 6a 5a e8 09 31 ff 89 de e8 8d 1a ab fd 84 db 75 e0 e8 d4 13 ab fd 48 c7 c7 a0 e1 c1 89 c6 05 4a 5a e8 09 01 e8 2e 36 fb 04 <0f> 0b eb c4 e8 b8 13 ab fd 0f b6 1d 39 5a e8 09 31 ff 89 de e8 58 RSP: 0018:ffffc900018befd0 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff88801ef19c40 RSI: ffffffff815c51f5 RDI: fffff52000317dec RBP: 0000000000000004 R08: 0000000000000000 R09: 0000000000000000 R10: ffffffff815bdf8e R11: 0000000000000000 R12: ffff888018cf4568 R13: ffff888018cf4c00 R14: ffff8880228f2000 R15: ffffffff8d659b80 FS: 00000000014eb300(0000) GS:ffff8880b9c00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055d7bf2b3138 CR3: 0000000014933000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: __refcount_dec include/linux/refcount.h:344 [inline] refcount_dec include/linux/refcount.h:359 [inline] dev_put include/linux/netdevice.h:4135 [inline] ip6gre_tunnel_uninit+0x3d7/0x440 net/ipv6/ip6_gre.c:420 register_netdevice+0xadf/0x1500 net/core/dev.c:10308 ip6gre_newlink_common.constprop.0+0x158/0x410 net/ipv6/ip6_gre.c:1984 ip6gre_newlink+0x275/0x7a0 net/ipv6/ip6_gre.c:2017 __rtnl_newlink+0x1062/0x1710 net/core/rtnetlink.c:3443 rtnl_newlink+0x64/0xa0 net/core/rtnetlink.c:3491 rtnetlink_rcv_msg+0x44e/0xad0 net/core/rtnetlink.c:5553 netlink_rcv_skb+0x153/0x420 net/netlink/af_netlink.c:2502 netlink_unicast_kernel net/netlink/af_netlink.c:1312 [inline] netlink_unicast+0x533/0x7d0 net/netlink/af_netlink.c:1338 netlink_sendmsg+0x856/0xd90 net/netlink/af_netlink.c:1927 sock_sendmsg_nosec net/socket.c:654 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:674 ____sys_sendmsg+0x6e8/0x810 net/socket.c:2350 ___sys_sendmsg+0xf3/0x170 net/socket.c:2404 __sys_sendmsg+0xe5/0x1b0 net/socket.c:2433 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 Fixes: 919067cc845f ("net: add CONFIG_PCPU_DEV_REFCNT") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-29tipc: fix htmldoc and smatch warningsJon Maloy2-2/+3
We fix a warning from the htmldoc tool and an indentation error reported by smatch. There are no functional changes in this commit. Signed-off-by: Jon Maloy <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-29net:tipc: Fix a double free in tipc_sk_mcast_rcvLv Yunlong1-1/+1
In the if(skb_peek(arrvq) == skb) branch, it calls __skb_dequeue(arrvq) to get the skb by skb = skb_peek(arrvq). Then __skb_dequeue() unlinks the skb from arrvq and returns the skb which equals to skb_peek(arrvq). After __skb_dequeue(arrvq) finished, the skb is freed by kfree_skb(__skb_dequeue(arrvq)) in the first time. Unfortunately, the same skb is freed in the second time by kfree_skb(skb) after the branch completed. My patch removes kfree_skb() in the if(skb_peek(arrvq) == skb) branch, because this skb will be freed by kfree_skb(skb) finally. Fixes: cb1b728096f54 ("tipc: eliminate race condition at multicast reception") Signed-off-by: Lv Yunlong <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-29net: dsa: Fix type was not set for devlink portMaxim Kochetkov1-1/+7
If PHY is not available on DSA port (described at devicetree but absent or failed to detect) then kernel prints warning after 3700 secs: [ 3707.948771] ------------[ cut here ]------------ [ 3707.948784] Type was not set for devlink port. [ 3707.948894] WARNING: CPU: 1 PID: 17 at net/core/devlink.c:8097 0xc083f9d8 We should unregister the devlink port as a user port and re-register it as an unused port before executing "continue" in case of dsa_port_setup error. Fixes: 86f8b1c01a0a ("net: dsa: Do not make user port errors fatal") Signed-off-by: Maxim Kochetkov <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>