aboutsummaryrefslogtreecommitdiff
path: root/net
AgeCommit message (Collapse)AuthorFilesLines
2021-04-19gro: fix napi_gro_frags() Fast GRO breakage due to IP alignment checkAlexander Lobakin1-4/+4
Commit 38ec4944b593 ("gro: ensure frag0 meets IP header alignment") did the right thing, but missed the fact that napi_gro_frags() logics calls for skb_gro_reset_offset() *before* pulling Ethernet header to the skb linear space. That said, the introduced check for frag0 address being aligned to 4 always fails for it as Ethernet header is obviously 14 bytes long, and in case with NET_IP_ALIGN its start is not aligned to 4. Fix this by adding @nhoff argument to skb_gro_reset_offset() which tells if an IP header is placed right at the start of frag0 or not. This restores Fast GRO for napi_gro_frags() that became very slow after the mentioned commit, and preserves the introduced check to avoid silent unaligned accesses. From v1 [0]: - inline tiny skb_gro_reset_offset() to let the code be optimized more efficively (esp. for the !NET_IP_ALIGN case) (Eric); - pull in Reviewed-by from Eric. [0] https://lore.kernel.org/netdev/[email protected] Fixes: 38ec4944b593 ("gro: ensure frag0 meets IP header alignment") Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Alexander Lobakin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller15-88/+333
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add vlan match and pop actions to the flowtable offload, patches from wenxu. 2) Reduce size of the netns_ct structure, which itself is embedded in struct net Make netns_ct a read-mostly structure. Patches from Florian Westphal. 3) Add FLOW_OFFLOAD_XMIT_UNSPEC to skip dst check from garbage collector path, as required by the tc CT action. From Roi Dayan. 4) VLAN offload fixes for nftables: Allow for matching on both s-vlan and c-vlan selectors. Fix match of VLAN id due to incorrect byteorder. Add a new routine to properly populate flow dissector ethertypes. 5) Missing keys in ip{6}_route_me_harder() results in incorrect routes. This includes an update for selftest infra. Patches from Ido Schimmel. 6) Add counter hardware offload support through FLOW_CLS_STATS. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-04-19net: sched: tapr: prevent cycle_time == 0 in parse_taprio_scheduleDu Cheng1-0/+6
There is a reproducible sequence from the userland that will trigger a WARN_ON() condition in taprio_get_start_time, which causes kernel to panic if configured as "panic_on_warn". Catch this condition in parse_taprio_schedule to prevent this condition. Reported as bug on syzkaller: https://syzkaller.appspot.com/bug?extid=d50710fd0873a9c6b40c Reported-by: [email protected] Signed-off-by: Du Cheng <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-19ethtool: ioctl: Fix out-of-bounds warning in store_link_ksettings_for_user()Gustavo A. R. Silva1-1/+1
Fix the following out-of-bounds warning: net/ethtool/ioctl.c:492:2: warning: 'memcpy' offset [49, 84] from the object at 'link_usettings' is out of the bounds of referenced subobject 'base' with type 'struct ethtool_link_settings' at offset 0 [-Warray-bounds] The problem is that the original code is trying to copy data into a some struct members adjacent to each other in a single call to memcpy(). This causes a legitimate compiler warning because memcpy() overruns the length of &link_usettings.base. Fix this by directly using &link_usettings and _from_ as destination and source addresses, instead. This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <[email protected]> Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-19mld: remove unnecessary prototypesTaehee Yoo1-3/+0
Some prototypes are unnecessary, so delete it. Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-19nexthop: Restart nexthop dump based on last dumped nexthop identifierIdo Schimmel1-8/+6
Currently, a multi-part nexthop dump is restarted based on the number of nexthops that have been dumped so far. This can result in a lot of nexthops not being dumped when nexthops are simultaneously deleted: # ip nexthop | wc -l 65536 # ip nexthop flush Dump was interrupted and may be inconsistent. Flushed 36040 nexthops # ip nexthop | wc -l 29496 Instead, restart the dump based on the nexthop identifier (fixed number) of the last successfully dumped nexthop: # ip nexthop | wc -l 65536 # ip nexthop flush Dump was interrupted and may be inconsistent. Flushed 65536 nexthops # ip nexthop | wc -l 0 Reported-by: Maksym Yaremchuk <[email protected]> Tested-by: Maksym Yaremchuk <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Petr Machata <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-19vsock/vmci: log once the failed queue pair allocationStefano Garzarella1-2/+1
VMCI feature is not supported in conjunction with the vSphere Fault Tolerance (FT) feature. VMware Tools can repeatedly try to create a vsock connection. If FT is enabled the kernel logs is flooded with the following messages: qp_alloc_hypercall result = -20 Could not attach to queue pair with -20 "qp_alloc_hypercall result = -20" was hidden by commit e8266c4c3307 ("VMCI: Stop log spew when qp allocation isn't possible"), but "Could not attach to queue pair with -20" is still there flooding the log. Since the error message can be useful in some cases, print it only once. Fixes: d021c344051a ("VSOCK: Introduce VM Sockets") Signed-off-by: Stefano Garzarella <[email protected]> Reviewed-by: Jorgen Hansen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-19cfg80211: scan: drop entry from hidden_list on overflowJohannes Berg1-0/+2
If we overflow the maximum number of BSS entries and free the new entry, drop it from any hidden_list that it may have been added to in the code above or in cfg80211_combine_bsses(). Reported-by: Dan Carpenter <[email protected]> Link: https://lore.kernel.org/r/20210416094212.5de7d1676ad7.Ied283b0bc5f504845e7d6ab90626bdfa68bb3dc0@changeid Cc: [email protected] Signed-off-by: Johannes Berg <[email protected]>
2021-04-19wireless: fix spelling of A-MSDU in HE capabilitiesJohannes Berg1-1/+1
In the HE capabilities, spell A-MSDU correctly, not "A-MDSU". Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20210409123755.9e6ff1af1181.If6868bc6902ccd9a95c74c78f716c4b41473ef14@changeid Signed-off-by: Johannes Berg <[email protected]>
2021-04-19wireless: align HE capabilities A-MPDU Length Exponent ExtensionJohannes Berg1-8/+8
The A-MPDU length exponent extension is defined differently in 802.11ax D6.1, align with that. Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20210409123755.c2a257d3e2df.I3455245d388c52c61dace7e7958dbed7e807cfb6@changeid Signed-off-by: Johannes Berg <[email protected]>
2021-04-19wireless: align some HE capabilities with the specJohannes Berg1-9/+10
Some names were changed, align that with the spec as of 802.11ax-D6.1. Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20210409123755.b1e5fbab0d8c.I3eb6076cb0714ec6aec6b8f9dee613ce4a05d825@changeid Signed-off-by: Johannes Berg <[email protected]>
2021-04-19xfrm: ipcomp: remove unnecessary get_cpu()Sabrina Dubroca1-17/+8
While testing ipcomp on a realtime kernel, Xiumei reported a "sleeping in atomic" bug, caused by a memory allocation while preemption is disabled (ipcomp_decompress -> alloc_page -> ... get_page_from_freelist). As Sebastian noted [1], this get_cpu() isn't actually needed, since ipcomp_decompress() is called in napi context anyway, so BH is already disabled. This patch replaces get_cpu + per_cpu_ptr with this_cpu_ptr, then simplifies the error returns, since there isn't any common operation left. [1] https://lore.kernel.org/lkml/[email protected]/ Cc: Juri Lelli <[email protected]> Reported-by: Xiumei Mu <[email protected]> Suggested-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Sabrina Dubroca <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2021-04-19xfrm: avoid synchronize_rcu during netns destructionFlorian Westphal1-3/+7
Use the new exit_pre hook to NULL the netlink socket. The net namespace core will do a synchronize_rcu() between the exit_pre and exit/exit_batch handlers. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2021-04-19xfrm: remove stray synchronize_rcu from xfrm_initFlorian Westphal1-3/+0
This function is called during boot, from ipv4 stack, there is no need to set the pointer to NULL (static storage duration, so already NULL). No need for the synchronize_rcu either. Remove both. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2021-04-19flow: remove spi key from flowi structFlorian Westphal1-39/+0
xfrm session decode ipv4 path (but not ipv6) sets this, but there are no consumers. Remove it. Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Steffen Klassert <[email protected]>
2021-04-19mac80211: drop the connection if firmware crashed while in CSANaftali Goldstein3-3/+12
Don't bother keeping the link in that case. It is way too complicated to keep the connection. Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Naftali Goldstein <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20210409123755.a126c8833398.I677bdac314dd50d90474a90593902c17f9410cc4@changeid Signed-off-by: Johannes Berg <[email protected]>
2021-04-19mac80211: properly drop the connection in case of invalid CSA IEEmmanuel Grumbach1-5/+2
In case the frequency is invalid, ieee80211_parse_ch_switch_ie will fail and we may not even reach the check in ieee80211_sta_process_chanswitch. Drop the connection in case ieee80211_parse_ch_switch_ie failed, but still take into account the CSA mode to remember not to send a deauth frame in case if it is forbidden to. Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20210409123755.34712ef96a0a.I75d7ad7f1d654e8b0aa01cd7189ff00a510512b3@changeid Signed-off-by: Johannes Berg <[email protected]>
2021-04-19mac80211: make ieee80211_vif_to_wdev work when the vif isn't in the driverEmmanuel Grumbach1-9/+1
This will allow the low level driver to get the wdev during the add_interface flow. In order to do that, remove a few checks from there and do not return NULL for vifs that were not yet added to the driver. Note that all the current callers of this helper function assume that the vif already exists: - The callers from the drivers already have a vif pointer. Before this change, ieee80211_vif_to_wdev would return NULL in some cases, but those callers don't even check they get a non-NULL pointer from ieee80211_vif_to_wdev. - The callers from net/mac80211/cfg.c assume the vif is already added to the driver as well. So, this change has no impact on existing callers of this helper function. Signed-off-by: Emmanuel Grumbach <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20210409123755.6078d3517095.I1907a45f267a62dab052bcc44428aa7a2005ffc9@changeid Signed-off-by: Johannes Berg <[email protected]>
2021-04-19nl80211/cfg80211: add a flag to negotiate for LMR feedback in NDP rangingAvraham Stern2-1/+12
Add a flag that indicates that the ISTA shall indicate support for LMR feedback in NDP ranging negotiation. Signed-off-by: Avraham Stern <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20210409123755.eff546283504.I2606161e700ac24d94d0b50c8edcdedd4c0395c2@changeid Signed-off-by: Johannes Berg <[email protected]>
2021-04-19mac80211: aes_cmac: check crypto_shash_setkey() return valueJohannes Berg1-2/+9
As crypto_shash_setkey() can fail, we should check the return value. Addresses-Coverity-ID: 1401813 ("Unchecked return value") Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20210409123755.533ff7acf1d2.I034bafa201c4a6823333f8410aeaa60cca5ee9e0@changeid Signed-off-by: Johannes Berg <[email protected]>
2021-04-19mac80211: minstrel_ht: remove extraneous indentation on if statementColin Ian King1-1/+1
The increment of idx is indented one level too deeply, clean up the code by removing the extraneous tab. Signed-off-by: Colin Ian King <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2021-04-19mac80211: properly process TXQ management framesJohannes Berg1-5/+14
My previous commit to not apply flow control to management frames that are going over TXQs (which is currently only the case for iwlwifi, I think) broke things, with iwlwifi firmware crashing on certain frames. As it turns out, that was due to the frame being too short: space for the MIC wasn't added at the end of encrypted management frames. Clearly, this is due to using the 'frags' queue - this is meant only for frames that have already been processed for TX, and the code in ieee80211_tx_dequeue() just returns them. This caused all management frames to now not get any TX processing. To fix this, use IEEE80211_TX_INTCFL_NEED_TXPROCESSING (which is currently used only in other circumstances) to indicate that the frames need processing, and clear it immediately after so that, at least in theory, MMPDUs can be fragmented. Fixes: 73bc9e0af594 ("mac80211: don't apply flow control on management frames") Acked-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/r/20210416134702.ef8486a64293.If0a9025b39c71bb91b11dd6ac45547aba682df34@changeid Signed-off-by: Johannes Berg <[email protected]>
2021-04-19cfg80211: constify ieee80211_get_response_rate returnJoe Perches1-1/+1
It's not modified so make it const with the eventual goal of moving data to text for various static struct ieee80211_rate arrays. Signed-off-by: Joe Perches <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2021-04-18netfilter: nftables: counter hardware offload supportPablo Neira Ayuso3-7/+69
This patch adds the .offload_stats operation to synchronize hardware stats with the expression data. Update the counter expression to use this new interface. The hardware stats are retrieved from the netlink dump path via FLOW_CLS_STATS command to the driver. Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-18netfilter: Dissect flow after packet manglingIdo Schimmel2-0/+4
Netfilter tries to reroute mangled packets as a different route might need to be used following the mangling. When this happens, netfilter does not populate the IP protocol, the source port and the destination port in the flow key. Therefore, FIB rules that match on these fields are ignored and packets can be misrouted. Solve this by dissecting the outer flow and populating the flow key before rerouting the packet. Note that flow dissection only happens when FIB rules that match on these fields are installed, so in the common case there should not be a penalty. Reported-by: Michal Soltys <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-18netfilter: nftables_offload: special ethertype handling for VLANPablo Neira Ayuso1-0/+44
The nftables offload parser sets FLOW_DISSECTOR_KEY_BASIC .n_proto to the ethertype field in the ethertype frame. However: - FLOW_DISSECTOR_KEY_BASIC .n_proto field always stores either IPv4 or IPv6 ethertypes. - FLOW_DISSECTOR_KEY_VLAN .vlan_tpid stores either the 802.1q and 802.1ad ethertypes. Same as for FLOW_DISSECTOR_KEY_CVLAN. This function adjusts the flow dissector to handle two scenarios: 1) FLOW_DISSECTOR_KEY_VLAN .vlan_tpid is set to 802.1q or 802.1ad. Then, transfer: - the .n_proto field to FLOW_DISSECTOR_KEY_VLAN .tpid. - the original FLOW_DISSECTOR_KEY_VLAN .tpid to the FLOW_DISSECTOR_KEY_CVLAN .tpid - the original FLOW_DISSECTOR_KEY_CVLAN .tpid to the .n_proto field. 2) .n_proto is set to 802.1q or 802.1ad. Then, transfer: - the .n_proto field to FLOW_DISSECTOR_KEY_VLAN .tpid. - the original FLOW_DISSECTOR_KEY_VLAN .tpid to the .n_proto field. Fixes: a82055af5959 ("netfilter: nft_payload: add VLAN offload support") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-18netfilter: nftables_offload: VLAN id needs host byteorder in flow dissectorPablo Neira Ayuso2-6/+45
The flow dissector representation expects the VLAN id in host byteorder. Add the NFT_OFFLOAD_F_NETWORK2HOST flag to swap the bytes from nft_cmp. Fixes: a82055af5959 ("netfilter: nft_payload: add VLAN offload support") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-18netfilter: nft_payload: fix C-VLAN offload supportPablo Neira Ayuso1-2/+3
- add another struct flow_dissector_key_vlan for C-VLAN - update layer 3 dependency to allow to match on IPv4/IPv6 Fixes: 89d8fd44abfb ("netfilter: nft_payload: add C-VLAN offload support") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2021-04-17Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski21-57/+154
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c - keep the ZC code, drop the code related to reinit net/bridge/netfilter/ebtables.c - fix build after move to net_generic Signed-off-by: Jakub Kicinski <[email protected]>
2021-04-16mptcp: use mptcp_for_each_subflow in mptcp_closeGeliang Tang1-1/+1
This patch used the macro helper mptcp_for_each_subflow() instead of list_for_each_entry() in mptcp_close. Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: add tracepoint in subflow_check_data_availGeliang Tang1-3/+1
This patch added a tracepoint in subflow_check_data_avail() to show the mapping status. Suggested-by: Paolo Abeni <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: add tracepoint in ack_update_mskGeliang Tang1-0/+6
This patch added a tracepoint in ack_update_msk() to track the incoming data_ack and window/snd_una updates. Suggested-by: Paolo Abeni <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: add tracepoint in get_mapping_statusGeliang Tang1-3/+3
This patch added a tracepoint in the mapping status function get_mapping_status() to dump every mpext field. Suggested-by: Paolo Abeni <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: add tracepoint in mptcp_subflow_get_sendGeliang Tang1-4/+4
This patch added a tracepoint in the packet scheduler function mptcp_subflow_get_send(). Suggested-by: Paolo Abeni <[email protected]> Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: export mptcp_subflow_activeGeliang Tang2-12/+12
This patch moved the static function mptcp_subflow_active to protocol.h as an inline one. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: fix format specifiers for unsigned intGeliang Tang1-2/+2
Some of the sequence numbers are printed as the negative ones in the debug log: [ 46.250932] MPTCP: DSS [ 46.250940] MPTCP: data_fin=0 dsn64=0 use_map=0 ack64=1 use_ack=1 [ 46.250948] MPTCP: data_ack=2344892449471675613 [ 46.251012] MPTCP: msk=000000006e157e3f status=10 [ 46.251023] MPTCP: msk=000000006e157e3f snd_data_fin_enable=0 pending=0 snd_nxt=2344892449471700189 write_seq=2344892449471700189 [ 46.251343] MPTCP: msk=00000000ec44a129 ssk=00000000f7abd481 sending dfrag at seq=-1658937016627538668 len=100 already sent=0 [ 46.251360] MPTCP: data_seq=16787807057082012948 subflow_seq=1 data_len=100 dsn64=1 This patch used the format specifier %u instead of %d for the unsigned int values to fix it. Fixes: d9ca1de8c0cd ("mptcp: move page frag allocation in mptcp_sendmsg()") Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Geliang Tang <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16kunit: mptcp: adhere to KUNIT formatting standardNico Pache4-4/+4
Drop 'S' from end of CONFIG_MPTCP_KUNIT_TESTS in order to adhere to the KUNIT *_KUNIT_TEST config name format. Fixes: a00a582203db (mptcp: move crypto test to KUNIT) Reviewed-by: David Gow <[email protected]> Reviewed-by: Matthieu Baerts <[email protected]> Signed-off-by: Nico Pache <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16flow_dissector: Fix out-of-bounds warning in __skb_flow_bpf_to_target()Gustavo A. R. Silva1-2/+4
Fix the following out-of-bounds warning: net/core/flow_dissector.c:835:3: warning: 'memcpy' offset [33, 48] from the object at 'flow_keys' is out of the bounds of referenced subobject 'ipv6_src' with type '__u32[4]' {aka 'unsigned int[4]'} at offset 16 [-Warray-bounds] The problem is that the original code is trying to copy data into a couple of struct members adjacent to each other in a single call to memcpy(). So, the compiler legitimately complains about it. As these are just a couple of members, fix this by copying each one of them in separate calls to memcpy(). This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <[email protected]> Signed-off-by: Gustavo A. R. Silva <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16netlink: don't call ->netlink_bind with table lock heldFlorian Westphal1-2/+2
When I added support to allow generic netlink multicast groups to be restricted to subscribers with CAP_NET_ADMIN I was unaware that a genl_bind implementation already existed in the past. It was reverted due to ABBA deadlock: 1. ->netlink_bind gets called with the table lock held. 2. genetlink bind callback is invoked, it grabs the genl lock. But when a new genl subsystem is (un)registered, these two locks are taken in reverse order. One solution would be to revert again and add a comment in genl referring 1e82a62fec613, "genetlink: remove genl_bind"). This would need a second change in mptcp to not expose the raw token value anymore, e.g. by hashing the token with a secret key so userspace can still associate subflow events with the correct mptcp connection. However, Paolo Abeni reminded me to double-check why the netlink table is locked in the first place. I can't find one. netlink_bind() is already called without this lock when userspace joins a group via NETLINK_ADD_MEMBERSHIP setsockopt. Same holds for the netlink_unbind operation. Digging through the history, commit f773608026ee1 ("netlink: access nlk groups safely in netlink bind and getname") expanded the lock scope. commit 3a20773beeeeade ("net: netlink: cap max groups which will be considered in netlink_bind()") ... removed the nlk->ngroups access that the lock scope extension was all about. Reduce the lock scope again and always call ->netlink_bind without the table lock. The Fixes tag should be vs. the patch mentioned in the link below, but that one got squash-merged into the patch that came earlier in the series. Fixes: 4d54cc32112d8d ("mptcp: avoid lock_fast usage in accept path") Link: https://lore.kernel.org/mptcp/[email protected]/T/#u Cc: Cong Wang <[email protected]> Cc: Xin Long <[email protected]> Cc: Johannes Berg <[email protected]> Cc: Sean Tranchetti <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Pablo Neira Ayuso <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16ethtool: add interface to read RMON statsJakub Kicinski3-0/+93
Most devices maintain RMON (RFC 2819) stats - particularly the "histogram" of packets received by size. Unlike other RFCs which duplicate IEEE stats, the short/oversized frame counters in RMON don't seem to match IEEE stats 1-to-1 either, so expose those, too. Do not expose basic packet, CRC errors etc - those are already otherwise covered. Because standard defines packet ranges only up to 1518, and everything above that should theoretically be "oversized" - devices often create their own ranges. Going beyond what the RFC defines - expose the "histogram" in the Tx direction (assume for now that the ranges will be the same). Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16ethtool: add interface to read standard MAC Ctrl statsJakub Kicinski3-0/+39
Number of devices maintains the standard-based MAC control counters for control frames. Add a API for those. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16ethtool: add interface to read standard MAC statsJakub Kicinski3-0/+96
Most of the MAC statistics are included in struct rtnl_link_stats64, but some fields are aggregated. Besides it's good to expose these clearly hardware stats separately. Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16ethtool: add a new command for reading standard statsJakub Kicinski5-1/+226
Add an interface for reading standard stats, including stats which don't have a corresponding control interface. Start with IEEE 802.3 PHY stats. There seems to be only one stat to expose there. Define API to not require user space changes when new stats or groups are added. Groups are based on bitset, stats have a string set associated. v1: wrap stats in a nest Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16sctp: Fix out-of-bounds warning in sctp_process_asconf_param()Gustavo A. R. Silva1-1/+1
Fix the following out-of-bounds warning: net/sctp/sm_make_chunk.c:3150:4: warning: 'memcpy' offset [17, 28] from the object at 'addr' is out of the bounds of referenced subobject 'v4' with type 'struct sockaddr_in' at offset 0 [-Warray-bounds] This helps with the ongoing efforts to globally enable -Warray-bounds and get us closer to being able to tighten the FORTIFY_SOURCE routines on memcpy(). Link: https://github.com/KSPP/linux/issues/109 Reported-by: kernel test robot <[email protected]> Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mld: fix suspicious RCU usage in __ipv6_dev_mc_dec()Taehee Yoo1-0/+2
__ipv6_dev_mc_dec() internally uses sleepable functions so that caller must not acquire atomic locks. But caller, which is addrconf_verify_rtnl() acquires rcu_read_lock_bh(). So this warning occurs in the __ipv6_dev_mc_dec(). Test commands: ip netns add A ip link add veth0 type veth peer name veth1 ip link set veth1 netns A ip link set veth0 up ip netns exec A ip link set veth1 up ip a a 2001:db8::1/64 dev veth0 valid_lft 2 preferred_lft 1 Splat looks like: ============================ WARNING: suspicious RCU usage 5.12.0-rc6+ #515 Not tainted ----------------------------- kernel/sched/core.c:8294 Illegal context switch in RCU-bh read-side critical section! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 4 locks held by kworker/4:0/1997: #0: ffff88810bd72d48 ((wq_completion)ipv6_addrconf){+.+.}-{0:0}, at: process_one_work+0x761/0x1440 #1: ffff888105c8fe00 ((addr_chk_work).work){+.+.}-{0:0}, at: process_one_work+0x795/0x1440 #2: ffffffffb9279fb0 (rtnl_mutex){+.+.}-{3:3}, at: addrconf_verify_work+0xa/0x20 #3: ffffffffb8e30860 (rcu_read_lock_bh){....}-{1:2}, at: addrconf_verify_rtnl+0x23/0xc60 stack backtrace: CPU: 4 PID: 1997 Comm: kworker/4:0 Not tainted 5.12.0-rc6+ #515 Workqueue: ipv6_addrconf addrconf_verify_work Call Trace: dump_stack+0xa4/0xe5 ___might_sleep+0x27d/0x2b0 __mutex_lock+0xc8/0x13f0 ? lock_downgrade+0x690/0x690 ? __ipv6_dev_mc_dec+0x49/0x2a0 ? mark_held_locks+0xb7/0x120 ? mutex_lock_io_nested+0x1270/0x1270 ? lockdep_hardirqs_on_prepare+0x12c/0x3e0 ? _raw_spin_unlock_irqrestore+0x47/0x50 ? trace_hardirqs_on+0x41/0x120 ? __wake_up_common_lock+0xc9/0x100 ? __wake_up_common+0x620/0x620 ? memset+0x1f/0x40 ? netlink_broadcast_filtered+0x2c4/0xa70 ? __ipv6_dev_mc_dec+0x49/0x2a0 __ipv6_dev_mc_dec+0x49/0x2a0 ? netlink_broadcast_filtered+0x2f6/0xa70 addrconf_leave_solict.part.64+0xad/0xf0 ? addrconf_join_solict.part.63+0xf0/0xf0 ? nlmsg_notify+0x63/0x1b0 __ipv6_ifa_notify+0x22c/0x9c0 ? inet6_fill_ifaddr+0xbe0/0xbe0 ? lockdep_hardirqs_on_prepare+0x12c/0x3e0 ? __local_bh_enable_ip+0xa5/0xf0 ? ipv6_del_addr+0x347/0x870 ipv6_del_addr+0x3b1/0x870 ? addrconf_ifdown+0xfe0/0xfe0 ? rcu_read_lock_any_held.part.27+0x20/0x20 addrconf_verify_rtnl+0x8a9/0xc60 addrconf_verify_work+0xf/0x20 process_one_work+0x84c/0x1440 In order to avoid this problem, it uses rcu_read_unlock_bh() for a short time. RCU is used for avoiding freeing ifp(struct *inet6_ifaddr) while ifp is being used. But this will not be released even if rcu_read_unlock_bh() is used. Because before rcu_read_unlock_bh(), it uses in6_ifa_hold(ifp). So this is safe. Fixes: 63ed8de4be81 ("mld: add mc_lock for protecting per-interface mld data") Suggested-by: Eric Dumazet <[email protected]> Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: sockopt: add TCP_CONGESTION and TCP_INFOFlorian Westphal2-0/+106
TCP_CONGESTION is set for all subflows. The mptcp socket gains icsk_ca_ops too so it can be used to keep the authoritative state that should be set on new/future subflows. TCP_INFO will return first subflow only. The out-of-tree kernel has a MPTCP_INFO getsockopt, this could be added later on. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: setsockopt: SO_DEBUG and no-op optionsFlorian Westphal1-0/+16
Handle SO_DEBUG and set it on all subflows. Ignore those values not implemented on TCP sockets. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: setsockopt: add SO_INCOMING_CPUFlorian Westphal1-0/+16
Replicate to all subflows. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: setsockopt: add SO_MARK supportFlorian Westphal1-0/+8
Value is synced to all subflows. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-04-16mptcp: setsockopt: support SO_LINGERFlorian Westphal1-0/+43
Similar to PRIORITY/KEEPALIVE: needs to be mirrored to all subflows. Acked-by: Paolo Abeni <[email protected]> Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Mat Martineau <[email protected]> Signed-off-by: David S. Miller <[email protected]>