path: root/net
Age | Commit message | Author | Files | Lines
2019-09-13 | netfilter: nf_tables_offload: refactor the nft_flow_offload_rule function | wenxu | 1 | -7/+13
Pass rule, chain and flow_rule object parameters to nft_flow_offload_rule to reuse it. Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-09-13 | netfilter: nf_tables_offload: refactor the nft_flow_offload_chain function | wenxu | 1 | -7/+13
Pass chain and policy parameters to nft_flow_offload_chain to reuse it. Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-09-13 | netfilter: nf_tables_offload: add __nft_offload_get_chain function | wenxu | 1 | -18/+34
Add __nft_offload_get_chain function to get basechain from device. This function requires that caller holds the per-netns nftables mutex. This patch implicitly fixes missing offload flags check and proper mutex from nft_indr_block_cb(). Fixes: 9a32669fecfb ("netfilter: nf_tables_offload: support indr block call") Signed-off-by: wenxu <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-09-12 | sctp: Fix the link time qualifier of 'sctp_ctrlsock_exit()' | Christophe JAILLET | 1 | -1/+1
The '.exit' functions from 'pernet_operations' structure should be marked as __net_exit, not __net_init. Fixes: 8e2d61e0aed2 ("sctp: fix race on protocol/netns initialization") Signed-off-by: Christophe JAILLET <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-12 | net: qrtr: fix memort leak in qrtr_tun_write_iter | Navid Emamdoost | 1 | -1/+4
In qrtr_tun_write_iter the allocated kbuf should be released on both the error and the success return paths. v2 update: thanks to David Miller for pointing out the release on the success path as well. Signed-off-by: Navid Emamdoost <[email protected]> Signed-off-by: David S. Miller <[email protected]>
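The release-on-every-path pattern described above can be sketched in userspace C. This is only an illustration under stated assumptions: tun_write_sketch(), post_message(), buf_alloc(), buf_free() and the live_buffers counter are hypothetical names, and malloc()/free() stand in for the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical counter so the sketch can show that nothing leaks. */
int live_buffers;

static void *buf_alloc(size_t len)
{
	void *p = malloc(len);

	if (p)
		live_buffers++;
	return p;
}

static void buf_free(void *p)
{
	if (p)
		live_buffers--;
	free(p);
}

/* Stand-in for the consumer of the copied data; rejects empty input. */
static int post_message(const void *data, size_t len)
{
	(void)data;
	return len ? 0 : -EINVAL;
}

/* Copy the caller's data into a temporary buffer, hand it on, free it. */
long tun_write_sketch(const void *src, size_t len)
{
	void *kbuf;
	int ret;

	assert(src != NULL);

	kbuf = buf_alloc(len ? len : 1);
	if (!kbuf)
		return -ENOMEM;

	memcpy(kbuf, src, len);
	ret = post_message(kbuf, len);

	/* Single release point, reached on both the error and success paths. */
	buf_free(kbuf);
	return ret < 0 ? ret : (long)len;
}
```

Routing both outcomes through one release point means a later change to the return logic cannot reintroduce the leak on only one path.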
2019-09-12 | net: Fix null de-reference of device refcount | Subash Abhinov Kasiviswanathan | 1 | -0/+2
In the event of a failure during register_netdevice, free_netdev is invoked immediately. free_netdev assumes that all the netdevice refcounts have been dropped prior to it being called and as a result frees and clears out the refcount pointer. However, this is not necessarily true, as some of the operations in the NETDEV_UNREGISTER notifier handlers queue RCU callbacks for invocation after a grace period. The IPv4 callback in_dev_rcu_put tries to access the refcount after free_netdev is called, which leads to a null de-reference:
44837.761523: <6> Unable to handle kernel paging request at virtual address 0000004a88287000
44837.761651: <2> pc : in_dev_finish_destroy+0x4c/0xc8
44837.761654: <2> lr : in_dev_finish_destroy+0x2c/0xc8
44837.762393: <2> Call trace:
44837.762398: <2> in_dev_finish_destroy+0x4c/0xc8
44837.762404: <2> in_dev_rcu_put+0x24/0x30
44837.762412: <2> rcu_nocb_kthread+0x43c/0x468
44837.762418: <2> kthread+0x118/0x128
44837.762424: <2> ret_from_fork+0x10/0x1c
Fix this by waiting for the completion of the call_rcu() in case of register_netdevice errors.
Fixes: 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.")
Cc: Sean Tranchetti <[email protected]>
Signed-off-by: Subash Abhinov Kasiviswanathan <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2019-09-12 | net: dsa: microchip: remove NET_DSA_TAG_KSZ_COMMON | George McCollister | 2 | -8/+3
Remove the superfluous NET_DSA_TAG_KSZ_COMMON and just use the existing NET_DSA_TAG_KSZ. Update the description to mention the three switch families it supports. No functional change. Signed-off-by: George McCollister <[email protected]> Reviewed-by: Marek Vasut <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-12 | ipv6: Fix the link time qualifier of 'ping_v6_proc_exit_net()' | Christophe JAILLET | 1 | -1/+1
The '.exit' functions from 'pernet_operations' structure should be marked as __net_exit, not __net_init. Fixes: d862e5461423 ("net: ipv6: Implement /proc/net/icmp6.") Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-11 | tcp: force a PSH flag on TSO packets | Eric Dumazet | 1 | -2/+13
When TCP sends a TSO packet, adding a PSH flag to it reduces the sojourn time of the GRO packet in GRO receivers. This is particularly the case under pressure, since RX queues receive packets for many concurrent flows. A sender can give a hint to GRO engines when it is appropriate to flush a super-packet, especially when pacing is in the picture, since the next packet is probably delayed by one ms. Having fewer packets in the GRO engine reduces the chance of LRU eviction or inflated RTT, and reduces GRO cost.
We found recently that we must not set the PSH flag on individual full-size MSS segments [1]: under pressure (CWR state), we had better let the packet sit for a small delay (depending on NAPI logic) so that the ACK packet is delayed, and thus the next packet we send is also delayed a bit. Eventually the bottleneck queue can be drained. DCTCP flows with CWND=1 have demonstrated the issue.
This patch makes it possible to slow down the aggregate traffic without involving high resolution timers on senders and/or receivers. It has been used at Google for about four years, and has been discussed at various networking conferences.
[1] Segments smaller than MSS already have the PSH flag set by tcp_sendmsg() / tcp_mark_push(), unless MSG_MORE has been requested by the user.
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Soheil Hassas Yeganeh <[email protected]>
Cc: Neal Cardwell <[email protected]>
Cc: Yuchung Cheng <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Tariq Toukan <[email protected]>
Acked-by: Soheil Hassas Yeganeh <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2019-09-11 | tcp: fix tcp_ecn_withdraw_cwr() to clear TCP_ECN_QUEUE_CWR | Neal Cardwell | 1 | -1/+1
Fix tcp_ecn_withdraw_cwr() to clear the correct bit: TCP_ECN_QUEUE_CWR. Rationale: basically, TCP_ECN_DEMAND_CWR is a bit that is purely about the behavior of data receivers, and deciding whether to reflect incoming IP ECN CE marks as outgoing TCP th->ece marks. The TCP_ECN_QUEUE_CWR bit is purely about the behavior of data senders, and deciding whether to send CWR. The tcp_ecn_withdraw_cwr() function is only called from tcp_undo_cwnd_reduction() by data senders during an undo, so it should zero the sender-side state, TCP_ECN_QUEUE_CWR. It does not make sense to stop the reflection of incoming CE bits on incoming data packets just because outgoing packets were spuriously retransmitted. The bug has been reproduced with packetdrill to manifest in a scenario with RFC3168 ECN, with an incoming data packet with CE bit set and carrying a TCP timestamp value that causes cwnd undo. Before this fix, the IP CE bit was ignored and not reflected in the TCP ECE header bit, and sender sent a TCP CWR ('W') bit on the next outgoing data packet, even though the cwnd reduction had been undone. After this fix, the sender properly reflects the CE bit and does not set the W bit. Note: the bug actually predates 2005 git history; this Fixes footer is chosen to be the oldest SHA1 I have tested (from Sep 2007) for which the patch applies cleanly (since before this commit the code was in a .h file). Fixes: bdf1ee5d3bd3 ("[TCP]: Move code from tcp_ecn.h to tcp*.c and tcp.h & remove it") Signed-off-by: Neal Cardwell <[email protected]> Acked-by: Yuchung Cheng <[email protected]> Acked-by: Soheil Hassas Yeganeh <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
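The difference between clearing the receiver-side bit and the sender-side bit can be shown with a small userspace sketch. The flag names below are shortened from the commit message, the bit values are illustrative, and struct ecn_state and both functions are hypothetical stand-ins for the socket state, not the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Names follow the commit message; the bit values are illustrative. */
enum {
	ECN_OK         = 1 << 0,
	ECN_QUEUE_CWR  = 1 << 1, /* sender side: a CWR is queued for sending */
	ECN_DEMAND_CWR = 1 << 2, /* receiver side: keep reflecting CE as ECE */
};

/* Hypothetical stand-in for the per-socket ECN state. */
struct ecn_state {
	uint8_t flags;
};

/* Before the fix: an undo cleared the receiver-side bit. */
void withdraw_cwr_buggy(struct ecn_state *s)
{
	assert(s != NULL);
	s->flags &= (uint8_t)~ECN_DEMAND_CWR;
}

/* After the fix: an undo clears only the sender-side bit. */
void withdraw_cwr_fixed(struct ecn_state *s)
{
	assert(s != NULL);
	s->flags &= (uint8_t)~ECN_QUEUE_CWR;
}
```

With all three bits set, the fixed variant leaves ECN_DEMAND_CWR in place, so incoming CE marks keep being reflected, while the buggy variant leaves a CWR queued for a reduction that was already undone.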
2019-09-11 | ipv6: Don't use dst gateway directly in ip6_confirm_neigh() | Stefano Brivio | 1 | -1/+1
This is the equivalent of commit 2c6b55f45d53 ("ipv6: fix neighbour resolution with raw socket") for ip6_confirm_neigh(): we can send a packet with MSG_CONFIRM on a raw socket for a connected route, so the gateway would be :: here, and we should pick the next hop using rt6_nexthop() instead. This was found by code review and, to the best of my knowledge, doesn't actually fix a practical issue: the destination address from the packet is not considered while confirming a neighbour, as ip6_confirm_neigh() calls choose_neigh_daddr() without passing the packet, so there are no similar issues as the one fixed by said commit. A possible source of issues with the existing implementation might come from the fact that, if we have a cached dst, we won't consider it, while rt6_nexthop() takes care of that. I might just not be creative enough to find a practical problem here: the only way to affect this with cached routes is to have one coming from an ICMPv6 redirect, but if the next hop is a directly connected host, there should be no topology for which a redirect applies here, and tests with redirected routes show no differences for MSG_CONFIRM (and MSG_PROBE) packets on raw sockets destined to a directly connected host. However, directly using the dst gateway here is not consistent anymore with neighbour resolution, and, in general, as we want the next hop, using rt6_nexthop() looks like the only sane way to fetch it. Reported-by: Guillaume Nault <[email protected]> Signed-off-by: Stefano Brivio <[email protected]> Acked-by: Guillaume Nault <[email protected]> Acked-by: Nicolas Dichtel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-11 | net/rds: An rds_sock is added too early to the hash table | Ka-Cheong Poon | 1 | -22/+18
In rds_bind(), an rds_sock is added to the RDS bind hash table before rs_transport is set. This means that the socket can be found by the receive code path when rs_transport is NULL. And the receive code path de-references rs_transport for congestion update check. This can cause a panic. An rds_sock should not be added to the bind hash table before all the needed fields are set. Reported-by: [email protected] Signed-off-by: Ka-Cheong Poon <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-11 | mac80211: Do not send Layer 2 Update frame before authorization | Jouni Malinen | 2 | -10/+8
The Layer 2 Update frame is used to update bridges when a station roams to another AP even if that STA does not transmit any frames after the reassociation. This behavior was described in IEEE Std 802.11F-2003 as something that would happen based on MLME-ASSOCIATE.indication, i.e., before completing 4-way handshake. However, this IEEE trial-use recommended practice document was published before RSN (IEEE Std 802.11i-2004) and as such, did not consider RSN use cases. Furthermore, IEEE Std 802.11F-2003 was withdrawn in 2006 and as such, has not been maintained and should not be used anymore. Sending out the Layer 2 Update frame immediately after association is fine for open networks (and also when using SAE, FT protocol, or FILS authentication when the station is actually authenticated by the time association completes). However, it is not appropriate for cases where RSN is used with PSK or EAP authentication since the station is actually fully authenticated only once the 4-way handshake completes after authentication and attackers might be able to use the unauthenticated triggering of Layer 2 Update frame transmission to disrupt bridge behavior. Fix this by postponing transmission of the Layer 2 Update frame from station entry addition to the point when the station entry is marked authorized. Similarly, send out the VLAN binding update only if the STA entry has already been authorized. Signed-off-by: Jouni Malinen <[email protected]> Reviewed-by: Johannes Berg <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-11 | Merge tag 'mac80211-next-for-davem-2019-09-11' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next | David S. Miller | 15 | -67/+75
Johannes Berg says:
====================
We have a number of changes, but things are settling down:
* a fix in the new 6 GHz channel support
* a fix for recent minstrel (rate control) updates for an infinite loop
* handle interface type changes better wrt. management frame registrations (for management frames sent to userspace)
* add in-BSS RX time to survey information
* handle HW rfkill properly if !CONFIG_RFKILL
* send deauth on IBSS station expiry, to avoid state mismatches
* handle deferred crypto tailroom updates in mac80211 better when device restart happens
* fix a spectre-v1 - really a continuation of a previous patch
* advertise NL80211_CMD_UPDATE_FT_IES as supported if so
* add some missing parsing in VHT extended NSS support
* support HE in mac80211_hwsim
* let mac80211 drivers determine the max MTU themselves
along with the usual cleanups etc.
====================
Signed-off-by: David S. Miller <[email protected]>
2019-09-11 | cfg80211: Purge frame registrations on iftype change | Denis Kenzior | 1 | -0/+1
Currently frame registrations are not purged, even when changing the interface type. This can lead to potentially weird situations where frames possibly not allowed on a given interface type remain registered due to the type switching happening after registration. The kernel currently relies on userspace apps to actually purge the registrations themselves; this is not something that the kernel should rely on. Add a call to cfg80211_mlme_purge_registrations() to forcefully remove any registrations left over prior to switching the iftype. Cc: [email protected] Signed-off-by: Denis Kenzior <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-09-11 | nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds | Masashi Honma | 1 | -1/+3
Commit 1222a1601488 ("nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds") was incomplete and requires one more fix to prevent access to rssi_thresholds[n], because the user can control the rssi_thresholds[i] values so that i reaches n. For example, rssi_thresholds = {-400, -300, -200, -100} when last is -34. Cc: [email protected] Fixes: 1222a1601488 ("nl80211: Fix possible Spectre-v1 for CQM RSSI thresholds") Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Masashi Honma <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
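The way the search index can run one past the end of the threshold array can be reproduced in a short userspace sketch. The function rssi_window(), struct rssi_range and the hysteresis parameter are hypothetical names chosen for illustration; the sketch shows only the ordinary bounds check and does not model speculative execution, which is what the kernel-side hardening addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rssi_range {
	int32_t low;
	int32_t high;
};

/*
 * Find the window around 'last' in an ascending threshold list.
 * The search index 'i' can legitimately end up equal to 'n', so each
 * index is range-checked before it is used to read the array.
 */
struct rssi_range rssi_window(const int32_t *thresholds, int n,
			      int32_t last, int32_t hyst)
{
	struct rssi_range r = { INT32_MIN, INT32_MAX };
	int i;

	assert(thresholds != NULL && n >= 0);

	for (i = 0; i < n; i++)
		if (last < thresholds[i])
			break;

	if (i - 1 >= 0)		/* i - 1 < n always holds here */
		r.low = thresholds[i - 1] - hyst;
	if (i < n)		/* skipped when i == n: no upper threshold */
		r.high = thresholds[i] + hyst - 1;
	return r;
}
```

With the thresholds from the commit message and last = -34, the loop ends with i equal to 4, so the upper bound stays at INT32_MAX instead of being read from thresholds[4].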
2019-09-11 | mac80211: allow drivers to set max MTU | Wen Gong | 2 | -1/+2
Make it possible for drivers to adjust the default max_mtu by storing it in the hardware struct and using that value for all interfaces. Signed-off-by: Wen Gong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-09-11 | cfg80211: Do not compare with boolean in nl80211_common_reg_change_event | zhong jiang | 1 | -5/+3
With the help of boolinit.cocci, we use !nl80211_reg_change_event_fill instead of (nl80211_reg_change_event_fill == false). Meanwhile, clean up the code. Signed-off-by: zhong jiang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-09-11 | mac80211: IBSS: send deauth when expiring inactive STAs | Johannes Berg | 4 | -8/+19
When we expire an inactive station, try to send it a deauth. This helps if it's actually still around, and just has issues with beacon distribution (or we do), and it will not also remove us. Then, if we have shared state, this may not be reset properly, causing problems; for example, we saw a case where aggregation sessions weren't removed properly (due to the TX start being offloaded to firmware and it relying on deauth for stop), causing a lot of traffic to get lost due to the SN reset after remove/add of the peer. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-09-11 | mac80211: don't check if key is NULL in ieee80211_key_link() | Luca Coelho | 1 | -1/+1
We already assume that key is not NULL and dereference it in a few other places before we check whether it is NULL, so the check is unnecessary. Remove it. Fixes: 96fc6efb9ad9 ("mac80211: IEEE 802.11 Extended Key ID support") Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-09-11 | mac80211: clear crypto tx tailroom counter upon keys enable | Lior Cohen | 3 | -35/+15
In case we get a firmware restart while roaming from an encrypted AP to a non-encrypted one, we might end up hitting a warning on the pending counter crypto_tx_tailroom_pending_dec having a non-zero value. The following comment, taken from net/mac80211/key.c, explains the rationale for the delayed tailroom needed:
/*
 * The reason for the delayed tailroom needed decrementing is to
 * make roaming faster: during roaming, all keys are first deleted
 * and then new keys are installed. The first new key causes the
 * crypto_tx_tailroom_needed_cnt to go from 0 to 1, which invokes
 * the cost of synchronize_net() (which can be slow). Avoid this
 * by deferring the crypto_tx_tailroom_needed_cnt decrementing on
 * key removal for a while, so if we roam the value is larger than
 * zero and no 0->1 transition happens.
 *
 * The cost is that if the AP switching was from an AP with keys
 * to one without, we still allocate tailroom while it would no
 * longer be needed. However, in the typical (fast) roaming case
 * within an ESS this usually won't happen.
 */
The following flow leads to the warning, eventually reported as a bug:
1. Disconnect from encrypted AP
2. Set crypto_tx_tailroom_pending_dec = 1 for the key
3. Schedule work
4. Reconnect to non-encrypted AP
5. Add a new key, setting the tailroom counter = 1
6. Got FW restart while pending counter is set ---> hit the warning
While at it, the ieee80211_reset_crypto_tx_tailroom() function was merged into its single caller ieee80211_reenable_keys (previously called ieee80211_enable_keys). Also, we reset the crypto_tx_tailroom_pending_dec and remove the counters warning as we just reset both.
Signed-off-by: Lior Cohen <[email protected]>
Signed-off-by: Luca Coelho <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Johannes Berg <[email protected]>
2019-09-11 | mac80211: remove unnecessary key condition | Johannes Berg | 1 | -3/+3
When we reach this point, the key cannot be NULL. Remove the condition that suggests otherwise. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-09-11 | mac80211: list features in WEP/TKIP disable in better order | Johannes Berg | 1 | -1/+1
"HE/HT/VHT" is a bit confusing since really the order of development (and possible support) is different - change this to "HT/VHT/HE". Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-09-11 | cfg80211: always shut down on HW rfkill | Johannes Berg | 3 | -9/+11
When the RFKILL subsystem isn't available, then rfkill_blocked() always returns false. In the case of hardware rfkill this will be wrong though, as if the hardware reported being killed then it cannot operate any longer. Since we only ever call the rfkill_sync work in this case, just rename it to rfkill_block and always pass "true" for the blocked parameter, rather than passing rfkill_blocked(). We rely on the underlying driver to still reject any new attempt to bring up the device by itself. Signed-off-by: Johannes Berg <[email protected]> Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-09-11 | mac80211: vht: add support VHT EXT NSS BW in parsing VHT | Mordechay Goodstein | 1 | -1/+9
This fix was missed when parsing the VHT capabilities max bandwidth support. Signed-off-by: Mordechay Goodstein <[email protected]> Fixes: e80d642552a3 ("mac80211: copy VHT EXT NSS BW Support/Capable data to station") Signed-off-by: Luca Coelho <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-09-11 | cfg80211: fix boundary value in ieee80211_frequency_to_channel() | Arend van Spriel | 1 | -1/+1
The boundary value used for the 6G band was incorrect as it would result in invalid 6G channel number for certain frequencies. Reported-by: Amar Singhal <[email protected]> Signed-off-by: Arend van Spriel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johannes Berg <[email protected]>
2019-09-10 | netfilter: nft_{fwd,dup}_netdev: add offload support | Pablo Neira Ayuso | 5 | -2/+62
This patch adds support for packet mirroring and redirection. The nft_fwd_dup_netdev_offload() function configures the flow_action object for the fwd and the dup actions. Extend nft_flow_rule_destroy() to release the net_device object when the flow_rule object is released, since nft_fwd_dup_netdev_offload() bumps the net_device reference counter. Signed-off-by: Pablo Neira Ayuso <[email protected]> Acked-by: wenxu <[email protected]>
2019-09-10 | netfilter: nft_synproxy: add synproxy stateful object support | Fernando Fernandez Mancera | 1 | -21/+122
Register a new synproxy stateful object type into the stateful object infrastructure. Signed-off-by: Fernando Fernandez Mancera <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-09-10 | sctp: fix the missing put_user when dumping transport thresholds | Xin Long | 1 | -1/+2
This issue causes SCTP_PEER_ADDR_THLDS sockopt not to be able to dump a transport thresholds info. Fix it by adding 'goto' put_user in sctp_getsockopt_paddr_thresholds. Fixes: 8add543e369d ("sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt") Signed-off-by: Xin Long <[email protected]> Acked-by: Marcelo Ricardo Leitner <[email protected]> Acked-by: Neil Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-10 | sch_hhf: ensure quantum and hhf_non_hh_weight are non-zero | Cong Wang | 1 | -1/+1
If TCA_HHF_NON_HH_WEIGHT or TCA_HHF_QUANTUM is zero, no progress is made inside the loop in hhf_dequeue(), and the kernel gets stuck. Fix this by checking this corner case in hhf_change(). Fixes: 10239edf86f1 ("net-qdisc-hhf: Heavy-Hitter Filter (HHF) qdisc") Reported-by: [email protected] Reported-by: [email protected] Reported-by: [email protected] Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Cc: Terry Lam <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
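Why a zero quantum or weight stalls a deficit-style dequeue loop, and what an up-front check might look like, can be sketched as follows. This is a userspace illustration under assumptions: hhf_check_params() and rounds_until_sendable() are hypothetical names, and the loop is modeled simply as adding quantum times weight of credit per round.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical parameter check: a per-round credit of zero would never
 * let the dequeue loop make progress, so reject it up front.
 */
int hhf_check_params(uint32_t quantum, uint32_t non_hh_weight)
{
	uint64_t credit = (uint64_t)quantum * non_hh_weight;

	if (credit == 0 || credit > INT_MAX)
		return -EINVAL;
	return 0;
}

/* Rounds a deficit-style loop needs before a packet of pkt_len fits. */
unsigned int rounds_until_sendable(uint32_t pkt_len, uint64_t credit)
{
	/* With credit == 0 the deficit never grows and the loop never ends. */
	assert(credit > 0);
	return (unsigned int)((pkt_len + credit - 1) / credit);
}
```

Validating at configuration time keeps the hot dequeue path free of extra checks, at the cost of rejecting a configuration that was previously accepted.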
2019-09-10 | net_sched: check cops->tcf_block in tc_bind_tclass() | Cong Wang | 1 | -0/+2
At least sch_red and sch_tbf don't implement ->tcf_block() while still having a non-zero tc "class". Instead of adding nop implementations to each such qdisc, we can just relax the check of cops->tcf_block() in tc_bind_tclass(). They don't support TC filters anyway. Reported-by: [email protected] Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-10 | devlink: add 'reset_dev_on_drv_probe' param | Dirk van der Merwe | 1 | -0/+5
Add the 'reset_dev_on_drv_probe' devlink parameter, controlling the device reset policy on driver probe. This parameter is useful in conjunction with the existing 'fw_load_policy' parameter. Signed-off-by: Dirk van der Merwe <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-10 | bridge/mdb: remove wrong use of NLM_F_MULTI | Nicolas Dichtel | 1 | -1/+1
NLM_F_MULTI must be used only when a NLMSG_DONE message is sent at the end. In fact, NLMSG_DONE is sent only at the end of a dump. Libraries like libnl will wait forever for NLMSG_DONE. Fixes: 949f1e39a617 ("bridge: mdb: notify on router port add and del") CC: Nikolay Aleksandrov <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
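The reason a reader can wait forever is easier to see with a small model of a netlink-style receive loop. The sketch below is a hedged illustration: struct msg_hdr, MSG_DONE, MSG_DATA and FLAG_MULTI are local stand-ins for the real header fields and constants, and messages_until_complete() is a hypothetical helper, not a libnl function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the netlink header fields the text mentions. */
enum { MSG_DONE = 3, MSG_DATA = 16 };
enum { FLAG_MULTI = 0x2 };

struct msg_hdr {
	uint16_t type;
	uint16_t flags;
};

/*
 * Returns how many messages a reader consumes before it considers the
 * reply complete, or -1 if it would still be waiting when input ends.
 */
int messages_until_complete(const struct msg_hdr *msgs, size_t n)
{
	assert(msgs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (msgs[i].type == MSG_DONE)
			return (int)(i + 1);
		if (!(msgs[i].flags & FLAG_MULTI))
			return (int)(i + 1);	/* single-part message */
	}
	return -1;	/* multipart sequence was never terminated */
}
```

A lone notification that carries the multipart flag but is never followed by a done message yields -1 in this model, which corresponds to the reader blocking indefinitely.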
2019-09-08 | netfilter: nf_tables_offload: move indirect flow_block callback logic to core | Pablo Neira Ayuso | 2 | -11/+21
Add nft_offload_init() and nft_offload_exit() functions to deal with the init and the exit path of the offload infrastructure. Rename nft_indr_block_get_and_ing_cmd() to nft_indr_block_cb(). Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-09-08 | netfilter: nf_tables_offload: avoid excessive stack usage | Arnd Bergmann | 1 | -7/+13
The nft_offload_ctx structure is much too large to put on the stack: net/netfilter/nf_tables_offload.c:31:23: error: stack frame size of 1200 bytes in function 'nft_flow_rule_create' [-Werror,-Wframe-larger-than=] Use dynamic allocation here, as we do elsewhere in the same function. Fixes: c9626a2cbdb2 ("netfilter: nf_tables: add hardware offload support") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-09-08 | netfilter: nf_tables: Fix an Oops in nf_tables_updobj() error handling | Dan Carpenter | 1 | -3/+3
The "newobj" is an error pointer so we can't pass it to kfree(). It doesn't need to be freed so we can remove that and I also renamed the error label. Fixes: d62d0ba97b58 ("netfilter: nf_tables: Introduce stateful object update operation") Signed-off-by: Dan Carpenter <[email protected]> Acked-by: Fernando Fernandez Mancera <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2019-09-07 | net/tls: align non temporal copy to cache lines | Jakub Kicinski | 1 | -5/+28
Unlike normal TCP code TLS has to touch the cache lines it copies into to fill header info. On memory-heavy workloads having non temporal stores and normal accesses targeting the same cache line leads to significant overhead. Measured 3% overhead running 3600 round robin connections with additional memory heavy workload. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
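One way to keep ordinary and non-temporal stores off the same cache line is to copy the bytes up to the next line boundary normally and use the non-temporal path only for the rest. The sketch below computes that split. It is an assumption-laden illustration: plan_copy() and struct copy_plan are hypothetical names, and the 64-byte line size is assumed rather than taken from the source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64u	/* assumed line size; real hardware varies */

struct copy_plan {
	size_t head;	/* bytes copied with ordinary stores */
	size_t body;	/* bytes left for the non-temporal copy */
};

/* Split a copy to address 'dst' so the body starts on a line boundary. */
struct copy_plan plan_copy(uintptr_t dst, size_t len)
{
	struct copy_plan p;
	size_t to_boundary = (CACHE_LINE - dst % CACHE_LINE) % CACHE_LINE;

	p.head = to_boundary < len ? to_boundary : len;
	p.body = len - p.head;
	return p;
}
```

For a destination that starts 13 bytes into a line, for instance right after a record header, the first 51 bytes share the header's cache line and go through ordinary stores; only the remainder is eligible for the non-temporal path.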
2019-09-07 | net/tls: remove the record tail optimization | Jakub Kicinski | 1 | -20/+47
For TLS device offload the tag/message authentication code is filled in by the device. The kernel merely reserves space for it. Because the device overwrites it, the contents of the tag do not matter. Current code tries to save space by reusing the header as the tag. This, however, leads to an additional frag being created and defeats buffer coalescing (which trickles all the way down to the drivers). Remove this optimization, and try to allocate the space for the tag in the usual way, leaving the memory uninitialized. If memory allocation fails, rewind the record pointer so that we use the already copied user data as the tag. Note that the optimization was actually buggy, as the tag for TLS 1.2 is 16 bytes, but the header is just 13, so the reuse may have looked past the end of the page. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-07 | net/tls: use RCU for the adder to the offload record list | Jakub Kicinski | 1 | -8/+13
All modifications to TLS record list happen under the socket lock. Since records form an ordered queue readers are only concerned about elements being removed, additions can happen concurrently. Use RCU primitives to ensure the correct access types (READ_ONCE/WRITE_ONCE). Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-07 | net/tls: unref frags in order | Jakub Kicinski | 1 | -6/+3
It's generally more cache friendly to walk arrays in order, especially those which are likely not in cache. Signed-off-by: Jakub Kicinski <[email protected]> Reviewed-by: Dirk van der Merwe <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-07 | Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next | David S. Miller | 4 | -16/+27
Johan Hedberg says:
====================
pull request: bluetooth-next 2019-09-06
Here's the main bluetooth-next pull request for the 5.4 kernel.
- Cleanups & fixes to btrtl driver
- Fixes for Realtek devices in btusb, e.g. for suspend handling
- Firmware loading support for BCM4345C5
- hidp_send_message() return value handling fixes
- Added support for utilizing Fast Advertising Interval
- Various other minor cleanups & fixes
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <[email protected]>
2019-09-07 | net: gso: Fix skb_segment splat when splitting gso_size mangled skb having linear-headed frag_list | Shmulik Ladkani | 1 | -0/+19
Historically, support for frag_list packets entering skb_segment() was limited to frag_list members terminating on exact same gso_size boundaries. This is verified with a BUG_ON since commit 89319d3801d1 ("net: Add frag_list support to skb_segment"), quote:
    As such we require all frag_list members terminate on exact MSS boundaries. This is checked using BUG_ON. As there should only be one producer in the kernel of such packets, namely GRO, this requirement should not be difficult to maintain.
However, since commit 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper"), the "exact MSS boundaries" assumption no longer holds: an eBPF program using bpf_skb_change_proto() DOES modify 'gso_size', but leaves the frag_list members as originally merged by GRO with the original 'gso_size'. Examples of such programs are bpf-based NAT46 or NAT64.
This led to a kernel BUG_ON for flows involving:
- GRO generating a frag_list skb
- bpf program performing bpf_skb_change_proto() or bpf_skb_adjust_room()
- skb_segment() of the skb
See example BUG_ON reports in [0].
In commit 13acc94eff12 ("net: permit skb_segment on head_frag frag_list skb"), skb_segment() was modified to support the "gso_size mangling" case of a frag_list GRO'ed skb, but *only* for frag_list members having head_frag==true (having a page-fragment head). Alas, GRO packets having frag_list members with a linear kmalloced head (head_frag==false) still hit the BUG_ON.
This commit adds support to skb_segment() for a 'head_skb' packet having a frag_list whose members are *non* head_frag, with gso_size mangled, by disabling SG and thus falling back to copying the data from the given 'head_skb' into the generated segmented skbs - as suggested by Willem de Bruijn [1].
Since this approach involves the penalty of skb_copy_and_csum_bits() when building the segments, care was taken in order to enable this solution only when required:
- untrusted gso_size, by testing SKB_GSO_DODGY is set (SKB_GSO_DODGY is set by any gso_size mangling functions in net/core/filter.c)
- the frag_list is non empty, its item is a non head_frag, *and* the headlen of the given 'head_skb' does not match the gso_size.
[0] https://lore.kernel.org/netdev/20190826170724.25ff616f@pixies/
    https://lore.kernel.org/netdev/[email protected]/
[1] https://lore.kernel.org/netdev/CA+FuTSfVsgNDi7c=GUU8nMg2hWxF2SjCNLXetHeVPdnxAW5K-w@mail.gmail.com/
Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
Suggested-by: Willem de Bruijn <[email protected]>
Cc: Daniel Borkmann <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Alexander Duyck <[email protected]>
Signed-off-by: Shmulik Ladkani <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Reviewed-by: Alexander Duyck <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2019-09-07 | ipmr: remove hard code cache_resolve_queue_len limit | Hangbin Liu | 2 | -4/+4
This is a re-post of a previous patch written by David Miller [1]. Phil Karn reported [2] that on busy networks with lots of unresolved multicast routing entries, the creation of new multicast group routes can be extremely slow and unreliable. The reason is that we hard-coded the limit on multicast route entries with unresolved source addresses (cache_resolve_queue_len) to 10. If some multicast route never resolves and the number of unresolved source addresses increases, it becomes impossible to create new multicast route cache entries. To resolve this issue, we need to either add a sysctl entry to make cache_resolve_queue_len configurable, or just remove the cache_resolve_queue_len limit directly, as we already have the socket receive queue limits of the mrouted socket, as pointed out by David. From my side, I'd prefer to remove the cache_resolve_queue_len limit instead of creating two more (IPv4 and IPv6 version) sysctl entries.
[1] https://lkml.org/lkml/2018/7/22/11
[2] https://lkml.org/lkml/2018/7/21/343
v3: instead of removing cache_resolve_queue_len totally, let's only remove the hard-coded limit when allocating the unresolved cache, as Eric Dumazet suggested, so we don't need to re-count it in other places.
v2: hold the mfc_unres_lock while walking the unresolved list in queue_count(), as Nikolay Aleksandrov reminded.
Reported-by: Phil Karn <[email protected]>
Signed-off-by: Hangbin Liu <[email protected]>
Reviewed-by: Nikolay Aleksandrov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2019-09-07ipv6: addrconf_f6i_alloc - fix non-null pointer check to !IS_ERR()Maciej Żenczykowski1-1/+1
Fixes a stupid bug I recently introduced... ip6_route_info_create() returns an ERR_PTR(err) and not a NULL on error. Fixes: d55a2e374a94 ("net-ipv6: fix excessive RTF_ADDRCONF flag on ::1/128 local route (and others)'") Cc: David Ahern <[email protected]> Cc: Lorenzo Colitti <[email protected]> Cc: Eric Dumazet <[email protected]> Signed-off-by: Maciej Żenczykowski <[email protected]> Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
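The difference between a non-NULL check and an !IS_ERR() check can be illustrated with a small userspace sketch. The ERR_PTR(), PTR_ERR() and IS_ERR() definitions below are simplified stand-ins modelled on the kernel helpers, not the kernel's own code, and passes_null_check() and passes_is_err_check() are hypothetical names used only for this illustration.

```c
#include <errno.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error-pointer helpers (assumption:
 * simplified re-implementations for illustration only). */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers live in the last MAX_ERRNO bytes of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical "old" check: treats any non-NULL pointer as success.
 * An ERR_PTR() value is non-NULL, so it wrongly passes. */
static int passes_null_check(const void *ptr)
{
	return ptr != 0;
}

/* Hypothetical "new" check: rejects encoded error pointers. */
static int passes_is_err_check(const void *ptr)
{
	return !IS_ERR(ptr);
}
```

With these definitions, an ERR_PTR(-ENOMEM) value passes the non-NULL check but is rejected by the !IS_ERR() check, which is the situation the commit message describes for a function that reports failure with ERR_PTR(err) rather than NULL.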
2019-09-07tcp: ulp: fix possible crash in tcp_diag_get_aux_size()Eric Dumazet1-1/+1
tcp_diag_get_aux_size() can be called with sockets in any state. icsk_ulp_ops is only present for full sockets. For SYN_RECV or TIME_WAIT ones we would access garbage. Fixes: 61723b393292 ("tcp: ulp: add functions to dump ulp-specific information") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: Luke Hsiao <[email protected]> Reported-by: Neal Cardwell <[email protected]> Cc: Davide Caratti <[email protected]> Acked-by: Davide Caratti <[email protected]> Signed-off-by: David S. Miller <[email protected]>
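The kind of guard this calls for can be sketched in userspace as follows. This is a minimal model, not the kernel code: struct mini_sock, the MINI_ST_* states, is_full_sock() and aux_size() are all hypothetical names, and the only point shown is that the state is tested before a field that exists only on full sockets is read.

```c
#include <stddef.h>

/* Hypothetical socket states for this sketch (not the kernel's values). */
enum mini_state {
	MINI_ST_ESTABLISHED,
	MINI_ST_SYN_RECV,
	MINI_ST_TIME_WAIT,
};

/* Hypothetical ULP operations table. */
struct mini_ulp_ops {
	size_t (*get_info_size)(void);
};

/* Hypothetical socket: ulp_ops is only meaningful for full sockets;
 * for request and timewait sockets it would be garbage. */
struct mini_sock {
	enum mini_state state;
	const struct mini_ulp_ops *ulp_ops;
};

static int is_full_sock(const struct mini_sock *sk)
{
	return sk->state != MINI_ST_SYN_RECV && sk->state != MINI_ST_TIME_WAIT;
}

/* Only touch ulp_ops after confirming the socket is a full socket. */
static size_t aux_size(const struct mini_sock *sk)
{
	size_t size = 0;

	if (is_full_sock(sk) && sk->ulp_ops && sk->ulp_ops->get_info_size)
		size += sk->ulp_ops->get_info_size();
	return size;
}

/* Demo ULP that reports a fixed 16-byte payload. */
static size_t demo_info_size(void)
{
	return 16;
}

static const struct mini_ulp_ops demo_ulp = { demo_info_size };
```

Because the state test comes first in the condition, a socket in a non-full state returns 0 without ulp_ops ever being dereferenced.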
2019-09-07net: fib_notifier: move fib_notifier_ops from struct net into per-net structJiri Pirko1-6/+23
No need for fib_notifier_ops to be in struct net. It is used only by fib_notifier as private data. Use net_generic to introduce a per-net fib_notifier struct and move fib_notifier_ops there. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-09-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-nextDavid S. Miller9-21/+231
Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Add nft_reg_store64() and nft_reg_load64() helpers, from Ander Juaristi. 2) Time matching support, also from Ander Juaristi. 3) VLAN support for nfnetlink_log, from Michael Braun. 4) Support for set element deletions from the packet path, also from Ander. 5) Remove __read_mostly from conntrack spinlock, from Li RongQing. 6) Support for updating stateful objects, this also includes the initial client for this infrastructure: the quota extension. A follow up fix for the control plane also comes in this batch. Patches from Fernando Fernandez Mancera. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-09-06kcm: use BPF_PROG_RUNSami Tolvanen1-1/+1
Instead of invoking struct bpf_prog::bpf_func directly, use the BPF_PROG_RUN macro. Signed-off-by: Sami Tolvanen <[email protected]> Acked-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]>
2019-09-06Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller9-85/+550
Daniel Borkmann says: ==================== The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add the ability to use unaligned chunks in the AF_XDP umem. By relaxing where the chunks can be placed, it allows an arbitrary buffer size and placement wherever there is a free address in the umem. This helps with more seamless DPDK AF_XDP driver integration. Support for i40e, ixgbe and mlx5e, from Kevin and Maxim. 2) Addition of a wakeup flag for AF_XDP tx and fill rings so the application can wake up the kernel for rx/tx processing, which avoids busy-spinning of the latter; useful when app and driver are located on the same core. Support for i40e, ixgbe and mlx5e, from Magnus and Maxim. 3) bpftool fixes for printf()-like functions so the compiler can actually enforce checks, bpftool build system improvements for custom output directories, and addition of 'bpftool map freeze' command, from Quentin. 4) Support attaching/detaching XDP programs from 'bpftool net' command, from Daniel. 5) Automatic xskmap cleanup when AF_XDP socket is released, and several barrier/{read,write}_once fixes in AF_XDP code, from Björn. 6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf inclusion as well as libbpf versioning improvements, from Andrii. 7) Several new BPF kselftests for verifier precision tracking, from Alexei. 8) Several BPF kselftest fixes wrt endianness to run on s390x, from Ilya. 9) And more BPF kselftest improvements all over the place, from Stanislav. 10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub. 11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan. 12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni. 13) Small optimization in arm64 JIT to spare 1 insn for BPF_MOD, from Jerin. 14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar. 15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari, Peter, Wei, Yue.
==================== Signed-off-by: David S. Miller <[email protected]>
2019-09-06Bluetooth: hidp: Fix assumptions on the return value of hidp_send_messageDan Elkouby1-2/+2
hidp_send_message was changed to return non-zero values on success, which some other bits did not expect. This caused spurious errors to be propagated through the stack, breaking some drivers, such as hid-sony for the Dualshock 4 in Bluetooth mode. As pointed out by Dan Carpenter, hid-microsoft directly relied on that assumption as well. Fixes: 48d9cc9d85dd ("Bluetooth: hidp: Let hidp_send_message return number of queued bytes") Signed-off-by: Dan Elkouby <[email protected]> Reviewed-by: Dan Carpenter <[email protected]> Reviewed-by: Jiri Kosina <[email protected]> Signed-off-by: Marcel Holtmann <[email protected]>
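The failure mode can be sketched in userspace. This is an illustration under stated assumptions, not the actual Bluetooth code: send_message(), caller_buggy() and caller_fixed() are hypothetical functions that only model a sender returning the number of queued bytes on success and a negative errno on failure.

```c
#include <errno.h>

/* Hypothetical sender: returns the number of queued bytes on success,
 * or a negative errno when the link is down. */
static int send_message(int len, int link_up)
{
	if (!link_up)
		return -EIO;
	return len;
}

/* Caller written for the old contract (0 on success): any non-zero
 * return, including a positive byte count, is propagated as an error. */
static int caller_buggy(int len, int link_up)
{
	int ret = send_message(len, link_up);

	if (ret)
		return ret;
	return 0;
}

/* Caller written for the new contract: only negative values are errors. */
static int caller_fixed(int len, int link_up)
{
	int ret = send_message(len, link_up);

	if (ret < 0)
		return ret;
	return 0;
}
```

In this model, caller_buggy() hands a positive byte count up the stack as if it were an error code, while caller_fixed() reports success for the same call and still propagates real failures.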