aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-09-06tcp: fix early ETIMEDOUT after spurious non-SACK RTONeal Cardwell1-7/+18
Fix a bug reported and analyzed by Nagaraj Arankal, where the handling of a spurious non-SACK RTO could cause a connection to fail to clear retrans_stamp, causing a later RTO to very prematurely time out the connection with ETIMEDOUT. Here is the buggy scenario, expanding upon Nagaraj Arankal's excellent report: (*1) Send one data packet on a non-SACK connection (*2) Because no ACK packet is received, the packet is retransmitted and we enter CA_Loss; but this retransmission is spurious. (*3) The ACK for the original data is received. The transmitted packet is acknowledged. The TCP timestamp is before the retrans_stamp, so tcp_may_undo() returns true, and tcp_try_undo_loss() returns true without changing state to Open (because tcp_is_sack() is false), and tcp_process_loss() returns without calling tcp_try_undo_recovery(). Normally after undoing a CA_Loss episode, tcp_fastretrans_alert() would see that the connection has returned to CA_Open and fall through and call tcp_try_to_open(), which would set retrans_stamp to 0. However, for non-SACK connections we hold the connection in CA_Loss, so do not fall through to call tcp_try_to_open() and do not set retrans_stamp to 0. So retrans_stamp is (erroneously) still non-zero. At this point the first "retransmission event" has passed and been recovered from. Any future retransmission is a completely new "event". However, retrans_stamp is erroneously still set. (And we are still in CA_Loss, which is correct.) (*4) After 16 minutes (to correspond with tcp_retries2=15), a new data packet is sent. Note: No data is transmitted between (*3) and (*4) and we disabled keep alives. The socket's timeout SHOULD be calculated from this point in time, but instead it's calculated from the prior "event" 16 minutes ago (step (*2)). (*5) Because no ACK packet is received, the packet is retransmitted. (*6) At the time of the 2nd retransmission, the socket returns ETIMEDOUT, prematurely, because retrans_stamp is (erroneously) too far in the past (set at the time of (*2)). This commit fixes this bug by ensuring that we reuse in tcp_try_undo_loss() the same careful logic for non-SACK connections that we have in tcp_try_undo_recovery(). To avoid duplicating logic, we factor out that logic into a new tcp_is_non_sack_preventing_reopen() helper and call that helper from both undo functions. Fixes: da34ac7626b5 ("tcp: only undo on partial ACKs in CA_Loss") Reported-by: Nagaraj Arankal <[email protected]> Link: https://lore.kernel.org/all/SJ0PR84MB1847BE6C24D274C46A1B9B0EB27A9@SJ0PR84MB1847.NAMPRD84.PROD.OUTLOOK.COM/ Signed-off-by: Neal Cardwell <[email protected]> Signed-off-by: Yuchung Cheng <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-09-05Merge tag 'for-net-2022-09-02' of ↵David S. Miller1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth Luiz Augusto von Dentz says: ==================== bluetooth pull request for net: - Fix regression preventing ACL packet transmission ==================== Signed-off-by: David S. Miller <[email protected]>
2022-09-05stmmac: intel: Simplify intel_eth_pci_remove()Christophe JAILLET1-2/+0
There is no point to call pcim_iounmap_regions() in the remove function, this frees a managed resource that would be release by the framework anyway. Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-05net: mvpp2: debugfs: fix memory leak when using debugfs_lookup()Greg Kroah-Hartman1-2/+2
When calling debugfs_lookup() the result must have dput() called on it, otherwise the memory will leak over time. Fix this up to be much simpler logic and only create the root debugfs directory once when the driver is first accessed. That resolves the memory leak and makes things more obvious as to what the intent is. Cc: Marcin Wojtas <[email protected]> Cc: Russell King <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: [email protected] Cc: stable <[email protected]> Fixes: 21da57a23125 ("net: mvpp2: add a debugfs interface for the Header Parser") Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-05ipv6: sr: fix out-of-bounds read when setting HMAC data.David Lebrun1-0/+5
The SRv6 layer allows defining HMAC data that can later be used to sign IPv6 Segment Routing Headers. This configuration is realised via netlink through four attributes: SEG6_ATTR_HMACKEYID, SEG6_ATTR_SECRET, SEG6_ATTR_SECRETLEN and SEG6_ATTR_ALGID. Because the SECRETLEN attribute is decoupled from the actual length of the SECRET attribute, it is possible to provide invalid combinations (e.g., secret = "", secretlen = 64). This case is not checked in the code and with an appropriately crafted netlink message, an out-of-bounds read of up to 64 bytes (max secret length) can occur past the skb end pointer and into skb_shared_info: Breakpoint 1, seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 208 memcpy(hinfo->secret, secret, slen); (gdb) bt #0 seg6_genl_sethmac (skb=<optimized out>, info=<optimized out>) at net/ipv6/seg6.c:208 #1 0xffffffff81e012e9 in genl_family_rcv_msg_doit (skb=skb@entry=0xffff88800b1f9f00, nlh=nlh@entry=0xffff88800b1b7600, extack=extack@entry=0xffffc90000ba7af0, ops=ops@entry=0xffffc90000ba7a80, hdrlen=4, net=0xffffffff84237580 <init_net>, family=<optimized out>, family=<optimized out>) at net/netlink/genetlink.c:731 #2 0xffffffff81e01435 in genl_family_rcv_msg (extack=0xffffc90000ba7af0, nlh=0xffff88800b1b7600, skb=0xffff88800b1f9f00, family=0xffffffff82fef6c0 <seg6_genl_family>) at net/netlink/genetlink.c:775 #3 genl_rcv_msg (skb=0xffff88800b1f9f00, nlh=0xffff88800b1b7600, extack=0xffffc90000ba7af0) at net/netlink/genetlink.c:792 #4 0xffffffff81dfffc3 in netlink_rcv_skb (skb=skb@entry=0xffff88800b1f9f00, cb=cb@entry=0xffffffff81e01350 <genl_rcv_msg>) at net/netlink/af_netlink.c:2501 #5 0xffffffff81e00919 in genl_rcv (skb=0xffff88800b1f9f00) at net/netlink/genetlink.c:803 #6 0xffffffff81dff6ae in netlink_unicast_kernel (ssk=0xffff888010eec800, skb=0xffff88800b1f9f00, sk=0xffff888004aed000) at net/netlink/af_netlink.c:1319 #7 netlink_unicast (ssk=ssk@entry=0xffff888010eec800, skb=skb@entry=0xffff88800b1f9f00, portid=portid@entry=0, nonblock=<optimized out>) at net/netlink/af_netlink.c:1345 #8 0xffffffff81dff9a4 in netlink_sendmsg (sock=<optimized out>, msg=0xffffc90000ba7e48, len=<optimized out>) at net/netlink/af_netlink.c:1921 ... (gdb) p/x ((struct sk_buff *)0xffff88800b1f9f00)->head + ((struct sk_buff *)0xffff88800b1f9f00)->end $1 = 0xffff88800b1b76c0 (gdb) p/x secret $2 = 0xffff88800b1b76c0 (gdb) p slen $3 = 64 '@' The OOB data can then be read back from userspace by dumping HMAC state. This commit fixes this by ensuring SECRETLEN cannot exceed the actual length of SECRET. Reported-by: Lucas Leong <[email protected]> Tested: verified that EINVAL is correctly returned when secretlen > len(secret) Fixes: 4f4853dc1c9c1 ("ipv6: sr: implement API to control SR HMAC structure") Signed-off-by: David Lebrun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-05Merge branch 'bonding-fixes'David S. Miller2-7/+21
Hangbin Liu says: ==================== bonding: fix lladdr finding and confirmation This patch set fixed 3 issues when setting lladdr as bonding IPv6 target. Please see each patch for the details. v2: separate the patch to 3 parts ==================== Signed-off-by: David S. Miller <[email protected]>
2022-09-05bonding: accept unsolicited NA messageHangbin Liu1-5/+12
The unsolicited NA message with all-nodes multicast dest address should be valid, as this also means the link could reach the target. Also rename bond_validate_ns() to bond_validate_na(). Reported-by: LiLiang <[email protected]> Fixes: 5e1eeef69c0f ("bonding: NS target should accept link local address") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-05bonding: add all node mcast address when slave upHangbin Liu1-2/+6
When a link is enslave to bond, it need to set the interface down first. This makes the slave remove mac multicast address 33:33:00:00:00:01(The IPv6 multicast address ff02::1 is kept even when the interface down). When bond set the slave up, ipv6_mc_up() was not called due to commit c2edacf80e15 ("bonding / ipv6: no addrconf for slaves separately from master"). This is not an issue before we adding the lladdr target feature for bonding, as the mac multicast address will be added back when bond interface up and join group ff02::1. But after adding lladdr target feature for bonding. When user set a lladdr target, the unsolicited NA message with all-nodes multicast dest will be dropped as the slave interface never add 33:33:00:00:00:01 back. Fix this by calling ipv6_mc_up() to add 33:33:00:00:00:01 back when the slave interface up. Reported-by: LiLiang <[email protected]> Fixes: 5e1eeef69c0f ("bonding: NS target should accept link local address") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-05bonding: use unspecified address if no available link local addressHangbin Liu1-0/+3
When ns_ip6_target was set, the ipv6_dev_get_saddr() will be called to get available source address and send IPv6 neighbor solicit message. If the target is global address, ipv6_dev_get_saddr() will get any available src address. But if the target is link local address, ipv6_dev_get_saddr() will only get available address from our interface, i.e. the corresponding bond interface. But before bond interface up, all the address is tentative, while ipv6_dev_get_saddr() will ignore tentative address. This makes we can't find available link local src address, then bond_ns_send() will not be called and no NS message was sent. Finally bond interface will keep in down state. Fix this by sending NS with unspecified address if there is no available source address. Reported-by: LiLiang <[email protected]> Fixes: 5e1eeef69c0f ("bonding: NS target should accept link local address") Signed-off-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-04Merge tag 'wireless-2022-09-03' of ↵David S. Miller11-26/+73
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless Johannes berg says: ==================== We have a handful of fixes: - fix DMA from stack in wilc1000 driver - fix crash on chip reset failure in mt7921e - fix for the reported warning on aggregation timer expiry - check packet lengths in hwsim virtio paths - fix compiler warnings/errors with AAD construction by using struct_group - fix Intel 4965 driver rate scale operation - release channel contexts correctly in mac80211 mlme code ==================== Signed-off-by: David S. Miller <[email protected]>
2022-09-03wifi: use struct_group to copy addressesJohannes Berg3-6/+8
We sometimes copy all the addresses from the 802.11 header for the AAD, which may cause complaints from fortify checks. Use struct_group() to avoid the compiler warnings/errors. Change-Id: Ic3ea389105e7813b22095b295079eecdabde5045 Signed-off-by: Johannes Berg <[email protected]>
2022-09-03wifi: mac80211_hwsim: check length for virtio packetsSoenke Huster1-1/+6
An invalid packet with a length shorter than the specified length in the netlink header can lead to use-after-frees and slab-out-of-bounds in the processing of the netlink attributes, such as the following: BUG: KASAN: slab-out-of-bounds in __nla_validate_parse+0x1258/0x2010 Read of size 2 at addr ffff88800ac7952c by task kworker/0:1/12 Workqueue: events hwsim_virtio_rx_work Call Trace: <TASK> dump_stack_lvl+0x45/0x5d print_report.cold+0x5e/0x5e5 kasan_report+0xb1/0x1c0 __nla_validate_parse+0x1258/0x2010 __nla_parse+0x22/0x30 hwsim_virtio_handle_cmd.isra.0+0x13f/0x2d0 hwsim_virtio_rx_work+0x1b2/0x370 process_one_work+0x8df/0x1530 worker_thread+0x575/0x11a0 kthread+0x29d/0x340 ret_from_fork+0x22/0x30 </TASK> Discarding packets with an invalid length solves this. Therefore, skb->len must be set at reception. Change-Id: Ieaeb9a4c62d3beede274881a7c2722c6c6f477b6 Signed-off-by: Soenke Huster <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2022-09-03wifi: mac80211: fix locking in auth/assoc timeoutJohannes Berg1-6/+5
If we hit an authentication or association timeout, we only release the chanctx for the deflink, and the other link(s) are released later by ieee80211_vif_set_links(), but we're not locking this correctly. Fix the locking here while releasing the channels and links. Change-Id: I9e08c1a5434592bdc75253c1abfa6c788f9f39b1 Fixes: 81151ce462e5 ("wifi: mac80211: support MLO authentication/association with one link") Signed-off-by: Johannes Berg <[email protected]>
2022-09-03wifi: mac80211: mlme: release deflink channel in error caseJohannes Berg1-0/+1
In the prep_channel error case we didn't release the deflink channel leaving it to be left around. Fix that. Change-Id: If0dfd748125ec46a31fc6045a480dc28e03723d2 Signed-off-by: Johannes Berg <[email protected]>
2022-09-03wifi: mac80211: fix link warning in RX agg timer expiryMukesh Sisodiya1-0/+4
The rx data link pointer isn't set from the RX aggregation timer, resulting in a later warning. Fix that by setting it to the first valid link for now, with a FIXME to worry about statistics later, it's not very important since it's just the timeout case. Reported-by: Hans de Goede <[email protected]> Link: https://lore.kernel.org/r/[email protected] Fixes: 56057da4569b ("wifi: mac80211: rx: track link in RX data") Signed-off-by: Mukesh Sisodiya <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2022-09-03Merge branch '40GbE' of ↵David S. Miller4-5/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-09-02 (i40e, iavf) This series contains updates to i40e and iavf drivers. Przemyslaw adds reset to ADQ configuration to allow for setting of rate limit beyond TC0 for i40e. Ivan Vecera does not free client on failure to open which could cause NULL pointer dereference to occur on i40e. He also detaches device during reset to prevent NDO calls with could cause races for iavf. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-09-03Merge branch '100GbE' of ↵David S. Miller4-18/+80
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== This series contains updates to ice driver only. Przemyslaw fixes memory leak of DMA memory due to incorrect freeing of rx_buf. Michal S corrects incorrect call to free memory. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-09-03net: dsa: microchip: fix kernel oops on ksz8 switchesOleksij Rempel1-6/+24
After driver refactoring we was running ksz9477 specific CPU port configuration on ksz8 family which ended with kernel oops. So, make sure we run this code only on ksz9477 compatible devices. Tested on KSZ8873 and KSZ9477. Fixes: da8cd08520f3 ("net: dsa: microchip: add support for common phylink mac link up") Signed-off-by: Oleksij Rempel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-03xen-netback: only remove 'hotplug-status' when the vif is actually destroyedPaul Durrant1-1/+1
Removing 'hotplug-status' in backend_disconnected() means that it will be removed even in the case that the frontend unilaterally disconnects (which it is free to do at any time). The consequence of this is that, when the frontend attempts to re-connect, the backend gets stuck in 'InitWait' rather than moving straight to 'Connected' (which it can do because the hotplug script has already run). Instead, the 'hotplug-status' mode should be removed in netback_remove() i.e. when the vif really is going away. Fixes: 0f4558ae9187 ("Revert "xen-netback: remove 'hotplug-status' once it has served its purpose"") Signed-off-by: Paul Durrant <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-02net: fec: Use a spinlock to guard `fep->ptp_clk_on`Csókás Bence3-27/+19
Mutexes cannot be taken in a non-preemptible context, causing a panic in `fec_ptp_save_state()`. Replacing `ptp_clk_mutex` by `tmreg_lock` fixes this. Fixes: 6a4d7234ae9a ("net: fec: ptp: avoid register access when ipg clock is disabled") Fixes: f79959220fa5 ("fec: Restart PPS after link state change") Reported-by: Marc Kleine-Budde <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Csókás Bence <[email protected]> Tested-by: Francesco Dolcini <[email protected]> # Toradex Apalis iMX6 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-02net: fec: add pm_qos support on imx6q platformWei Fang2-1/+13
There is a very low probability that tx timeout will occur during suspend and resume stress test on imx6q platform. So we add pm_qos support to prevent system from entering low level idles which may affect the transmission of tx. Signed-off-by: Wei Fang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nfJakub Kicinski12-110/+46
Florian Westphal says: ==================== netfilter: bug fixes for net 1. Fix IP address check in irc DCC conntrack helper, this should check the opposite direction rather than the destination address of the packets' direction, from David Leadbeater. 2. bridge netfilter needs to drop dst references, from Harsh Modi. This was fine back in the day the code was originally written, but nowadays various tunnels can pre-set metadata dsts on packets. 3. Remove nf_conntrack_helper sysctl and the modparam toggle, users need to explicitily assign the helpers to use via nftables or iptables. Conntrack helpers, by design, may be used to add dynamic port redirections to internal machines, so its necessary to restrict which hosts/peers are allowed to use them. It was discovered that improper checking in the irc DCC helper makes it possible to trigger the 'please do dynamic port forward' from outside by embedding a 'DCC' in a PING request; if the client echos that back a expectation/port forward gets added. The auto-assign-for-everything mechanism has been in "please don't do this" territory since 2012. From Pablo. 4. Fix a memory leak in the netdev hook error unwind path, also from Pablo. * git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_conntrack_irc: Fix forged IP logic netfilter: nf_tables: clean up hook list when offload flags check fails netfilter: br_netfilter: Drop dst references before setting. netfilter: remove nf_conntrack_helper sysctl and modparam toggles ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-02Bluetooth: hci_sync: Fix hci_read_buffer_size_syncLuiz Augusto von Dentz1-6/+6
hci_read_buffer_size_sync shall not use HCI_OP_LE_READ_BUFFER_SIZE_V2 sinze that is LE specific, instead it is hci_le_read_buffer_size_sync version that shall use it. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216382 Fixes: 26afbd826ee3 ("Bluetooth: Add initial implementation of CIS connections") Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2022-09-02iavf: Detach device during reset taskIvan Vecera1-3/+11
iavf_reset_task() takes crit_lock at the beginning and holds it during whole call. The function subsequently calls iavf_init_interrupt_scheme() that grabs RTNL. Problem occurs when userspace initiates during the reset task any ndo callback that runs under RTNL like iavf_open() because some of that functions tries to take crit_lock. This leads to classic A-B B-A deadlock scenario. To resolve this situation the device should be detached in iavf_reset_task() prior taking crit_lock to avoid subsequent ndos running under RTNL and reattach the device at the end. Fixes: 62fe2a865e6d ("i40evf: add missing rtnl_lock() around i40evf_set_interrupt_capability") Cc: Jacob Keller <[email protected]> Cc: Patryk Piotrowski <[email protected]> Cc: SlawomirX Laba <[email protected]> Tested-by: Vitaly Grinberg <[email protected]> Signed-off-by: Ivan Vecera <[email protected]> Tested-by: Konrad Jankowski <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-09-02i40e: Fix kernel crash during module removalIvan Vecera1-1/+4
The driver incorrectly frees client instance and subsequent i40e module removal leads to kernel crash. Reproducer: 1. Do ethtool offline test followed immediately by another one host# ethtool -t eth0 offline; ethtool -t eth0 offline 2. Remove recursively irdma module that also removes i40e module host# modprobe -r irdma Result: [ 8675.035651] i40e 0000:3d:00.0 eno1: offline testing starting [ 8675.193774] i40e 0000:3d:00.0 eno1: testing finished [ 8675.201316] i40e 0000:3d:00.0 eno1: offline testing starting [ 8675.358921] i40e 0000:3d:00.0 eno1: testing finished [ 8675.496921] i40e 0000:3d:00.0: IRDMA hardware initialization FAILED init_state=2 status=-110 [ 8686.188955] i40e 0000:3d:00.1: i40e_ptp_stop: removed PHC on eno2 [ 8686.943890] i40e 0000:3d:00.1: Deleted LAN device PF1 bus=0x3d dev=0x00 func=0x01 [ 8686.952669] i40e 0000:3d:00.0: i40e_ptp_stop: removed PHC on eno1 [ 8687.761787] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 8687.768755] #PF: supervisor read access in kernel mode [ 8687.773895] #PF: error_code(0x0000) - not-present page [ 8687.779034] PGD 0 P4D 0 [ 8687.781575] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 8687.785935] CPU: 51 PID: 172891 Comm: rmmod Kdump: loaded Tainted: G W I 5.19.0+ #2 [ 8687.794800] Hardware name: Intel Corporation S2600WFD/S2600WFD, BIOS SE5C620.86B.0X.02.0001.051420190324 05/14/2019 [ 8687.805222] RIP: 0010:i40e_lan_del_device+0x13/0xb0 [i40e] [ 8687.810719] Code: d4 84 c0 0f 84 b8 25 01 00 e9 9c 25 01 00 41 bc f4 ff ff ff eb 91 90 0f 1f 44 00 00 41 54 55 53 48 8b 87 58 08 00 00 48 89 fb <48> 8b 68 30 48 89 ef e8 21 8a 0f d5 48 89 ef e8 a9 78 0f d5 48 8b [ 8687.829462] RSP: 0018:ffffa604072efce0 EFLAGS: 00010202 [ 8687.834689] RAX: 0000000000000000 RBX: ffff8f43833b2000 RCX: 0000000000000000 [ 8687.841821] RDX: 0000000000000000 RSI: ffff8f4b0545b298 RDI: ffff8f43833b2000 [ 8687.848955] RBP: ffff8f43833b2000 R08: 0000000000000001 R09: 0000000000000000 [ 8687.856086] R10: 0000000000000000 R11: 000ffffffffff000 R12: ffff8f43833b2ef0 [ 8687.863218] R13: ffff8f43833b2ef0 R14: ffff915103966000 R15: ffff8f43833b2008 [ 8687.870342] FS: 00007f79501c3740(0000) GS:ffff8f4adffc0000(0000) knlGS:0000000000000000 [ 8687.878427] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 8687.884174] CR2: 0000000000000030 CR3: 000000014276e004 CR4: 00000000007706e0 [ 8687.891306] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 8687.898441] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 8687.905572] PKRU: 55555554 [ 8687.908286] Call Trace: [ 8687.910737] <TASK> [ 8687.912843] i40e_remove+0x2c0/0x330 [i40e] [ 8687.917040] pci_device_remove+0x33/0xa0 [ 8687.920962] device_release_driver_internal+0x1aa/0x230 [ 8687.926188] driver_detach+0x44/0x90 [ 8687.929770] bus_remove_driver+0x55/0xe0 [ 8687.933693] pci_unregister_driver+0x2a/0xb0 [ 8687.937967] i40e_exit_module+0xc/0xf48 [i40e] Two offline tests cause IRDMA driver failure (ETIMEDOUT) and this failure is indicated back to i40e_client_subtask() that calls i40e_client_del_instance() to free client instance referenced by pf->cinst and sets this pointer to NULL. During the module removal i40e_remove() calls i40e_lan_del_device() that dereferences pf->cinst that is NULL -> crash. Do not remove client instance when client open callbacks fails and just clear __I40E_CLIENT_INSTANCE_OPENED bit. The driver also needs to take care about this situation (when netdev is up and client is NOT opened) in i40e_notify_client_of_netdev_close() and calls client close callback only when __I40E_CLIENT_INSTANCE_OPENED is set. Fixes: 0ef2d5afb12d ("i40e: KISS the client interface") Signed-off-by: Ivan Vecera <[email protected]> Tested-by: Helena Anna Dubel <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-09-02i40e: Fix ADQ rate limiting for PFPrzemyslaw Patynowski2-1/+5
Fix HW rate limiting for ADQ. Fallback to kernel queue selection for ADQ, as it is network stack that decides which queue to use for transmit with ADQ configured. Reset PF after creation of VMDq2 VSIs required for ADQ, as to reprogram TX queue contexts in i40e_configure_tx_ring. Without this patch PF would limit TX rate only according to TC0. Fixes: a9ce82f744dc ("i40e: Enable 'channel' mode in mqprio for TC configs") Signed-off-by: Przemyslaw Patynowski <[email protected]> Signed-off-by: Jan Sokolowski <[email protected]> Tested-by: Bharathi Sreenivas <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2022-09-02ice: use bitmap_free instead of devm_kfreeMichal Swiatkowski1-1/+1
pf->avail_txqs was allocated using bitmap_zalloc, bitmap_free should be used to free this memory. Fixes: 78b5713ac1241 ("ice: Alloc queue management bitmaps and arrays dynamically") Signed-off-by: Michal Swiatkowski <[email protected]> Tested-by: Gurucharan <[email protected]> (A Contingent worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-09-02ice: Fix DMA mappings leakPrzemyslaw Patynowski4-17/+79
Fix leak, when user changes ring parameters. During reallocation of RX buffers, new DMA mappings are created for those buffers. New buffers with different RX ring count should substitute older ones, but those buffers were freed in ice_vsi_cfg_rxq and reallocated again with ice_alloc_rx_buf. kfree on rx_buf caused leak of already mapped DMA. Reallocate ZC with xdp_buf struct, when BPF program loads. Reallocate back to rx_buf, when BPF program unloads. If BPF program is loaded/unloaded and XSK pools are created, reallocate RX queues accordingly in XDP_SETUP_XSK_POOL handler. Steps for reproduction: while : do for ((i=0; i<=8160; i=i+32)) do ethtool -G enp130s0f0 rx $i tx $i sleep 0.5 ethtool -g enp130s0f0 done done Fixes: 617f3e1b588c ("ice: xsk: allocate separate memory for XDP SW ring") Signed-off-by: Przemyslaw Patynowski <[email protected]> Signed-off-by: Mateusz Palczewski <[email protected]> Tested-by: Chandan <[email protected]> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <[email protected]>
2022-09-02Merge tag 'rxrpc-fixes-20220901' of ↵David S. Miller18-108/+280
git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs David Howells says: ==================== rxrpc fixes Here are some fixes for AF_RXRPC: (1) Fix the handling of ICMP/ICMP6 packets. This is a problem due to rxrpc being switched to acting as a UDP tunnel, thereby allowing it to steal the packets before they go through the UDP Rx queue. UDP tunnels can't get ICMP/ICMP6 packets, however. This patch adds an additional encap hook so that they can. (2) Fix the encryption routines in rxkad to handle packets that have more than three parts correctly. The problem is that ->nr_frags doesn't count the initial fragment, so the sglist ends up too short. (3) Fix a problem with destruction of the local endpoint potentially getting repeated. (4) Fix the calculation of the time at which to resend. jiffies_to_usecs() gives microseconds, not nanoseconds. (5) Fix AFS to work out when callback promises and locks expire based on the time an op was issued rather than the time the first reply packet arrives. We don't know how long the server took between calculating the expiry interval and transmitting the reply. (6) Given (5), rxrpc_get_reply_time() is no longer used, so remove it. ==================== Signed-off-by: David S. Miller <[email protected]>
2022-09-02tcp: TX zerocopy should not sense pfmemalloc statusEric Dumazet3-2/+23
We got a recent syzbot report [1] showing a possible misuse of pfmemalloc page status in TCP zerocopy paths. Indeed, for pages coming from user space or other layers, using page_is_pfmemalloc() is moot, and possibly could give false positives. There has been attempts to make page_is_pfmemalloc() more robust, but not using it in the first place in this context is probably better, removing cpu cycles. Note to stable teams : You need to backport 84ce071e38a6 ("net: introduce __skb_fill_page_desc_noacc") as a prereq. Race is more probable after commit c07aea3ef4d4 ("mm: add a signature in struct page") because page_is_pfmemalloc() is now using low order bit from page->lru.next, which can change more often than page->index. Low order bit should never be set for lru.next (when used as an anchor in LRU list), so KCSAN report is mostly a false positive. Backporting to older kernel versions seems not necessary. [1] BUG: KCSAN: data-race in lru_add_fn / tcp_build_frag write to 0xffffea0004a1d2c8 of 8 bytes by task 18600 on cpu 0: __list_add include/linux/list.h:73 [inline] list_add include/linux/list.h:88 [inline] lruvec_add_folio include/linux/mm_inline.h:105 [inline] lru_add_fn+0x440/0x520 mm/swap.c:228 folio_batch_move_lru+0x1e1/0x2a0 mm/swap.c:246 folio_batch_add_and_move mm/swap.c:263 [inline] folio_add_lru+0xf1/0x140 mm/swap.c:490 filemap_add_folio+0xf8/0x150 mm/filemap.c:948 __filemap_get_folio+0x510/0x6d0 mm/filemap.c:1981 pagecache_get_page+0x26/0x190 mm/folio-compat.c:104 grab_cache_page_write_begin+0x2a/0x30 mm/folio-compat.c:116 ext4_da_write_begin+0x2dd/0x5f0 fs/ext4/inode.c:2988 generic_perform_write+0x1d4/0x3f0 mm/filemap.c:3738 ext4_buffered_write_iter+0x235/0x3e0 fs/ext4/file.c:270 ext4_file_write_iter+0x2e3/0x1210 call_write_iter include/linux/fs.h:2187 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x468/0x760 fs/read_write.c:578 ksys_write+0xe8/0x1a0 fs/read_write.c:631 __do_sys_write fs/read_write.c:643 [inline] __se_sys_write fs/read_write.c:640 [inline] __x64_sys_write+0x3e/0x50 fs/read_write.c:640 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffffea0004a1d2c8 of 8 bytes by task 18611 on cpu 1: page_is_pfmemalloc include/linux/mm.h:1740 [inline] __skb_fill_page_desc include/linux/skbuff.h:2422 [inline] skb_fill_page_desc include/linux/skbuff.h:2443 [inline] tcp_build_frag+0x613/0xb20 net/ipv4/tcp.c:1018 do_tcp_sendpages+0x3e8/0xaf0 net/ipv4/tcp.c:1075 tcp_sendpage_locked net/ipv4/tcp.c:1140 [inline] tcp_sendpage+0x89/0xb0 net/ipv4/tcp.c:1150 inet_sendpage+0x7f/0xc0 net/ipv4/af_inet.c:833 kernel_sendpage+0x184/0x300 net/socket.c:3561 sock_sendpage+0x5a/0x70 net/socket.c:1054 pipe_to_sendpage+0x128/0x160 fs/splice.c:361 splice_from_pipe_feed fs/splice.c:415 [inline] __splice_from_pipe+0x222/0x4d0 fs/splice.c:559 splice_from_pipe fs/splice.c:594 [inline] generic_splice_sendpage+0x89/0xc0 fs/splice.c:743 do_splice_from fs/splice.c:764 [inline] direct_splice_actor+0x80/0xa0 fs/splice.c:931 splice_direct_to_actor+0x305/0x620 fs/splice.c:886 do_splice_direct+0xfb/0x180 fs/splice.c:974 do_sendfile+0x3bf/0x910 fs/read_write.c:1249 __do_sys_sendfile64 fs/read_write.c:1317 [inline] __se_sys_sendfile64 fs/read_write.c:1303 [inline] __x64_sys_sendfile64+0x10c/0x150 fs/read_write.c:1303 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x2b/0x70 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x0000000000000000 -> 0xffffea0004a1d288 Reported by Kernel Concurrency Sanitizer on: CPU: 1 PID: 18611 Comm: syz-executor.4 Not tainted 6.0.0-rc2-syzkaller-00248-ge022620b5d05-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 07/22/2022 Fixes: c07aea3ef4d4 ("mm: add a signature in struct page") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Shakeel Butt <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-02tipc: fix shift wrapping bug in map_get()Dan Carpenter1-1/+1
There is a shift wrapping bug in this code so anything thing above 31 will return false. Fixes: 35c55c9877f8 ("tipc: add neighbor monitoring framework") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-02sch_sfb: Don't assume the skb is still around after enqueueing to childToke Høiland-Jørgensen1-4/+6
The sch_sfb enqueue() routine assumes the skb is still alive after it has been enqueued into a child qdisc, using the data in the skb cb field in the increment_qlen() routine after enqueue. However, the skb may in fact have been freed, causing a use-after-free in this case. In particular, this happens if sch_cake is used as a child of sfb, and the GSO splitting mode of CAKE is enabled (in which case the skb will be split into segments and the original skb freed). Fix this by copying the sfb cb data to the stack before enqueueing the skb, and using this stack copy in increment_qlen() instead of the skb pointer itself. Reported-by: [email protected] # ZDI-CAN-18231 Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler") Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-09-01Revert "net: phy: meson-gxl: improve link-up behavior"Heiner Kallweit1-7/+1
This reverts commit 2c87c6f9fbddc5b84d67b2fa3f432fcac6d99d93. Meanwhile it turned out that the following commit is the proper workaround for the issue that 2c87c6f9fbdd tries to address. a3a57bf07de2 ("net: stmmac: work around sporadic tx issue on link-up") It's nor clear why the to be reverted commit helped for one user, for others it didn't make a difference. Fixes: 2c87c6f9fbdd ("net: phy: meson-gxl: improve link-up behavior") Signed-off-by: Heiner Kallweit <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-09-01Merge tag 'net-6.0-rc4' of ↵Linus Torvalds70-249/+478
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth, bpf and wireless. Current release - regressions: - bpf: - fix wrong last sg check in sk_msg_recvmsg() - fix kernel BUG in purge_effective_progs() - mac80211: - fix possible leak in ieee80211_tx_control_port() - potential NULL dereference in ieee80211_tx_control_port() Current release - new code bugs: - nfp: fix the access to management firmware hanging Previous releases - regressions: - ip: fix triggering of 'icmp redirect' - sched: tbf: don't call qdisc_put() while holding tree lock - bpf: fix corrupted packets for XDP_SHARED_UMEM - bluetooth: hci_sync: fix suspend performance regression - micrel: fix probe failure Previous releases - always broken: - tcp: make global challenge ack rate limitation per net-ns and default disabled - tg3: fix potential hang-up on system reboot - mac802154: fix reception for no-daddr packets Misc: - r8152: add PID for the lenovo onelink+ dock" * tag 'net-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits) net/smc: Remove redundant refcount increase Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb" tcp: make global challenge ack rate limitation per net-ns and default disabled tcp: annotate data-race around challenge_timestamp net: dsa: hellcreek: Print warning only once ip: fix triggering of 'icmp redirect' sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb selftests: net: sort .gitignore file Documentation: networking: correct possessive "its" kcm: fix strp_init() order and cleanup mlxbf_gige: compute MDIO period based on i1clk ethernet: rocker: fix sleep in atomic context bug in neigh_timer_handler net: lan966x: improve error handle in lan966x_fdma_rx_get_frame() nfp: fix the access to management firmware hanging net: phy: micrel: Make the GPIO to be non-exclusive net: virtio_net: fix notification coalescing comments net/sched: fix netdevice reference leaks in attach_default_qdiscs() net: sched: tbf: don't call qdisc_put() while holding tree lock net: Use u64_stats_fetch_begin_irq() for stats fetch. net: dsa: xrs700x: Use irqsave variant for u64 stats update ...
2022-09-01Merge tag 'slab-for-6.0-rc4' of ↵Linus Torvalds1-16/+29
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: - A fix from Waiman Long to avoid a theoretical deadlock reported by lockdep. * tag 'slab-for-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock
2022-09-01Merge tag 'sound-6.0-rc4' of ↵Linus Torvalds7-34/+146
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "Just handful changes at this time. The only major change is the regression fix about the x86 WC-page buffer allocation. The rest are trivial data-race fixes for ALSA sequencer core, the possible out-of-bounds access fixes in the new ALSA control hash code, and a few device-specific workarounds and fixes" * tag 'sound-6.0-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: usb-audio: Add quirk for LH Labs Geek Out HD Audio 1V5 ALSA: hda/realtek: Add speaker AMP init for Samsung laptops with ALC298 ALSA: control: Re-order bounds checking in get_ctl_id_hash() ALSA: control: Fix an out-of-bounds bug in get_ctl_id_hash() ALSA: hda: intel-nhlt: Correct the handling of fmt_config flexible array ALSA: seq: Fix data-race at module auto-loading ALSA: seq: oss: Fix data-race for max_midi_devs access ALSA: memalloc: Revive x86-specific WC page allocations again
2022-09-01rxrpc: Remove rxrpc_get_reply_time() which is no longer usedDavid Howells3-56/+0
Remove rxrpc_get_reply_time() as that is no longer used now that the call issue time is used instead of the reply time. Signed-off-by: David Howells <[email protected]>
2022-09-01afs: Use the operation issue time instead of the reply time for callbacksDavid Howells5-12/+5
rxrpc and kafs between them try to use the receive timestamp on the first data packet (ie. the one with sequence number 1) as a base from which to calculate the time at which callback promise and lock expiration occurs. However, we don't know how long it took for the server to send us the reply from it having completed the basic part of the operation - it might then, for instance, have to send a bunch of a callback breaks, depending on the particular operation. Fix this by using the time at which the operation is issued on the client as a base instead. That should never be longer than the server's idea of the expiry time. Fixes: 781070551c26 ("afs: Fix calculation of callback expiry time") Fixes: 2070a3e44962 ("rxrpc: Allow the reply time to be obtained on a client call") Suggested-by: Jeffrey E Altman <[email protected]> Signed-off-by: David Howells <[email protected]>
2022-09-01rxrpc: Fix calc of resend ageDavid Howells1-1/+1
Fix the calculation of the resend age to add a microsecond value as microseconds, not nanoseconds. Signed-off-by: David Howells <[email protected]>
2022-09-01rxrpc: Fix local destruction being repeatedDavid Howells1-0/+3
If the local processor work item for the rxrpc local endpoint gets requeued by an event (such as an incoming packet) between it getting scheduled for destruction and the UDP socket being closed, the rxrpc_local_destroyer() function can get run twice. The second time it can hang because it can end up waiting for cleanup events that will never happen. Signed-off-by: David Howells <[email protected]>
2022-09-01rxrpc: Fix an insufficiently large sglist in rxkad_verify_packet_2()David Howells1-1/+1
rxkad_verify_packet_2() has a small stack-allocated sglist of 4 elements, but if that isn't sufficient for the number of fragments in the socket buffer, we try to allocate an sglist large enough to hold all the fragments. However, for large packets with a lot of fragments, this isn't sufficient and we need at least one additional fragment. The problem manifests as skb_to_sgvec() returning -EMSGSIZE and this then getting returned by userspace. Most of the time, this isn't a problem as rxrpc sets a limit of 5692, big enough for 4 jumbo subpackets to be glued together; occasionally, however, the server will ignore the reported limit and give a packet that's a lot bigger - say 19852 bytes with ->nr_frags being 7. skb_to_sgvec() then tries to return a "zeroth" fragment that seems to occur before the fragments counted by ->nr_frags and we hit the end of the sglist too early. Note that __skb_to_sgvec() also has an skb_walk_frags() loop that is recursive up to 24 deep. I'm not sure if I need to take account of that too - or if there's an easy way of counting those frags too. Fix this by counting an extra frag and allocating a larger sglist based on that. Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()") Reported-by: Marc Dionne <[email protected]> Signed-off-by: David Howells <[email protected]> cc: [email protected]
2022-09-01rxrpc: Fix ICMP/ICMP6 error handlingDavid Howells8-38/+270
Because rxrpc pretends to be a tunnel on top of a UDP/UDP6 socket, allowing it to siphon off UDP packets early in the handling of received UDP packets thereby avoiding the packet going through the UDP receive queue, it doesn't get ICMP packets through the UDP ->sk_error_report() callback. In fact, it doesn't appear that there's any usable option for getting hold of ICMP packets. Fix this by adding a new UDP encap hook to distribute error messages for UDP tunnels. If the hook is set, then the tunnel driver will be able to see ICMP packets. The hook provides the offset into the packet of the UDP header of the original packet that caused the notification. An alternative would be to call the ->error_handler() hook - but that requires that the skbuff be cloned (as ip_icmp_error() or ipv6_cmp_error() do, though isn't really necessary or desirable in rxrpc's case is we want to parse them there and then, not queue them). Changes ======= ver #3) - Fixed an uninitialised variable. ver #2) - Fixed some missing CONFIG_AF_RXRPC_IPV6 conditionals. Fixes: 5271953cad31 ("rxrpc: Use the UDP encap_rcv hook") Signed-off-by: David Howells <[email protected]>
2022-09-01mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding ↵Waiman Long1-16/+29
slab_mutex/cpu_hotplug_lock A circular locking problem is reported by lockdep due to the following circular locking dependency. +--> cpu_hotplug_lock --> slab_mutex --> kn->active --+ | | +-----------------------------------------------------+ The forward cpu_hotplug_lock ==> slab_mutex ==> kn->active dependency happens in kmem_cache_destroy(): cpus_read_lock(); mutex_lock(&slab_mutex); ==> sysfs_slab_unlink() ==> kobject_del() ==> kernfs_remove() ==> __kernfs_remove() ==> kernfs_drain(): rwsem_acquire(&kn->dep_map, ...); The backward kn->active ==> cpu_hotplug_lock dependency happens in kernfs_fop_write_iter(): kernfs_get_active(); ==> slab_attr_store() ==> cpu_partial_store() ==> flush_all(): cpus_read_lock() One way to break this circular locking chain is to avoid holding cpu_hotplug_lock and slab_mutex while deleting the kobject in sysfs_slab_unlink() which should be equivalent to doing a write_lock and write_unlock pair of the kn->active virtual lock. Since the kobject structures are not protected by slab_mutex or the cpu_hotplug_lock, we can certainly release those locks before doing the delete operation. Move sysfs_slab_unlink() and sysfs_slab_release() to the newly created kmem_cache_release() and call it outside the slab_mutex & cpu_hotplug_lock critical sections. There will be a slight delay in the deletion of sysfs files if kmem_cache_release() is called indirectly from a work function. Fixes: 5a836bf6b09f ("mm: slub: move flush_cpu_slab() invocations __free_slab() invocations out of IRQ context") Signed-off-by: Waiman Long <[email protected]> Reviewed-by: Hyeonggon Yoo <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Acked-by: David Rientjes <[email protected]> Link: https://lore.kernel.org/all/YwOImVd+nRUsSAga@hyeyoo/ Signed-off-by: Vlastimil Babka <[email protected]>
2022-09-01net/smc: Remove redundant refcount increaseYacan Liu1-1/+0
For passive connections, the refcount increment has been done in smc_clcsock_accept()-->smc_sock_alloc(). Fixes: 3b2dec2603d5 ("net/smc: restructure client and server code in af_smc") Signed-off-by: Yacan Liu <[email protected]> Reviewed-by: Tony Lu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-08-31Revert "sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb"Jakub Kicinski1-3/+1
This reverts commit 90fabae8a2c225c4e4936723c38857887edde5cc. Patch was applied hastily, revert and let the v2 be reviewed. Fixes: 90fabae8a2c2 ("sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb") Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-31Merge branch 'tcp-tcp-challenge-ack-fixes'Jakub Kicinski4-13/+21
Eric Dumazet says: ==================== tcp: tcp challenge ack fixes syzbot found a typical data-race addressed in the first patch. While we are at it, second patch makes the global rate limit per net-ns and disabled by default. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-31tcp: make global challenge ack rate limitation per net-ns and default disabledEric Dumazet4-13/+21
Because per host rate limiting has been proven problematic (side channel attacks can be based on it), per host rate limiting of challenge acks ideally should be per netns and turned off by default. This is a long due followup of following commits: 083ae308280d ("tcp: enable per-socket rate limiting of all 'challenge acks'") f2b2c582e824 ("tcp: mitigate ACK loops for connections as tcp_sock") 75ff39ccc1bd ("tcp: make challenge acks less predictable") Signed-off-by: Eric Dumazet <[email protected]> Cc: Jason Baron <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-31tcp: annotate data-race around challenge_timestampEric Dumazet1-2/+2
challenge_timestamp can be read an written by concurrent threads. This was expected, but we need to annotate the race to avoid potential issues. Following patch moves challenge_timestamp and challenge_count to per-netns storage to provide better isolation. Fixes: 354e4aa391ed ("tcp: RFC 5961 5.2 Blind Data Injection Attack Mitigation") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Acked-by: Neal Cardwell <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-31net: dsa: hellcreek: Print warning only onceKurt Kanzenbach1-1/+1
In case the source port cannot be decoded, print the warning only once. This still brings attention to the user and does not spam the logs at the same time. Signed-off-by: Kurt Kanzenbach <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-08-31ip: fix triggering of 'icmp redirect'Nicolas Dichtel1-2/+2
__mkroute_input() uses fib_validate_source() to trigger an icmp redirect. My understanding is that fib_validate_source() is used to know if the src address and the gateway address are on the same link. For that, fib_validate_source() returns 1 (same link) or 0 (not the same network). __mkroute_input() is the only user of these positive values, all other callers only look if the returned value is negative. Since the below patch, fib_validate_source() didn't return anymore 1 when both addresses are on the same network, because the route lookup returns RT_SCOPE_LINK instead of RT_SCOPE_HOST. But this is, in fact, right. Let's adapat the test to return 1 again when both addresses are on the same link. CC: [email protected] Fixes: 747c14307214 ("ip: fix dflt addr selection for connected nexthop") Reported-by: kernel test robot <[email protected]> Reported-by: Heng Qi <[email protected]> Signed-off-by: Nicolas Dichtel <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>