Age  Commit message  Author  Files  Lines
2024-03-08ipv6: raw: check sk->sk_rcvbuf earlierEric Dumazet1-0/+7
There is no point cloning an skb and having to free the clone if the receive queue of the raw socket is full. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
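The idea of checking the receive buffer before cloning can be illustrated with a small userspace sketch. This is not the kernel patch: `struct rx_queue`, its fields and `deliver_copy()` are hypothetical names chosen for illustration, and `malloc()`/`memcpy()` stand in for the skb clone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a socket receive queue (not a kernel struct). */
struct rx_queue {
    size_t rmem_alloc;   /* bytes currently queued */
    size_t rcvbuf;       /* receive buffer limit */
    unsigned long drops; /* packets dropped because the queue was full */
};

/*
 * Returns a private copy of the packet, or NULL if it was dropped.
 * The fullness check comes first, so a full queue never pays for an
 * allocation and copy that would be freed immediately.
 */
static void *deliver_copy(struct rx_queue *q, const void *pkt, size_t len)
{
    assert(q != NULL && pkt != NULL);

    if (q->rmem_alloc >= q->rcvbuf) {
        q->drops++;
        return NULL;
    }

    void *copy = malloc(len);
    if (copy == NULL)
        return NULL;

    memcpy(copy, pkt, len);
    q->rmem_alloc += len;
    return copy;
}
```

The ordering is the whole point: the cheap comparison is done before the expensive copy rather than after it.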
2024-03-08nexthop: Simplify dump error handlingIdo Schimmel1-9/+0
The only error that can happen during a nexthop dump is insufficient space in the skb carrying the netlink messages (EMSGSIZE). If this happens and some messages were already filled in, the nexthop code returns the skb length to signal the netlink core that more objects need to be dumped. After commit b5a899154aa9 ("netlink: handle EMSGSIZE errors in the core") there is no need to handle this error in the nexthop code as it is now handled in the core. Simplify the code and simply return the error to the core. No regressions in nexthop tests: # ./fib_nexthops.sh Tests passed: 234 Tests failed: 0 Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-08net: add skb_data_unref() helperEric Dumazet2-3/+19
Similar to skb_unref(), add skb_data_unref() to save an expensive atomic operation (and cache line dirtying) when the last reference on shinfo->dataref is released. I saw this opportunity on hosts with RAW sockets accidentally bound to UDP protocol, forcing an skb_clone() on all received packets. These RAW sockets had their receive queue full, so all cloned packets were immediately dropped. When UDP recvmsg() later consumes the original skb, skb_release_data() is hitting atomic_sub_return() quite badly, because skb->cloned has been set permanently. Note that this patch helps TCP TX performance, because the TCP stack also uses (fast) clones. This means that at least one of the two packets (the main skb or its clone) will no longer have to perform this atomic operation in skb_release_data(). Signed-off-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
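The optimization described above (skip the atomic read-modify-write when the caller already holds the last reference) can be sketched in userspace with C11 atomics. This is only an illustration under simplifying assumptions: `struct shared_data` and `shared_data_unref()` are hypothetical names, and the real helper also has to deal with skb-specific state that is left out here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical shared payload with a reference count (not a kernel struct). */
struct shared_data {
    atomic_int dataref;
};

/*
 * Returns true when the caller held the last reference and must free
 * the data, false when other holders remain.
 */
static bool shared_data_unref(struct shared_data *d)
{
    assert(d != NULL);

    /*
     * Fast path: a plain load shows we are the only holder, so nobody
     * else can race with us and the counter need not be written at all.
     * This avoids the atomic RMW and keeps the cache line clean.
     */
    if (atomic_load_explicit(&d->dataref, memory_order_acquire) == 1)
        return true;

    /* Slow path: drop our reference; we free only if it was the last. */
    return atomic_fetch_sub_explicit(&d->dataref, 1,
                                     memory_order_acq_rel) == 1;
}
```

Note that on the fast path the counter is left at 1 rather than decremented to 0; that is harmless in this sketch because the object is about to be freed by the caller.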
2024-03-08Merge tag 'wireless-next-2024-03-08' of ↵Jakub Kicinski132-1342/+3176
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next

Kalle Valo says:
====================
wireless-next patches for v6.9

The fourth "new features" pull request for v6.9 with changes both in stack and in drivers. The theme in this pull request is to fix sparse warnings but we still have some left in wireless subsystem. Otherwise quite normal.

Major changes:

rtw89
* NL80211_EXT_FEATURE_SCAN_RANDOM_SN support
* NL80211_EXT_FEATURE_SET_SCAN_DWELL support

rtw88
* support for more rtw8811cu and rtw8821cu devices

mt76
* mt76x2u: add Netgear WNDA3100v3 USB
* mt7915: newer ADIE version support
* mt7925: radio temperature sensor support
* mt7996: remove GCMP IGTK offload

* tag 'wireless-next-2024-03-08' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (125 commits)
  wifi: rtw89: wow: move release offload packet earlier for WoWLAN mode
  wifi: rtw89: wow: set security engine options for 802.11ax chips only
  wifi: rtw89: update suspend/resume for different generation
  wifi: rtw89: wow: update config mac function with different generation
  wifi: rtw89: update DMA function with different generation
  wifi: rtw89: wow: update WoWLAN status register for different generation
  wifi: rtw89: wow: update WoWLAN reason register for different chips
  wifi: brcm80211: handle pmk_op allocation failure
  wifi: rtw89: coex: Add coexistence policy to decrease WiFi packet CRC-ERR
  wifi: rtw89: coex: When Bluetooth not available don't set power/gain
  wifi: rtw89: coex: add return value to ensure H2C command is success or not
  wifi: rtw89: coex: Reorder H2C command index to align with firmware
  wifi: rtw89: coex: add BTC ctrl_info version 7 and related logic
  wifi: rtw89: coex: add init_info H2C command format version 7
  wifi: rtw89: 8922a: add coexistence helpers of SW grant
  wifi: rtw89: mac: add coexistence helpers {cfg/get}_plt
  wifi: cw1200: restore endian swapping
  wifi: wlcore: sdio: Rate limit wl12xx_sdio_raw_{read,write}() failures warns
  wifi: rtlwifi: Remove rtl_intf_ops.read_efuse_byte
  wifi: rtw88: 8821c: Fix false alarm count
  ...
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
2024-03-08Bluetooth: hci_sync: Fix UAF in hci_acl_create_conn_syncLuiz Augusto von Dentz1-0/+3
This fixes the following error caused by hci_conn being freed while hci_acl_create_conn_sync is pending:

==================================================================
BUG: KASAN: slab-use-after-free in hci_acl_create_conn_sync+0xa7/0x2e0
Write of size 2 at addr ffff888002ae0036 by task kworker/u3:0/848

CPU: 0 PID: 848 Comm: kworker/u3:0 Not tainted 6.8.0-rc6-g2ab3e8d67fc1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-1.fc38 04/01/2014
Workqueue: hci0 hci_cmd_sync_work
Call Trace:
 <TASK>
 dump_stack_lvl+0x21/0x70
 print_report+0xce/0x620
 ? preempt_count_sub+0x13/0xc0
 ? __virt_addr_valid+0x15f/0x310
 ? hci_acl_create_conn_sync+0xa7/0x2e0
 kasan_report+0xdf/0x110
 ? hci_acl_create_conn_sync+0xa7/0x2e0
 hci_acl_create_conn_sync+0xa7/0x2e0
 ? __pfx_hci_acl_create_conn_sync+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_hci_acl_create_conn_sync+0x10/0x10
 hci_cmd_sync_work+0x138/0x1c0
 process_one_work+0x405/0x800
 ? __pfx_lock_acquire+0x10/0x10
 ? __pfx_process_one_work+0x10/0x10
 worker_thread+0x37b/0x670
 ? __pfx_worker_thread+0x10/0x10
 kthread+0x19b/0x1e0
 ? kthread+0xfe/0x1e0
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x2f/0x50
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1a/0x30
 </TASK>

Allocated by task 847:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 __kasan_kmalloc+0x8f/0xa0
 hci_conn_add+0xc6/0x970
 hci_connect_acl+0x309/0x410
 pair_device+0x4fb/0x710
 hci_sock_sendmsg+0x933/0xef0
 sock_write_iter+0x2c3/0x2d0
 do_iter_readv_writev+0x21a/0x2e0
 vfs_writev+0x21c/0x7b0
 do_writev+0x14a/0x180
 do_syscall_64+0x77/0x150
 entry_SYSCALL_64_after_hwframe+0x6c/0x74

Freed by task 847:
 kasan_save_stack+0x33/0x60
 kasan_save_track+0x14/0x30
 kasan_save_free_info+0x3b/0x60
 __kasan_slab_free+0xfa/0x150
 kfree+0xcb/0x250
 device_release+0x58/0xf0
 kobject_put+0xbb/0x160
 hci_conn_del+0x281/0x570
 hci_conn_hash_flush+0xfc/0x130
 hci_dev_close_sync+0x336/0x960
 hci_dev_close+0x10e/0x140
 hci_sock_ioctl+0x14a/0x5c0
 sock_ioctl+0x58a/0x5d0
 __x64_sys_ioctl+0x480/0xf60
 do_syscall_64+0x77/0x150
 entry_SYSCALL_64_after_hwframe+0x6c/0x74

Fixes: 45340097ce6e ("Bluetooth: hci_conn: Only do ACL connections sequentially")
Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2024-03-08Bluetooth: Fix eir name lengthFrédéric Danis2-23/+8
According to Section 1.2 of Core Specification Supplement Part A the complete or short name strings are defined as utf8s, which should not include the trailing NULL for variable length array as defined in Core Specification Vol1 Part E Section 2.9.3. Removing the trailing NULL allows PTS to retrieve the random address based on device name, e.g. for SM/PER/KDU/BV-02-C, SM/PER/KDU/BV-08-C or GAP/BROB/BCST/BV-03-C. Fixes: f61851f64b17 ("Bluetooth: Fix append max 11 bytes of name to scan rsp data") Signed-off-by: Frédéric Danis <[email protected]> Signed-off-by: Luiz Augusto von Dentz <[email protected]>
2024-03-08Merge branch 'hns3-fixes'David S. Miller13-22/+74
Jijie Shao says: ==================== There are some bugfixes for the HNS3 ethernet driver. ==================== Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: hns3: add checking for vf id of mailboxJian Shen1-3/+4
Add checking for vf id of mailbox, in order to avoid array out-of-bounds risk. Signed-off-by: Jian Shen <[email protected]> Signed-off-by: Jijie Shao <[email protected]> Reviewed-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: hns3: fix port duplex configure error in IMP resetJie Wang1-1/+4
Currently, the mac port is always configured in full duplex mode in hclge_mac_init() on driver initialization or reset restore. Users may change the mode to half duplex with ethtool, so the user configuration may be dropped after a reset. To fix it, don't change the duplex mode when resetting. Fixes: 2d03eacc0b7e ("net: hns3: Only update mac configuation when necessary") Signed-off-by: Jie Wang <[email protected]> Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: hns3: fix reset timeout under full functions and queuesPeiyang Wang2-2/+2
The cmdq reset command times out when all VFs are enabled and the queue is full. The hardware processing time exceeds the timeout set by the driver. In order to avoid the above extreme situations, the driver extends the reset timeout to 1 second. Signed-off-by: Peiyang Wang <[email protected]> Signed-off-by: Jijie Shao <[email protected]> Reviewed-by: Sunil Goutham <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: hns3: fix delete tc fail issueJijie Shao5-0/+22
When the tc is removed during reset, the hns3 driver will return an errcode. But the kernel ignores this errcode. As a result, the driver status is inconsistent with the kernel status. This patch retains the deletion status when the deletion fails and continues to delete after the reset to ensure that the status of the driver is consistent with that of the kernel. Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: hns3: fix kernel crash when 1588 is received on HIP08 devicesYonglong Liu1-1/+1
The HIP08 devices do not register the ptp devices, so hdev->ptp is NULL, but the hardware can receive 1588 messages and set the HNS3_RXD_TS_VLD_B bit. In this case, the access of hdev->ptp->flags will cause a kernel crash:

[ 5888.946472] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
[ 5888.946475] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000018
...
[ 5889.266118] pc : hclge_ptp_get_rx_hwts+0x40/0x170 [hclge]
[ 5889.272612] lr : hclge_ptp_get_rx_hwts+0x34/0x170 [hclge]
[ 5889.279101] sp : ffff800012c3bc50
[ 5889.283516] x29: ffff800012c3bc50 x28: ffff2040002be040
[ 5889.289927] x27: ffff800009116484 x26: 0000000080007500
[ 5889.296333] x25: 0000000000000000 x24: ffff204001c6f000
[ 5889.302738] x23: ffff204144f53c00 x22: 0000000000000000
[ 5889.309134] x21: 0000000000000000 x20: ffff204004220080
[ 5889.315520] x19: ffff204144f53c00 x18: 0000000000000000
[ 5889.321897] x17: 0000000000000000 x16: 0000000000000000
[ 5889.328263] x15: 0000004000140ec8 x14: 0000000000000000
[ 5889.334617] x13: 0000000000000000 x12: 00000000010011df
[ 5889.340965] x11: bbfeff4d22000000 x10: 0000000000000000
[ 5889.347303] x9 : ffff800009402124 x8 : 0200f78811dfbb4d
[ 5889.353637] x7 : 2200000000191b01 x6 : ffff208002a7d480
[ 5889.359959] x5 : 0000000000000000 x4 : 0000000000000000
[ 5889.366271] x3 : 0000000000000000 x2 : 0000000000000000
[ 5889.372567] x1 : 0000000000000000 x0 : ffff20400095c080
[ 5889.378857] Call trace:
[ 5889.382285] hclge_ptp_get_rx_hwts+0x40/0x170 [hclge]
[ 5889.388304] hns3_handle_bdinfo+0x324/0x410 [hns3]
[ 5889.394055] hns3_handle_rx_bd+0x60/0x150 [hns3]
[ 5889.399624] hns3_clean_rx_ring+0x84/0x170 [hns3]
[ 5889.405270] hns3_nic_common_poll+0xa8/0x220 [hns3]
[ 5889.411084] napi_poll+0xcc/0x264
[ 5889.415329] net_rx_action+0xd4/0x21c
[ 5889.419911] __do_softirq+0x130/0x358
[ 5889.424484] irq_exit+0x134/0x154
[ 5889.428700] __handle_domain_irq+0x88/0xf0
[ 5889.433684] gic_handle_irq+0x78/0x2c0
[ 5889.438319] el1_irq+0xb8/0x140
[ 5889.442354] arch_cpu_idle+0x18/0x40
[ 5889.446816] default_idle_call+0x5c/0x1c0
[ 5889.451714] cpuidle_idle_call+0x174/0x1b0
[ 5889.456692] do_idle+0xc8/0x160
[ 5889.460717] cpu_startup_entry+0x30/0xfc
[ 5889.465523] secondary_start_kernel+0x158/0x1ec
[ 5889.470936] Code: 97ffab78 f9411c14 91408294 f9457284 (f9400c80)
[ 5889.477950] SMP: stopping secondary CPUs
[ 5890.514626] SMP: failed to stop secondary CPUs 0-69,71-95
[ 5890.522951] Starting crashdump kernel...

Fixes: 0bf5eb788512 ("net: hns3: add support for PTP")
Signed-off-by: Yonglong Liu <[email protected]>
Signed-off-by: Jijie Shao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: hns3: Disable SerDes serial loopback for HiLink H60Hao Lan5-3/+18
When the hilink version is H60, the serdes serial loopback test is not supported. This patch adds hilink version detection. When the version is H60, the serdes serial loopback test will be disabled. Signed-off-by: Hao Lan <[email protected]> Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: hns3: add new 200G link modes for hisilicon deviceHao Lan2-11/+22
The hisilicon device now supports a new 200G link interface, which is queried from firmware via a new bit. Therefore, the HCLGE_SUPPORT_200G_R4_BIT capability bit has been added. The HCLGE_SUPPORT_200G_BIT has been renamed as HCLGE_SUPPORT_200G_R4_EXT_BIT, and the firmware has extended support for this mode. Fixes: ae6f010cb1a7 ("net: hns3: add support for 200G device") Signed-off-by: Hao Lan <[email protected]> Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: hns3: fix wrong judgment condition issueJijie Shao1-1/+1
hns3_dcbnl_ieee_delapp should check ieee_delapp, not ieee_setapp. This patch fixes the wrong judgment. Fixes: 0ba22bcb222d ("net: hns3: add support config dscp map to tc") Signed-off-by: Jijie Shao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08Merge branch 'ionic-diet'David S. Miller8-504/+371
Shannon Nelson says:
====================
ionic: putting ionic on a diet

Building on the performance work done in the previous patchset
[Link] https://lore.kernel.org/netdev/[email protected]/
this patchset puts the ionic driver on a diet, decreasing the memory requirements per queue, and simplifies a few more bits of logic.

We trimmed the queue management structs and gained some ground, but the most savings came from trimming the individual buffer descriptors. The original design used a single generic buffer descriptor for Tx, Rx and Adminq needs, but the Rx and Adminq descriptors really don't need all the info that the Tx descriptors track. By splitting up the descriptor types we can significantly reduce the descriptor sizes for Rx and Adminq use.

There is a small reduction in the queue management structs, saving about 3 cachelines per queuepair:

ionic_qcq:
  Before: /* size: 2176, cachelines: 34, members: 23 */
  After:  /* size: 2048, cachelines: 32, members: 23 */

We also remove an array of completion descriptor pointers, or about 8 Kbytes per queue.

But the biggest savings came from splitting the desc_info struct into queue specific structs and trimming out what was unnecessary.

Before:
  ionic_desc_info:       /* size: 496, cachelines: 8, members: 10 */
After:
  ionic_tx_desc_info:    /* size: 496, cachelines: 8, members: 6 */
  ionic_rx_desc_info:    /* size: 224, cachelines: 4, members: 2 */
  ionic_admin_desc_info: /* size: 8, cachelines: 1, members: 1 */

In a 64 core host the ionic driver will default to 64 queuepairs of 1024 descriptors for Rx, 1024 for Tx, and 80 for Adminq and Notifyq.

The total memory usage for 64 queues:

Before:
    65 * sizeof(ionic_qcq)                             141,440
  + 64 * 1024 * sizeof(ionic_desc_info)             32,505,856
  + 64 * 1024 * sizeof(ionic_desc_info)             32,505,856
  + 64 * 1024 * 2 * sizeof(ionic_qc_info)               16,384
  +  1 * 80 * sizeof(ionic_desc_info)                   39,690
                                                    ----------
                                                    65,201,038

After:
    65 * sizeof(ionic_qcq)                             133,120
  + 64 * 1024 * sizeof(ionic_tx_desc_info)          32,505,856
  + 64 * 1024 * sizeof(ionic_rx_desc_info)          14,680,064
  + (removed)                                                0
  +  1 * 80 * sizeof(ionic_admin_desc_info)                640
                                                    ----------
                                                    47,319,680

This saves us approximately 18 Mbytes per port in a 64 core machine, a 28% savings in our memory needs.

In addition, this improves our simple single thread / single queue iperf case on a 9100 MTU connection from 86.7 to 95 Gbits/sec.
====================
Signed-off-by: David S. Miller <[email protected]>
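The "After" total in the cover letter can be recomputed from the quoted struct sizes. The sketch below is only a worked check of that arithmetic: `ionic_total_bytes()` is a hypothetical helper, not driver code, and it hard-codes the defaults stated above (64 queuepairs plus one extra qcq, 1024 Tx and Rx descriptors per queue, 80 Adminq descriptors).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Total bookkeeping memory, in bytes, for the default 64-core layout
 * described in the cover letter. All sizes are passed in bytes.
 */
static uint64_t ionic_total_bytes(uint64_t qcq_size, uint64_t tx_info_size,
                                  uint64_t rx_info_size,
                                  uint64_t admin_info_size)
{
    const uint64_t queuepairs = 64;   /* one per core */
    const uint64_t ring = 1024;       /* descriptors per Tx or Rx queue */
    const uint64_t admin_descs = 80;  /* Adminq descriptors */

    assert(qcq_size > 0);

    return (queuepairs + 1) * qcq_size          /* 65 qcq structs */
         + queuepairs * ring * tx_info_size     /* Tx desc_info arrays */
         + queuepairs * ring * rx_info_size     /* Rx desc_info arrays */
         + admin_descs * admin_info_size;       /* Adminq desc_info array */
}
```

With the "After" sizes from the cover letter, `ionic_total_bytes(2048, 496, 224, 8)` evaluates to 47,319,680, matching the total quoted there.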
2024-03-08ionic: keep stats struct local to error handlingShannon Nelson1-22/+8
When possible, keep the stats struct references strictly in the error handling blocks and out of the fastpath. Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: better dma-map error handlingShannon Nelson1-18/+14
Fix up a couple of small dma_addr handling issues
- don't double-count dma-map-err stat in ionic_tx_map_skb() or ionic_xdp_post_frame()
- return 0 on error from both ionic_tx_map_single() and ionic_tx_map_frag() and check for !dma_addr in ionic_tx_map_skb() and ionic_xdp_post_frame()
- be sure to unmap buf_info[0] in ionic_tx_map_skb() error path
- don't assign rx buf->dma_addr until error checked in ionic_rx_page_alloc()
- remove unnecessary dma_addr_t casts
Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: remove unnecessary NULL testShannon Nelson1-6/+0
We call ionic_rx_page_alloc() only on existing buf_info structs from ionic_rx_fill(). There's no need for the additional NULL test. Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: rearrange ionic_queue for better layoutShannon Nelson1-2/+2
A simple change to the struct ionic_queue layout removes some unnecessary padding and saves us a cacheline in the struct ionic_qcq layout. struct ionic_queue { Before: /* size: 256, cachelines: 4, members: 29 */ After: /* size: 192, cachelines: 3, members: 29 */ struct ionic_qcq { Before: /* size: 2112, cachelines: 33, members: 23 */ After: /* size: 2048, cachelines: 32, members: 23 */ Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: rearrange ionic_qcqShannon Nelson1-4/+4
Rearrange a few fields for better cache use and to put the flags field up into the first cacheline rather than the last. struct ionic_qcq Before: /* size: 2176, cachelines: 34, members: 23 */ After: /* size: 2112, cachelines: 33, members: 23 */ Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: carry idev in ionic_cq structShannon Nelson3-19/+6
Remove the idev field from ionic_queue, which saves us a bit of space, and add it into ionic_cq where there's room within some cacheline padding. Use this pointer rather than doing a multi level reference from lif->ionic. Suggested-by: Neel Patel <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: refactor skb buildingShannon Nelson1-53/+65
The existing ionic_rx_frags() code is a bit of a mess and can be cleaned up by unrolling the first frag/header setup from the loop, then reworking the do-while-loop into a for-loop. We rename the function to a more descriptive ionic_rx_build_skb(). We also change a couple of related variable names for readability. Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: fold adminq clean into service routineShannon Nelson1-21/+11
Since the AdminQ clean is a simple action called from only one place, fold it back into the service routine. Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: use specialized desc info structsShannon Nelson5-86/+111
Make desc_info structure specific to the queue type, which allows us to cut down the Rx and AdminQ descriptor sizes by not including all the fields needed for the Tx descriptors. Before: struct ionic_desc_info { /* size: 464, cachelines: 8, members: 6 */ After: struct ionic_tx_desc_info { /* size: 464, cachelines: 8, members: 6 */ struct ionic_rx_desc_info { /* size: 224, cachelines: 4, members: 2 */ struct ionic_admin_desc_info { /* size: 8, cachelines: 1, members: 1 */ Suggested-by: Neel Patel <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: remove the cq_info to save more memoryShannon Nelson7-89/+38
With a little simple math we don't need another struct array to find the completion structs, so we can remove the ionic_cq_info altogether. This doesn't really save anything in the ionic_cq since it gets padded out to the cacheline, but it does remove the parallel array allocation of 8 * num_descriptors, or about 8 Kbytes per queue in a default configuration. Suggested-by: Neel Patel <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: remove callback pointer from desc_infoShannon Nelson5-91/+57
By reworking the queue service routines to have their own servicing loops we can remove the cb pointer from desc_info to save another 8 bytes per descriptor. This simplifies some of the queue handling indirection, makes the code a little easier to follow, and keeps service code in one place rather than jumping between code files. struct ionic_desc_info Before: /* size: 472, cachelines: 8, members: 7 */ After: /* size: 464, cachelines: 8, members: 6 */ Suggested-by: Neel Patel <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: move adminq-notifyq handling to main fileShannon Nelson3-65/+67
Move the AdminQ and NotifyQ queue handling to ionic_main.c with the rest of the adminq code. Suggested-by: Neel Patel <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: drop q mappingShannon Nelson3-35/+10
Now that we're not using desc_info pointers mapped in every q we can simplify and drop the unnecessary utility functions. Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ionic: remove desc, sg_desc and cmb_desc from desc_infoShannon Nelson4-60/+45
Remove the struct pointers from desc_info to use less space. Instead of pointers in every desc_info to its descriptor, we can use the queue descriptor index to find the individual desc, desc_info, and sgl structs in their parallel arrays. struct ionic_desc_info Before: /* size: 496, cachelines: 8, members: 10 */ After: /* size: 472, cachelines: 8, members: 7 */ Suggested-by: Neel Patel <[email protected]> Reviewed-by: Brett Creeley <[email protected]> Signed-off-by: Shannon Nelson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
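The index-based lookup described above, replacing stored per-entry pointers with arithmetic on parallel arrays, might look roughly like the following. This is a sketch with hypothetical names (`struct demo_queue`, `demo_q_desc()`, `demo_q_sg()`), not the driver's actual structs or helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queue holding two parallel arrays indexed by descriptor. */
struct demo_queue {
    void *desc_base;        /* array of num_descs descriptors */
    size_t desc_size;       /* bytes per descriptor */
    void *sg_base;          /* parallel array of scatter-gather entries */
    size_t sg_size;         /* bytes per scatter-gather entry */
    unsigned int num_descs; /* number of entries in each array */
};

/* Locate descriptor idx by arithmetic instead of a stored pointer. */
static inline void *demo_q_desc(const struct demo_queue *q, unsigned int idx)
{
    assert(idx < q->num_descs);
    return (uint8_t *)q->desc_base + (size_t)idx * q->desc_size;
}

/* The same index selects the matching entry in the parallel sg array. */
static inline void *demo_q_sg(const struct demo_queue *q, unsigned int idx)
{
    assert(idx < q->num_descs);
    return (uint8_t *)q->sg_base + (size_t)idx * q->sg_size;
}
```

The trade-off is a multiply and add per lookup in exchange for not storing several pointers in every desc_info entry; with three 8-byte pointers removed, that accounts for the 24 bytes saved per entry in the sizes quoted above.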
2024-03-08Merge branch '40GbE' of ↵David S. Miller6-42/+53
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2024-03-06 (iavf, i40e, ixgbe) This series contains updates to iavf, i40e, and ixgbe drivers. Alexey Kodanev removes duplicate calls related to cloud filters on iavf and unnecessary null checks on i40e. Maciej adds helper functions for common code relating to updating statistics for ixgbe. ==================== Signed-off-by: David S. Miller <[email protected]>
2024-03-08Add Jeff Kirsher to .get_maintainer.ignoreJakub Kicinski1-0/+1
Jeff was retired as the Intel driver maintainer in commit 6667df916fce ("MAINTAINERS: Update MAINTAINERS for Intel ethernet drivers"), and his address bounces. But he has signed-off a lot of patches over the years so get_maintainer insists on CCing him. We haven't heard from him since he left Intel, so remapping the address via mailmap is also pointless. Add to ignored addresses. Signed-off-by: Jakub Kicinski <[email protected]> Acked-by: Tony Nguyen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08Merge branch 'ipv6-lockless-dump-addrs'David S. Miller1-90/+78
Eric Dumazet says: ==================== ipv6: lockless inet6_dump_addr() This series removes RTNL locking to dump ipv6 addresses. ==================== Signed-off-by: David S. Miller <[email protected]>
2024-03-08ipv6: remove RTNL protection from inet6_dump_addr()Eric Dumazet1-3/+6
We can now remove RTNL acquisition while running inet6_dump_addr(), inet6_dump_ifmcaddr() and inet6_dump_ifacaddr(). Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ipv6: use xa_array iterator to implement inet6_dump_addr()Eric Dumazet1-49/+30
inet6_dump_addr() can use the new xa_array iterator for better scalability. Make it ready for RCU-only protection. RTNL use is removed in the following patch. Also properly return 0 at the end of a dump to avoid an extra recvmsg() to get NLMSG_DONE. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ipv6: make in6_dump_addrs() locklessEric Dumazet1-9/+5
in6_dump_addrs() is called with RCU protection. There is no need to hold idev->lock to iterate through unicast addresses. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08ipv6: make inet6_fill_ifaddr() locklessEric Dumazet1-29/+37
Make inet6_fill_ifaddr() lockless, and add appropriate annotations on ifa->tstamp, ifa->valid_lft, ifa->preferred_lft, ifa->ifa_proto and ifa->rt_priority. Also constify 2nd argument of inet6_fill_ifaddr(), inet6_fill_ifmcaddr() and inet6_fill_ifacaddr(). Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08Merge tag 'ipsec-next-2024-03-06' of ↵David S. Miller3-18/+144
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== 1) Introduce forwarding of ICMP Error messages. That is specified in RFC 4301 but was never implemented. From Antony Antony. 2) Use KMEM_CACHE instead of kmem_cache_create in xfrm6_tunnel_init() and xfrm_policy_init(). From Kunwu Chan. 3) Do not allocate stats in the xfrm interface driver, this can be done on net core now. From Breno Leitao. ==================== Signed-off-by: David S. Miller <[email protected]>
2024-03-08Merge branch 'nexthop-group-stats'David S. Miller3-40/+363
Petr Machata says:
====================
Support for nexthop group statistics

ECMP is a fundamental component in L3 designs. However, it's fragile. Many factors influence whether an ECMP group will operate as intended: hash policy (i.e. the set of fields that contribute to ECMP hash calculation), neighbor validity, hash seed (which might lead to polarization) or the type of ECMP group used (hash-threshold or resilient).

At the same time, collection of statistics that would help an operator determine that the group performs as desired, is difficult.

A solution that we present in this patchset is to add counters to next hop group entries. For SW-datapath deployments, this will on its own allow collection and evaluation of relevant statistics. For HW-datapath deployments, we further add a way to request that HW counters be installed for a given group, in-kernel interfaces to collect the HW statistics, and netlink interfaces to query them.

For example:

    # ip nexthop replace id 4000 group 4001/4002 hw_stats on

    # ip -s -d nexthop show id 4000
    id 4000 group 4001/4002 scope global proto unspec offload hw_stats on used on
      stats:
        id 4001 packets 5002 packets_hw 5000
        id 4002 packets 4999 packets_hw 4999

The point of the patchset is visibility of ECMP balance, and that is influenced by packet headers, not their payload. Correspondingly, we only include packet counters in the statistics, not byte counters.

We also decided to model HW statistics as a nexthop group attribute, not an arbitrary nexthop one. The latter would count any traffic going through a given nexthop, regardless of which ECMP group it is in, or any at all. The reason is again that the point of the patchset is ECMP balance visibility, not arbitrary inspection of how busy a particular nexthop is. Implementation of individual-nexthop statistics is certainly possible, and could well follow the general approach we are taking in this patchset. For resilient groups, per-bucket statistics could be done in a similar manner as well.

This patchset contains the core code. mlxsw support will be sent in a follow-up patch set.

This patchset progresses as follows:

- Patches #1 and #2 add support for a new next-hop object attribute, NHA_OP_FLAGS. That is meant to carry various op-specific signaling, in particular whether SW- and HW-collected nexthop stats should be part of the get or dump response. The idea is to avoid wasting message space, and time for collection of HW statistics, when the values are not needed.
- Patches #3 and #4 add SW-datapath stats and corresponding UAPI.
- Patches #5, #6 and #7 add support for HW-datapath stats and UAPI. Individual drivers still need to contribute the appropriate HW-specific support code.

v4:
- Patch #2:
    - s/nla_get_bitfield32/nla_get_u32/ in __nh_valid_dump_req().

v3:
- Patch #3:
    - Convert to u64_stats_t
- Patch #4:
    - Give a symbolic name to the set of all valid dump flags for the NHA_OP_FLAGS attribute.
    - Convert to u64_stats_t
- Patch #6:
    - Use a named constant for the NHA_HW_STATS_ENABLE policy.

v2:
- Patch #2:
    - Change OP_FLAGS to u32, enforce through NLA_POLICY_MASK
- Patch #3:
    - Set err on nexthop_create_group() error path
- Patch #4:
    - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS
    - Rename jump target in nla_put_nh_group_stats() to avoid having to rename further in the patchset.
- Patch #7:
    - Use uint to encode NHA_GROUP_STATS_ENTRY_PACKETS_HW
    - Do not cancel outside of nesting in nla_put_nh_group_stats()
====================
Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: nexthop: Expose nexthop group HW stats to user spaceIdo Schimmel3-8/+149
Add netlink support for reading NH group hardware stats. Stats collection is done through a new notifier, NEXTHOP_EVENT_HW_STATS_REPORT_DELTA. Drivers that implement HW counters for a given NH group are thereby asked to collect the stats and report back to core by calling nh_grp_hw_stats_report_delta(). This is similar to what netdevice L3 stats do. Besides exposing number of packets that passed in the HW datapath, also include information on whether any driver actually realizes the counters. The core can tell based on whether it got any _report_delta() reports from the drivers. This allows enabling the statistics at the group at any time, with drivers opting into supporting them. This is also in line with what netdevice L3 stats are doing. So as not to waste time and space, tie the collection and reporting of HW stats with a new op flag, NHA_OP_FLAG_DUMP_HW_STATS. Co-developed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: Kees Cook <[email protected]> # For the __counted_by bits Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
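The delta-reporting scheme described above, where any report both accumulates packets and marks the counters as actually realized by a driver, can be sketched as follows. The names here (`struct demo_grp_hw_stats`, `demo_report_delta()`, `DEMO_MAX_NH`) are hypothetical and the layout is an assumption for illustration; this is not the kernel's notifier structure or its reporting helper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_NH 8 /* hypothetical cap on nexthops per group */

/* Hypothetical per-group accumulator filled in while a report is collected. */
struct demo_grp_hw_stats {
    uint64_t packets_hw[DEMO_MAX_NH]; /* HW packet count per group entry */
    bool hw_stats_used;               /* set once any driver reports */
};

/*
 * Called by a (hypothetical) driver for each group entry it has HW
 * counters for. The mere fact of being called tells the core that some
 * driver realizes the counters, even when the delta is zero.
 */
static void demo_report_delta(struct demo_grp_hw_stats *s,
                              unsigned int nh_idx, uint64_t delta_packets)
{
    assert(nh_idx < DEMO_MAX_NH);
    s->hw_stats_used = true;
    s->packets_hw[nh_idx] += delta_packets;
}
```

If no driver calls the report function, `hw_stats_used` stays false, which is how a core could distinguish "enabled but not offloaded" from "offloaded with zero traffic".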
2024-03-08net: nexthop: Add ability to enable / disable hardware statisticsIdo Schimmel3-1/+19
Add netlink support for enabling collection of HW statistics on nexthop groups. Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: nexthop: Add hardware statistics notificationsIdo Schimmel2-0/+5
Add hw_stats field to several notifier structures to communicate to the drivers that HW statistics should be configured for nexthops within a given group. Signed-off-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: nexthop: Expose nexthop group stats to user spaceIdo Schimmel2-8/+117
Add netlink support for reading NH group stats. This data is only for statistics of the traffic in the SW datapath. HW nexthop group statistics will be added in the following patches. Emission of the stats is keyed to a new op_stats flag to avoid cluttering the netlink message with stats if the user doesn't need them: NHA_OP_FLAG_DUMP_STATS. Co-developed-by: Petr Machata <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
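As a rough illustration of keying stats emission to an op flag, the sketch below validates a u32 flags word against a mask of known bits and then tests the individual flag. The EX_ prefixed names and the bit values are assumptions made for this example; the real definitions live in the nexthop UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignments, for illustration only. */
#define EX_OP_FLAG_DUMP_STATS		(1u << 0)
#define EX_OP_FLAG_DUMP_HW_STATS	(1u << 1)
#define EX_OP_FLAGS_DUMP_ALL \
	(EX_OP_FLAG_DUMP_STATS | EX_OP_FLAG_DUMP_HW_STATS)

/* Reject any bit outside the valid mask, as a mask-based policy would. */
static bool ex_op_flags_valid(uint32_t flags, uint32_t valid_mask)
{
	return (flags & ~valid_mask) == 0;
}

/* Stats are added to the message only when the caller asked for them. */
static bool ex_want_sw_stats(uint32_t flags)
{
	assert(ex_op_flags_valid(flags, EX_OP_FLAGS_DUMP_ALL));
	return (flags & EX_OP_FLAG_DUMP_STATS) != 0;
}
```

A request that leaves the flags word at zero therefore gets the same message as before the stats were introduced.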
2024-03-08net: nexthop: Add nexthop group entry statsIdo Schimmel2-4/+37
Add nexthop group entry stats to count the number of packets forwarded via each nexthop in the group. The stats will be exposed to user space for better data path observability in the next patch.

The per-CPU stats pointer is placed at the beginning of 'struct nh_grp_entry', so that all the fields accessed for the data path reside on the same cache line:

    struct nh_grp_entry {
            struct nexthop *           nh;                   /*     0     8 */
            struct nh_grp_entry_stats * stats;               /*     8     8 */
            u8                         weight;               /*    16     1 */

            /* XXX 7 bytes hole, try to pack */

            union {
                    struct {
                            atomic_t   upper_bound;          /*    24     4 */
                    } hthr;                                  /*    24     4 */
                    struct {
                            struct list_head uw_nh_entry;    /*    24    16 */
                            u16        count_buckets;        /*    40     2 */
                            u16        wants_buckets;        /*    42     2 */
                    } res;                                   /*    24    24 */
            };                                               /*    24    24 */
            struct list_head           nh_list;              /*    48    16 */
            /* --- cacheline 1 boundary (64 bytes) --- */
            struct nexthop *           nh_parent;            /*    64     8 */

            /* size: 72, cachelines: 2, members: 6 */
            /* sum members: 65, holes: 1, sum holes: 7 */
            /* last cacheline: 8 bytes */
    };

Co-developed-by: Petr Machata <[email protected]>
Signed-off-by: Petr Machata <[email protected]>
Signed-off-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
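The per-CPU counting pattern mentioned above can be sketched in user space as follows. This is a simplified model under stated assumptions: EX_NR_CPUS, struct ex_entry_stats, and both helpers are hypothetical, and the synchronization that the kernel's u64_stats_t helpers provide is omitted entirely.

```c
#include <assert.h>
#include <stdint.h>

#define EX_NR_CPUS 4	/* hypothetical CPU count for the sketch */

/* One packet counter per CPU, so the data path never shares a counter. */
struct ex_entry_stats {
	uint64_t packets[EX_NR_CPUS];
};

/* Data path: bump only the counter that belongs to the current CPU. */
static void ex_stats_inc(struct ex_entry_stats *s, unsigned int cpu)
{
	assert(cpu < EX_NR_CPUS);
	s->packets[cpu]++;
}

/* Read side: the reported value is the sum over all CPUs. */
static uint64_t ex_stats_read(const struct ex_entry_stats *s)
{
	uint64_t sum = 0;

	for (unsigned int cpu = 0; cpu < EX_NR_CPUS; cpu++)
		sum += s->packets[cpu];
	return sum;
}
```

The cost of summing is paid only when user space asks for the stats, which keeps the forwarding path to a single local increment.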
2024-03-08net: nexthop: Add NHA_OP_FLAGSPetr Machata2-4/+23
In order to add per-nexthop statistics, but still not increase netlink message size for consumers that do not care about them, there needs to be a toggle through which the user indicates their desire to get the statistics. To that end, add a new attribute, NHA_OP_FLAGS. The idea is to be able to use the attribute for carrying of arbitrary operation-specific flags, i.e. not make it specific for get / dump. Add the new attribute to get and dump policies, but do not actually allow any flags yet -- those will come later as the flags themselves are defined. Add the necessary parsing code. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: nexthop: Adjust netlink policy parsing for a new attributePetr Machata1-30/+28
A following patch will introduce a new attribute, op-specific flags to adjust the behavior of an operation. Different operations will recognize different flags.

- To make the differentiation possible, stop sharing the policies for get and del operations.
- To allow querying for presence of the attribute, have all the attribute arrays sized to NHA_MAX, regardless of what is permitted by policy, and pass the corresponding value to nlmsg_parse() as well.

Signed-off-by: Petr Machata <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Ido Schimmel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-03-08octeontx2-pf: Add TC flower offload support for TCP flagsSai Krishna5-2/+23
This patch adds TC offload support for matching TCP flags from TCP header.

Example usage:

    tc qdisc add dev eth0 ingress

TC rule to drop the TCP SYN packets:

    tc filter add dev eth0 ingress protocol ip flower ip_proto tcp tcp_flags 0x02/0x3f skip_sw action drop

Signed-off-by: Sai Krishna <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
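The tcp_flags 0x02/0x3f argument is a value/mask pair: a packet matches when its flag bits, restricted to the mask, equal the value. A minimal sketch of that comparison follows; the function and macro names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_TCP_FLAG_SYN	0x02
#define EX_TCP_FLAG_ACK	0x10

/* Value/mask match: only the bits selected by the mask are compared. */
static bool ex_tcp_flags_match(uint8_t pkt_flags, uint8_t value, uint8_t mask)
{
	return (pkt_flags & mask) == (value & mask);
}

/* Usage: with 0x02/0x3f, a bare SYN matches but a SYN+ACK does not. */
static void ex_tcp_flags_selfcheck(void)
{
	assert(ex_tcp_flags_match(EX_TCP_FLAG_SYN, 0x02, 0x3f));
	assert(!ex_tcp_flags_match(EX_TCP_FLAG_SYN | EX_TCP_FLAG_ACK,
				   0x02, 0x3f));
}
```

Because the mask 0x3f covers the six classic flag bits, the rule in the example only hits packets where SYN is the sole flag set among them.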
2024-03-08tcp: Add skb addr and sock addr to arguments of tracepoint tcp_probe.fuyuanli1-2/+8
It is useful to expose the skb address and sock address to the user in the tcp_probe tracepoint, so that more information is available while monitoring the receiving of TCP data, by eBPF or other means.

For example, we need to identify a packet by seq and end_seq when calculating the transmit latency between layer 2 and layer 4 with eBPF, but that information is not available in tcp_probe, so we can only use a kprobe hooking tcp_rcv_established to get it. If the skb address and sock address are available, we can use tcp_probe directly, which is more efficient.

Signed-off-by: fuyuanli <[email protected]>
Reviewed-by: Jason Xing <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2024-03-08net: dqs: add NIC stall detector based on BQLJakub Kicinski5-0/+237
softnet_data->time_squeeze is sometimes used as a proxy for host overload or indication of scheduling problems. In practice this statistic is very noisy and has hard to grasp units - e.g. is 10 squeezes a second to be expected, or high?

Delaying network (NAPI) processing leads to drops on NIC queues but also RTT bloat, impacting pacing and CA decisions. Stalls are a little hard to detect on the Rx side, because there may simply have not been any packets received in a given period of time. Packet timestamps help a little bit, but again we don't know if packets are stale because we're not keeping up or because someone (*cough* cgroups) disabled IRQs for a long time.

We can, however, use Tx as a proxy for Rx stalls. Most drivers use combined Rx+Tx NAPIs so if Tx gets starved so will Rx. On the Tx side we know exactly when packets get queued, and completed, so there is no uncertainty.

This patch adds stall checks to BQL. Why BQL? Because it's a convenient place to add such checks, already called by most drivers, and it has copious free space in its structures (this patch adds no extra cache references or dirtying to the fast path).

The algorithm takes one parameter - max delay AKA stall threshold and increments a counter whenever NAPI got delayed for at least that amount of time. It also records the length of the longest stall.

To be precise, every time NAPI has not polled for at least stall_thrs we check if there were any Tx packets queued between last NAPI run and now - stall_thrs/2.

Unlike the classic Tx watchdog this mechanism does not ignore stalls caused by Tx being disabled, or loss of link. I don't think the check is worth the complexity, and stall is a stall, whether due to host overload, flow control, link down... doesn't matter much to the application.

We have been running this detector in production at Meta for 2 years, with the threshold of 8ms. It's the lowest value where false positives become rare. There's still a constant stream of reported stalls (especially without the ksoftirqd deferral patches reverted), those who like their stall metrics to be 0 may prefer a higher value.

Signed-off-by: Jakub Kicinski <[email protected]>
Signed-off-by: Breno Leitao <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
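The stall check described in this commit message can be approximated by the user-space sketch below. It is a deliberately simplified model, not the BQL code: all names are hypothetical, timestamps are plain millisecond values, only the first enqueue after each completion run is tracked, and the stall length is taken as the time since that enqueue, which is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified detector state; all times in milliseconds. */
struct ex_stall {
	uint64_t stall_thrs;	/* threshold; 0 disables the check */
	uint64_t last_reap;	/* time of the previous completion (NAPI) run */
	uint64_t first_queue;	/* first Tx enqueue since last_reap */
	bool queued;		/* was anything queued since last_reap? */
	uint64_t stall_cnt;	/* number of stalls detected */
	uint64_t stall_max;	/* longest stall seen so far */
};

/* Tx path: remember when the first packet after a completion was queued. */
static void ex_stall_queued(struct ex_stall *s, uint64_t now)
{
	if (!s->queued) {
		s->queued = true;
		s->first_queue = now;
	}
}

/*
 * Completion path: if no completion ran for at least stall_thrs and a
 * packet was queued no later than now - stall_thrs/2, count a stall.
 */
static void ex_stall_completed(struct ex_stall *s, uint64_t now)
{
	assert(now >= s->last_reap);
	if (s->stall_thrs && now - s->last_reap >= s->stall_thrs &&
	    s->queued && s->first_queue <= now - s->stall_thrs / 2) {
		uint64_t len = now - s->first_queue;

		s->stall_cnt++;
		if (len > s->stall_max)
			s->stall_max = len;
	}
	s->last_reap = now;
	s->queued = false;
}
```

Requiring the packet to be at least stall_thrs/2 old keeps an idle queue, where a packet was queued just before the completion ran, from being counted as a stall.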
2024-03-08net: chelsio: remove unused function calc_tx_descsColin Ian King1-14/+0
The inlined helper function calc_tx_descs is not used and is redundant. Remove it.

Cleans up clang scan build warning:

    drivers/net/ethernet/chelsio/cxgb4/sge.c:814:28: warning: unused function 'calc_tx_descs' [-Wunused-function]

Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: David S. Miller <[email protected]>