diff options
author | Linus Torvalds <torvalds@linux-foundation.org> | 2022-12-13 15:47:48 -0800 |
---|---|---|
committer | Linus Torvalds <torvalds@linux-foundation.org> | 2022-12-13 15:47:48 -0800 |
commit | 7e68dd7d07a28faa2e6574dd6b9dbd90cdeaae91 (patch) | |
tree | ae0427c5a3b905f24b3a44b510a9bcf35d9b67a3 /drivers/net/ethernet/ibm/ibmvnic.c | |
parent | 1ca06f1c1acecbe02124f14a37cce347b8c1a90c (diff) | |
parent | 7c4a6309e27f411743817fe74a832ec2d2798a4b (diff) |
Merge tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"Core:
- Allow live renaming when an interface is up
- Add retpoline wrappers for tc, improving considerably the
performances of complex queue discipline configurations
- Add inet drop monitor support
- A few GRO performance improvements
- Add infrastructure for atomic dev stats, addressing long standing
data races
- De-duplicate common code between OVS and conntrack offloading
infrastructure
- A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements
- Netfilter: introduce packet parser for tunneled packets
- Replace IPVS timer-based estimators with kthreads to scale up the
workload with the number of available CPUs
- Add the helper support for connection-tracking OVS offload
BPF:
- Support for user defined BPF objects: the use case is to allocate
own objects, build own object hierarchies and use the building
blocks to build own data structures flexibly, for example, linked
lists in BPF
- Make cgroup local storage available to non-cgroup attached BPF
programs
- Avoid unnecessary deadlock detection and failures wrt BPF task
storage helpers
- A relevant bunch of BPF verifier fixes and improvements
- Veristat tool improvements to support custom filtering, sorting,
and replay of results
- Add LLVM disassembler as default library for dumping JITed code
- Lots of new BPF documentation for various BPF maps
- Add bpf_rcu_read_{,un}lock() support for sleepable programs
- Add RCU grace period chaining to BPF to wait for the completion of
access from both sleepable and non-sleepable BPF programs
- Add support storing struct task_struct objects as kptrs in maps
- Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values
- Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions
Protocols:
- TCP: implement Protective Load Balancing across switch links
- TCP: allow dynamically disabling TCP-MD5 static key, reverting back
to fast[er]-path
- UDP: Introduce optional per-netns hash lookup table
- IPv6: simplify and cleanup sockets disposal
- Netlink: support different type policies for each generic netlink
operation
- MPTCP: add MSG_FASTOPEN and FastOpen listener side support
- MPTCP: add netlink notification support for listener sockets events
- SCTP: add VRF support, allowing sctp sockets binding to VRF devices
- Add bridging MAC Authentication Bypass (MAB) support
- Extensions for Ethernet VPN bridging implementation to better
support multicast scenarios
- More work for Wi-Fi 7 support, comprising conversion of all the
existing drivers to internal TX queue usage
- IPSec: introduce a new offload type (packet offload) allowing
complete header processing and crypto offloading
- IPSec: extended ack support for more descriptive XFRM error
reporting
- RXRPC: increase SACK table size and move processing into a
per-local endpoint kernel thread, reducing considerably the
required locking
- IEEE 802154: synchronous send frame and extended filtering support,
initial support for scanning available 15.4 networks
- Tun: bump the link speed from 10Mbps to 10Gbps
- Tun/VirtioNet: implement UDP segmentation offload support
Driver API:
- PHY/SFP: improve power level switching between standard level 1 and
the higher power levels
- New API for netdev <-> devlink_port linkage
- PTP: convert existing drivers to new frequency adjustment
implementation
- DSA: add support for rx offloading
- Autoload DSA tagging driver when dynamically changing protocol
- Add new PCP and APPTRUST attributes to Data Center Bridging
- Add configuration support for 800Gbps link speed
- Add devlink port function attribute to enable/disable RoCE and
migratable
- Extend devlink-rate to support strict prioriry and weighted fair
queuing
- Add devlink support to directly reading from region memory
- New device tree helper to fetch MAC address from nvmem
- New big TCP helper to simplify temporary header stripping
New hardware / drivers:
- Ethernet:
- Marvel Octeon CNF95N and CN10KB Ethernet Switches
- Marvel Prestera AC5X Ethernet Switch
- WangXun 10 Gigabit NIC
- Motorcomm yt8521 Gigabit Ethernet
- Microchip ksz9563 Gigabit Ethernet Switch
- Microsoft Azure Network Adapter
- Linux Automation 10Base-T1L adapter
- PHY:
- Aquantia AQR112 and AQR412
- Motorcomm YT8531S
- PTP:
- Orolia ART-CARD
- WiFi:
- MediaTek Wi-Fi 7 (802.11be) devices
- RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB
devices
- Bluetooth:
- Broadcom BCM4377/4378/4387 Bluetooth chipsets
- Realtek RTL8852BE and RTL8723DS
- Cypress.CYW4373A0 WiFi + Bluetooth combo device
Drivers:
- CAN:
- gs_usb: bus error reporting support
- kvaser_usb: listen only and bus error reporting support
- Ethernet NICs:
- Intel (100G):
- extend action skbedit to RX queue mapping
- implement devlink-rate support
- support direct read from memory
- nVidia/Mellanox (mlx5):
- SW steering improvements, increasing rules update rate
- Support for enhanced events compression
- extend H/W offload packet manipulation capabilities
- implement IPSec packet offload mode
- nVidia/Mellanox (mlx4):
- better big TCP support
- Netronome Ethernet NICs (nfp):
- IPsec offload support
- add support for multicast filter
- Broadcom:
- RSS and PTP support improvements
- AMD/SolarFlare:
- netlink extened ack improvements
- add basic flower matches to offload, and related stats
- Virtual NICs:
- ibmvnic: introduce affinity hint support
- small / embedded:
- FreeScale fec: add initial XDP support
- Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood
- TI am65-cpsw: add suspend/resume support
- Mediatek MT7986: add RX wireless wthernet dispatch support
- Realtek 8169: enable GRO software interrupt coalescing per
default
- Ethernet high-speed switches:
- Microchip (sparx5):
- add support for Sparx5 TC/flower H/W offload via VCAP
- Mellanox mlxsw:
- add 802.1X and MAC Authentication Bypass offload support
- add ip6gre support
- Embedded Ethernet switches:
- Mediatek (mtk_eth_soc):
- improve PCS implementation, add DSA untag support
- enable flow offload support
- Renesas:
- add rswitch R-Car Gen4 gPTP support
- Microchip (lan966x):
- add full XDP support
- add TC H/W offload via VCAP
- enable PTP on bridge interfaces
- Microchip (ksz8):
- add MTU support for KSZ8 series
- Qualcomm 802.11ax WiFi (ath11k):
- support configuring channel dwell time during scan
- MediaTek WiFi (mt76):
- enable Wireless Ethernet Dispatch (WED) offload support
- add ack signal support
- enable coredump support
- remain_on_channel support
- Intel WiFi (iwlwifi):
- enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities
- 320 MHz channels support
- RealTek WiFi (rtw89):
- new dynamic header firmware format support
- wake-over-WLAN support"
* tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits)
ipvs: fix type warning in do_div() on 32 bit
net: lan966x: Remove a useless test in lan966x_ptp_add_trap()
net: ipa: add IPA v4.7 support
dt-bindings: net: qcom,ipa: Add SM6350 compatible
bnxt: Use generic HBH removal helper in tx path
IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver
selftests: forwarding: Add bridge MDB test
selftests: forwarding: Rename bridge_mdb test
bridge: mcast: Support replacement of MDB port group entries
bridge: mcast: Allow user space to specify MDB entry routing protocol
bridge: mcast: Allow user space to add (*, G) with a source list and filter mode
bridge: mcast: Add support for (*, G) with a source list and filter mode
bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source
bridge: mcast: Add a flag for user installed source entries
bridge: mcast: Expose __br_multicast_del_group_src()
bridge: mcast: Expose br_multicast_new_group_src()
bridge: mcast: Add a centralized error path
bridge: mcast: Place netlink policy before validation functions
bridge: mcast: Split (*, G) and (S, G) addition into different functions
bridge: mcast: Do not derive entry type from its filter mode
...
Diffstat (limited to 'drivers/net/ethernet/ibm/ibmvnic.c')
-rw-r--r-- | drivers/net/ethernet/ibm/ibmvnic.c | 239 |
1 files changed, 238 insertions, 1 deletions
diff --git a/drivers/net/ethernet/ibm/ibmvnic.c b/drivers/net/ethernet/ibm/ibmvnic.c index 9282381a438f..e19a6bb3f444 100644 --- a/drivers/net/ethernet/ibm/ibmvnic.c +++ b/drivers/net/ethernet/ibm/ibmvnic.c @@ -68,6 +68,7 @@ #include <linux/workqueue.h> #include <linux/if_vlan.h> #include <linux/utsname.h> +#include <linux/cpu.h> #include "ibmvnic.h" @@ -171,6 +172,193 @@ static int send_version_xchg(struct ibmvnic_adapter *adapter) return ibmvnic_send_crq(adapter, &crq); } +static void ibmvnic_clean_queue_affinity(struct ibmvnic_adapter *adapter, + struct ibmvnic_sub_crq_queue *queue) +{ + if (!(queue && queue->irq)) + return; + + cpumask_clear(queue->affinity_mask); + + if (irq_set_affinity_and_hint(queue->irq, NULL)) + netdev_warn(adapter->netdev, + "%s: Clear affinity failed, queue addr = %p, IRQ = %d\n", + __func__, queue, queue->irq); +} + +static void ibmvnic_clean_affinity(struct ibmvnic_adapter *adapter) +{ + struct ibmvnic_sub_crq_queue **rxqs; + struct ibmvnic_sub_crq_queue **txqs; + int num_rxqs, num_txqs; + int rc, i; + + rc = 0; + rxqs = adapter->rx_scrq; + txqs = adapter->tx_scrq; + num_txqs = adapter->num_active_tx_scrqs; + num_rxqs = adapter->num_active_rx_scrqs; + + netdev_dbg(adapter->netdev, "%s: Cleaning irq affinity hints", __func__); + if (txqs) { + for (i = 0; i < num_txqs; i++) + ibmvnic_clean_queue_affinity(adapter, txqs[i]); + } + if (rxqs) { + for (i = 0; i < num_rxqs; i++) + ibmvnic_clean_queue_affinity(adapter, rxqs[i]); + } +} + +static int ibmvnic_set_queue_affinity(struct ibmvnic_sub_crq_queue *queue, + unsigned int *cpu, int *stragglers, + int stride) +{ + cpumask_var_t mask; + int i; + int rc = 0; + + if (!(queue && queue->irq)) + return rc; + + /* cpumask_var_t is either a pointer or array, allocation works here */ + if (!zalloc_cpumask_var(&mask, GFP_KERNEL)) + return -ENOMEM; + + /* while we have extra cpu give one extra to this irq */ + if (*stragglers) { + stride++; + (*stragglers)--; + } + /* atomic write is safer than writing bit by bit directly */ + for (i = 0; i < stride; i++) { + cpumask_set_cpu(*cpu, mask); + *cpu = cpumask_next_wrap(*cpu, cpu_online_mask, + nr_cpu_ids, false); + } + /* set queue affinity mask */ + cpumask_copy(queue->affinity_mask, mask); + rc = irq_set_affinity_and_hint(queue->irq, queue->affinity_mask); + free_cpumask_var(mask); + + return rc; +} + +/* assumes cpu read lock is held */ +static void ibmvnic_set_affinity(struct ibmvnic_adapter *adapter) +{ + struct ibmvnic_sub_crq_queue **rxqs = adapter->rx_scrq; + struct ibmvnic_sub_crq_queue **txqs = adapter->tx_scrq; + struct ibmvnic_sub_crq_queue *queue; + int num_rxqs = adapter->num_active_rx_scrqs; + int num_txqs = adapter->num_active_tx_scrqs; + int total_queues, stride, stragglers, i; + unsigned int num_cpu, cpu; + int rc = 0; + + netdev_dbg(adapter->netdev, "%s: Setting irq affinity hints", __func__); + if (!(adapter->rx_scrq && adapter->tx_scrq)) { + netdev_warn(adapter->netdev, + "%s: Set affinity failed, queues not allocated\n", + __func__); + return; + } + + total_queues = num_rxqs + num_txqs; + num_cpu = num_online_cpus(); + /* number of cpu's assigned per irq */ + stride = max_t(int, num_cpu / total_queues, 1); + /* number of leftover cpu's */ + stragglers = num_cpu >= total_queues ? num_cpu % total_queues : 0; + /* next available cpu to assign irq to */ + cpu = cpumask_next(-1, cpu_online_mask); + + for (i = 0; i < num_txqs; i++) { + queue = txqs[i]; + rc = ibmvnic_set_queue_affinity(queue, &cpu, &stragglers, + stride); + if (rc) + goto out; + + if (!queue) + continue; + + rc = __netif_set_xps_queue(adapter->netdev, + cpumask_bits(queue->affinity_mask), + i, XPS_CPUS); + if (rc) + netdev_warn(adapter->netdev, "%s: Set XPS on queue %d failed, rc = %d.\n", + __func__, i, rc); + } + + for (i = 0; i < num_rxqs; i++) { + queue = rxqs[i]; + rc = ibmvnic_set_queue_affinity(queue, &cpu, &stragglers, + stride); + if (rc) + goto out; + } + +out: + if (rc) { + netdev_warn(adapter->netdev, + "%s: Set affinity failed, queue addr = %p, IRQ = %d, rc = %d.\n", + __func__, queue, queue->irq, rc); + ibmvnic_clean_affinity(adapter); + } +} + +static int ibmvnic_cpu_online(unsigned int cpu, struct hlist_node *node) +{ + struct ibmvnic_adapter *adapter; + + adapter = hlist_entry_safe(node, struct ibmvnic_adapter, node); + ibmvnic_set_affinity(adapter); + return 0; +} + +static int ibmvnic_cpu_dead(unsigned int cpu, struct hlist_node *node) +{ + struct ibmvnic_adapter *adapter; + + adapter = hlist_entry_safe(node, struct ibmvnic_adapter, node_dead); + ibmvnic_set_affinity(adapter); + return 0; +} + +static int ibmvnic_cpu_down_prep(unsigned int cpu, struct hlist_node *node) +{ + struct ibmvnic_adapter *adapter; + + adapter = hlist_entry_safe(node, struct ibmvnic_adapter, node); + ibmvnic_clean_affinity(adapter); + return 0; +} + +static enum cpuhp_state ibmvnic_online; + +static int ibmvnic_cpu_notif_add(struct ibmvnic_adapter *adapter) +{ + int ret; + + ret = cpuhp_state_add_instance_nocalls(ibmvnic_online, &adapter->node); + if (ret) + return ret; + ret = cpuhp_state_add_instance_nocalls(CPUHP_IBMVNIC_DEAD, + &adapter->node_dead); + if (!ret) + return ret; + cpuhp_state_remove_instance_nocalls(ibmvnic_online, &adapter->node); + return ret; +} + +static void ibmvnic_cpu_notif_remove(struct ibmvnic_adapter *adapter) +{ + cpuhp_state_remove_instance_nocalls(ibmvnic_online, &adapter->node); + cpuhp_state_remove_instance_nocalls(CPUHP_IBMVNIC_DEAD, + &adapter->node_dead); +} + static long h_reg_sub_crq(unsigned long unit_address, unsigned long token, unsigned long length, unsigned long *number, unsigned long *irq) @@ -3626,6 +3814,8 @@ static int reset_sub_crq_queues(struct ibmvnic_adapter *adapter) if (!adapter->tx_scrq || !adapter->rx_scrq) return -EINVAL; + ibmvnic_clean_affinity(adapter); + for (i = 0; i < adapter->req_tx_queues; i++) { netdev_dbg(adapter->netdev, "Re-setting tx_scrq[%d]\n", i); rc = reset_one_sub_crq_queue(adapter, adapter->tx_scrq[i]); @@ -3675,6 +3865,7 @@ static void release_sub_crq_queue(struct ibmvnic_adapter *adapter, dma_unmap_single(dev, scrq->msg_token, 4 * PAGE_SIZE, DMA_BIDIRECTIONAL); free_pages((unsigned long)scrq->msgs, 2); + free_cpumask_var(scrq->affinity_mask); kfree(scrq); } @@ -3695,6 +3886,8 @@ static struct ibmvnic_sub_crq_queue *init_sub_crq_queue(struct ibmvnic_adapter dev_warn(dev, "Couldn't allocate crq queue messages page\n"); goto zero_page_failed; } + if (!zalloc_cpumask_var(&scrq->affinity_mask, GFP_KERNEL)) + goto cpumask_alloc_failed; scrq->msg_token = dma_map_single(dev, scrq->msgs, 4 * PAGE_SIZE, DMA_BIDIRECTIONAL); @@ -3747,6 +3940,8 @@ reg_failed: dma_unmap_single(dev, scrq->msg_token, 4 * PAGE_SIZE, DMA_BIDIRECTIONAL); map_failed: + free_cpumask_var(scrq->affinity_mask); +cpumask_alloc_failed: free_pages((unsigned long)scrq->msgs, 2); zero_page_failed: kfree(scrq); @@ -3758,6 +3953,7 @@ static void release_sub_crqs(struct ibmvnic_adapter *adapter, bool do_h_free) { int i; + ibmvnic_clean_affinity(adapter); if (adapter->tx_scrq) { for (i = 0; i < adapter->num_active_tx_scrqs; i++) { if (!adapter->tx_scrq[i]) @@ -4035,6 +4231,11 @@ static int init_sub_crq_irqs(struct ibmvnic_adapter *adapter) goto req_rx_irq_failed; } } + + cpus_read_lock(); + ibmvnic_set_affinity(adapter); + cpus_read_unlock(); + return rc; req_rx_irq_failed: @@ -6152,10 +6353,19 @@ static int ibmvnic_probe(struct vio_dev *dev, const struct vio_device_id *id) } dev_info(&dev->dev, "ibmvnic registered\n"); + rc = ibmvnic_cpu_notif_add(adapter); + if (rc) { + netdev_err(netdev, "Registering cpu notifier failed\n"); + goto cpu_notif_add_failed; + } + complete(&adapter->probe_done); return 0; +cpu_notif_add_failed: + unregister_netdev(netdev); + ibmvnic_register_fail: device_remove_file(&dev->dev, &dev_attr_failover); @@ -6206,6 +6416,8 @@ static void ibmvnic_remove(struct vio_dev *dev) spin_unlock_irqrestore(&adapter->state_lock, flags); + ibmvnic_cpu_notif_remove(adapter); + flush_work(&adapter->ibmvnic_reset); flush_delayed_work(&adapter->ibmvnic_delayed_reset); @@ -6336,15 +6548,40 @@ static struct vio_driver ibmvnic_driver = { /* module functions */ static int __init ibmvnic_module_init(void) { + int ret; + + ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "net/ibmvnic:online", + ibmvnic_cpu_online, + ibmvnic_cpu_down_prep); + if (ret < 0) + goto out; + ibmvnic_online = ret; + ret = cpuhp_setup_state_multi(CPUHP_IBMVNIC_DEAD, "net/ibmvnic:dead", + NULL, ibmvnic_cpu_dead); + if (ret) + goto err_dead; + + ret = vio_register_driver(&ibmvnic_driver); + if (ret) + goto err_vio_register; + pr_info("%s: %s %s\n", ibmvnic_driver_name, ibmvnic_driver_string, IBMVNIC_DRIVER_VERSION); - return vio_register_driver(&ibmvnic_driver); + return 0; +err_vio_register: + cpuhp_remove_multi_state(CPUHP_IBMVNIC_DEAD); +err_dead: + cpuhp_remove_multi_state(ibmvnic_online); +out: + return ret; } static void __exit ibmvnic_module_exit(void) { vio_unregister_driver(&ibmvnic_driver); + cpuhp_remove_multi_state(CPUHP_IBMVNIC_DEAD); + cpuhp_remove_multi_state(ibmvnic_online); } module_init(ibmvnic_module_init); |