aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-01-27Merge branch 'bnxt_en-next'David S. Miller8-58/+277
Michael Chan says: ==================== bnxt_en: Updates for net-next. This patch-set includes link up and link initialization improvements, RSS and aRFS improvements, devlink refactoring and registration improvements, devlink info support including documentation. v2: Removed the TC ingress rate limiting patch. The developer Harsha needs to rework some code. Use fw.psid suggested by Jakub Kicinski. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-27devlink: document devlink info versions reported by bnxt_en driverVasundhara Volam1-0/+33
Add the set of info versions reported by bnxt_en driver, including a description of what the version represents, and what modes (fixed, running, stored) it reports. v2: Use fw.psid. Cc: Jiri Pirko <[email protected]> Cc: Jakub Kicinski <[email protected]> Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Add support for devlink info commandVasundhara Volam2-0/+138
Display the following information via devlink info command: - Driver name - Board id - Broad revision - Board Serial number - Board FW version - FW parameter set version - FW App version - FW management version - FW RoCE version Standard output example: $ devlink dev info pci/0000:3b:00.0 pci/0000:3b:00.0: driver bnxt_en serial_number 00-10-18-FF-FE-AD-05-00 versions: fixed: asic.id D802 asic.rev 1 running: fw 216.1.124.0 fw.psid 0.0.0 fw.app 216.1.122.0 fw.mgmt 864.0.32.0 fw.roce 216.1.15.0 [ This version has incorporated changes suggested by Jakub Kicinski to use generic devlink version tags. ] v2: Use fw.psid Cc: Jiri Pirko <[email protected]> Cc: Jakub Kicinski <[email protected]> Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27devlink: add macro for "fw.roce"Vasundhara Volam2-0/+8
Add definition and documentation for the new generic info "fw.roce". v2: Remove board.nvm_cfg since fw.psid is similar. Cc: Jiri Pirko <[email protected]> Cc: Jakub Kicinski <[email protected]> Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Rename switch_id to dsnVasundhara Volam3-6/+6
Instead of switch_id, renaming it to dsn will be more meaningful so that it can be used to display device serial number in follow up patch via devlink_info command. Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Add support to update progress of flash updateVasundhara Volam1-1/+13
This patch adds status notification to devlink flash update while flashing is in progress. $ devlink dev flash pci/0000:05:00.0 file 103.pkg Preparing to flash Flashing done Cc: Jiri Pirko <[email protected]> Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Move devlink_register before registering netdevVasundhara Volam2-5/+8
Latest kernels get the phys_port_name via devlink, if ndo_get_phys_port_name is not defined. To provide the phys_port_name correctly, register devlink before registering netdev. Also call devlink_port_type_eth_set() after registering netdev as devlink port updates the netdev structure and notifies user. Cc: Jiri Pirko <[email protected]> Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Register devlink irrespective of firmware spec versionVasundhara Volam1-5/+6
This will allow to register for devlink port and use port features. Also register params only if firmware spec version is at least 0x10600 which will support reading/setting numbered variables in NVRAM. Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Refactor bnxt_dl_register()Vasundhara Volam1-24/+36
Define bnxt_dl_params_register() and bnxt_dl_params_unregister() functions and move params register/unregister code to these newly defined functions. This patch is in preparation to register devlink irrespective of firmware spec. version in the next patch. Signed-off-by: Vasundhara Volam <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Disable workaround for lost interrupts on 575XX B0 and newer chips.Michael Chan2-1/+5
The hardware bug has been fixed on B0 and newer chips, so disable the workaround on these chips. Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Periodically check and remove aged-out ntuple filtersPavan Chebbi1-0/+7
Currently the only time we check and remove expired filters is when we are inserting new filters. Improving the aRFS expiry handling by adding code to do the above work periodically. Signed-off-by: Pavan Chebbi <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Do not accept fragments for aRFS flow steering.Michael Chan1-2/+4
In bnxt_rx_flow_steer(), if the dissected packet is a fragment, do not proceed to create the ntuple filter and return error instead. Otherwise we would create a filter with 0 source and destination ports because the dissected ports would not be available for fragments. Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Support UDP RSS hashing on 575XX chips.Michael Chan1-1/+1
575XX (P5) chips have the same UDP RSS hashing capability as P4 chips, so we can enable it on P5 chips. Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Remove the setting of dev_port.Michael Chan1-1/+0
The dev_port is meant to distinguish the network ports belonging to the same PCI function. Our devices only have one network port associated with each PCI function and so we should not set it for correctness. Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Improve bnxt_probe_phy().Michael Chan1-3/+3
If the 2nd parameter fw_dflt is not set, we are calling bnxt_probe_phy() after the firmware has reset. There is no need to query the current PHY settings from firmware as these settings may be different from the ethtool settings that the driver will re-establish later. So return earlier in bnxt_probe_phy() to save one firmware call. Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27bnxt_en: Improve link up detection.Michael Chan2-9/+9
In bnxt_update_phy_setting(), ethtool_get_link_ksettings() and bnxt_disable_an_for_lpbk(), we inconsistently use netif_carrier_ok() to determine link. Instead, we should use bp->link_info.link_up which has the true link state. The netif_carrier state may be off during self-test and while the device is being reset and may not always reflect the true link state. By always using bp->link_info.link_up, the code is now more consistent and more correct. Some unnecessary link toggles are now prevented with this patch. Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27Merge branch 'ethtool-netlink-interface-part-2'David S. Miller13-34/+592
Michal Kubecek says: ==================== ethtool netlink interface, part 2 This shorter series adds support for getting and setting of wake-on-lan settings and message mask (originally message level). Together with the code already in net-next, this will allow full implementation of "ethtool <dev>" and "ethtool -s <dev> ...". Older versions of the ethtool netlink series allowed getting WoL settings by unprivileged users and only filtered out the password but this was a source of controversy so for now, ETHTOOL_MSG_WOL_GET request always requires CAP_NET_ADMIN as ETHTOOL_GWOL ioctl request does. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-27ethtool: add WOL_NTF notificationMichal Kubecek5-3/+10
Send ETHTOOL_MSG_WOL_NTF notification whenever wake-on-lan settings of a device are modified using ETHTOOL_MSG_WOL_SET netlink message or ETHTOOL_SWOL ioctl request. As notifications can be received by anyone, do not include SecureOn(tm) password in notification messages. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27ethtool: set wake-on-lan settings with WOL_SET requestMichal Kubecek5-1/+102
Implement WOL_SET netlink request to set wake-on-lan settings. This is equivalent to ETHTOOL_SWOL ioctl request. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27ethtool: provide WoL settings with WOL_GET requestMichal Kubecek10-2/+176
Implement WOL_GET request to get wake-on-lan settings for a device, traditionally available via ETHTOOL_GWOL ioctl request. As part of the implementation, provide symbolic names for wake-on-line modes as ETH_SS_WOL_MODES string set. Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27ethtool: add DEBUG_NTF notificationMichal Kubecek5-0/+7
Send ETHTOOL_MSG_DEBUG_NTF notification message whenever debugging message mask for a device are modified using ETHTOOL_MSG_DEBUG_SET netlink message or ETHTOOL_SMSGLVL ioctl request. The notification message has the same format as reply to DEBUG_GET request. As with other ethtool notifications, netlink requests only trigger the notification if the mask is actually changed while ioctl request trigger it whenever the request results in calling the ethtool_ops handler. Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27ethtool: set message mask with DEBUG_SET requestMichal Kubecek5-1/+79
Implement DEBUG_SET netlink request to set debugging settings for a device. At the moment, only message mask corresponding to message level as set by ETHTOOL_SMSGLVL ioctl request can be set. (It is called message level in ioctl interface but almost all drivers interpret it as a bit mask.) Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27ethtool: provide message mask with DEBUG_GET requestMichal Kubecek11-17/+205
Implement DEBUG_GET request to get debugging settings for a device. At the moment, only message mask corresponding to message level as reported by ETHTOOL_GMSGLVL ioctl request is provided. (It is called message level in ioctl interface but almost all drivers interpret it as a bit mask.) As part of the implementation, provide symbolic names for message mask bits as ETH_SS_MSG_CLASSES string set. Signed-off-by: Michal Kubecek <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27ethtool: fix kernel-doc descriptionsMichal Kubecek2-12/+15
Fix missing or incorrect function argument and struct member descriptions. Signed-off-by: Michal Kubecek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27Merge tag 'wireless-drivers-next-2020-01-26' of ↵David S. Miller136-1410/+3495
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.6 Second set of patches for v5.6. Nothing special standing out, smaller new features and fixes allover. Major changes: ar5523 * add support for SMCWUSBT-G2 USB device iwlwifi * support new versions of the FTM FW APIs * support new version of the beacon template FW API * print some extra information when the driver is loaded rtw88 * support wowlan feature for 8822c * add support for WIPHY_WOWLAN_NET_DETECT brcmfmac * add initial support for monitor mode qtnfmac * add module parameter to enable DFS offloading in firmware * add support for STA HE rates * add support for TWT responder and spatial reuse ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-27Merge branch 'for-upstream' of ↵David S. Miller25-204/+1019
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Johan Hedberg says: ==================== pull request: bluetooth-next 2020-01-26 Here's (probably) the last bluetooth-next pull request for the 5.6 kernel. - Initial pieces of Bluetooth 5.2 Isochronous Channels support - mgmt: Various cleanups and a new Set Blocked Keys command - btusb: Added support for 04ca:3021 QCA_ROME device - hci_qca: Multiple fixes & cleanups - hci_bcm: Fixes & improved device tree support - Fixed attempts to create duplicate debugfs entries Please let me know if there are any issues pulling. Thanks. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-27drivers: net: xgene: Fix the order of the arguments of 'alloc_etherdev_mqs()'Christophe JAILLET1-1/+1
'alloc_etherdev_mqs()' expects first 'tx', then 'rx'. The semantic here looks reversed. Reorder the arguments passed to 'alloc_etherdev_mqs()' in order to keep the correct semantic. In fact, this is a no-op because both XGENE_NUM_[RT]X_RING are 8. Fixes: 107dec2749fe ("drivers: net: xgene: Add support for multiple queues") Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27r8169: don't set min_mtu/max_mtu if not neededHeiner Kallweit1-7/+5
Defaults for min_mtu and max_mtu are set by ether_setup(), which is called from devm_alloc_etherdev(). Let rtl_jumbo_max() only return a positive value if actually jumbo packets are supported. This also allows to remove constant Jumbo_1K which is a little misleading anyway. Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27mlxsw: minimal: Fix an error handling path in 'mlxsw_m_port_create()'Christophe JAILLET1-1/+1
An 'alloc_etherdev()' called is not ballanced by a corresponding 'free_netdev()' call in one error handling path. Slighly reorder the error handling code to catch the missed case. Fixes: c100e47caa8e ("mlxsw: minimal: Add ethtool support") Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: dsa: Fix use-after-free in probing of DSA switch treeVladimir Oltean1-11/+25
DSA sets up a switch tree little by little. Every switch of the N members of the tree calls dsa_register_switch, and (N - 1) will just touch the dst->ports list with their ports and quickly exit. Only the last switch that calls dsa_register_switch will find all DSA links complete in dsa_tree_setup_routing_table, and not return zero as a result but instead go ahead and set up the entire DSA switch tree (practically on behalf of the other switches too). The trouble is that the (N - 1) switches don't clean up after themselves after they get an error such as EPROBE_DEFER. Their footprint left in dst->ports by dsa_switch_touch_ports is still there. And switch N, the one responsible with actually setting up the tree, is going to work with those stale dp, dp->ds and dp->ds->dev pointers. In particular ds and ds->dev might get freed by the device driver. Be there a 2-switch tree and the following calling order: - Switch 1 calls dsa_register_switch - Calls dsa_switch_touch_ports, populates dst->ports - Calls dsa_port_parse_cpu, gets -EPROBE_DEFER, exits. - Switch 2 calls dsa_register_switch - Calls dsa_switch_touch_ports, populates dst->ports - Probe doesn't get deferred, so it goes ahead. - Calls dsa_tree_setup_routing_table, which returns "complete == true" due to Switch 1 having called dsa_switch_touch_ports before. - Because the DSA links are complete, it calls dsa_tree_setup_switches now. - dsa_tree_setup_switches iterates through dst->ports, initializing the Switch 1 ds structure (invalid) and the Switch 2 ds structure (valid). - Undefined behavior (use after free, sometimes NULL pointers, etc). Real example below (debugging prints added by me, as well as guards against NULL pointers): [ 5.477947] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.313002] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.319932] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.329693] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.339458] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.349226] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.358991] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.368758] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.378524] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.388291] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.398057] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803df0b980 (dev ffffff803f775c00) [ 6.407912] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.417682] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.427446] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.437212] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.446979] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.456744] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.466512] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.476277] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.486043] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.495810] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.505577] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803da02f80 (dev 0000000000000000) [ 6.515433] dsa_tree_setup_switches: Setting up port 0 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.354120] dsa_tree_setup_switches: Setting up port 1 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.361045] dsa_tree_setup_switches: Setting up port 2 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.370805] dsa_tree_setup_switches: Setting up port 3 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.380571] dsa_tree_setup_switches: Setting up port 4 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.390337] dsa_tree_setup_switches: Setting up port 5 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.400104] dsa_tree_setup_switches: Setting up port 6 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.409872] dsa_tree_setup_switches: Setting up port 7 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.419637] dsa_tree_setup_switches: Setting up port 8 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.429403] dsa_tree_setup_switches: Setting up port 9 of switch ffffff803db15b80 (dev ffffff803d8e4800) [ 7.439169] dsa_tree_setup_switches: Setting up port 10 of switch ffffff803db15b80 (dev ffffff803d8e4800) The solution is to recognize that the functions that call dsa_switch_touch_ports (dsa_switch_parse_of, dsa_switch_parse) have side effects, and therefore one should clean up their side effects on error path. The cleanup of dst->ports was taken from dsa_switch_remove and moved into a dedicated dsa_switch_release_ports function, which should really be per-switch (free only the members of dst->ports that are also members of ds, instead of all switch ports). Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: remove eth_change_mtuHeiner Kallweit2-17/+0
All usage of this function was removed three years ago, and the function was marked as deprecated: a52ad514fdf3 ("net: deprecate eth_change_mtu, remove usage") So I think we can remove it now. Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27Merge branch 'XDP-fixes-for-socionext-driver'David S. Miller1-2/+2
Lorenzo Bianconi says: ==================== XDP fixes for socionext driver Fix possible user-after-in XDP rx path Fix rx statistics accounting if no bpf program is attached ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: socionext: fix xdp_result initialization in netsec_process_rxLorenzo Bianconi1-1/+1
Fix xdp_result initialization in netsec_process_rx in order to not increase rx counters if there is no bpf program attached to the xdp hook and napi_gro_receive returns GRO_DROP Fixes: ba2b232108d3c ("net: netsec: add XDP support") Signed-off-by: Lorenzo Bianconi <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: socionext: fix possible user-after-free in netsec_process_rxLorenzo Bianconi1-1/+1
Fix possible use-after-free in in netsec_process_rx that can occurs if the first packet is sent to the normal networking stack and the following one is dropped by the bpf program attached to the xdp hook. Fix the issue defining the skb pointer in the 'budget' loop Fixes: ba2b232108d3c ("net: netsec: add XDP support") Signed-off-by: Lorenzo Bianconi <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27Merge branch 'net-allow-per-net-notifier-to-follow-netdev-into-namespace'David S. Miller10-42/+131
Jiri Pirko says: ==================== net: allow per-net notifier to follow netdev into namespace Currently we have per-net notifier, which allows to get only notifications relevant to particular network namespace. That is enough for drivers that have netdevs local in a particular namespace (cannot move elsewhere). However if netdev can change namespace, per-net notifier cannot be used. Introduce dev_net variant that is basically per-net notifier with an extension that re-registers the per-net notifier upon netdev namespace change. Basically the per-net notifier follows the netdev into namespace. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-27mlx5: Use dev_net netdevice notifier registrationsJiri Pirko8-10/+28
Register the dev_net notifier and allow the per-net notifier to follow the device into different namespace. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: introduce dev_net notifier register/unregister variantsJiri Pirko2-0/+63
Introduce dev_net variants of netdev notifier register/unregister functions and allow per-net notifier to follow the netdevice into the namespace it is moved to. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: push code from net notifier reg/unreg into helpersJiri Pirko1-22/+38
Push the code which is done under rtnl lock in net notifier register and unregister function into separate helpers. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: call call_netdevice_unregister_net_notifiers from unregisterJiri Pirko1-11/+3
The function does the same thing as the existing code, so rather call call_netdevice_unregister_net_notifiers() instead of code duplication. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27soreuseport: Cleanup duplicate initialization of more_reuse->max_socks.Kuniyuki Iwashima1-1/+0
reuseport_grow() does not need to initialize the more_reuse->max_socks again. It is already initialized in __reuseport_alloc(). Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27Merge branch 'Support-fraglist-GRO-GSO'David S. Miller9-30/+217
Steffen Klassert says: ==================== Support fraglist GRO/GSO This patchset adds support to do GRO/GSO by chaining packets of the same flow at the SKB frag_list pointer. This avoids the overhead to merge payloads into one big packet, and on the other end, if GSO is needed it avoids the overhead of splitting the big packet back to the native form. Patch 1 adds netdev feature flags to enable fraglist GRO, this implements one of the configuration options discussed at netconf 2019. Patch 2 adds a netdev software feature set that defaults to off and assigns the new fraglist GRO feature flag to it. Patch 3 adds the core infrastructure to do fraglist GRO/GSO. Patch 4 enables UDP to use fraglist GRO/GSO if configured. I have only meaningful forwarding performance measurements. I did some tests for the local receive path with netperf and iperf, but in this case the sender that generates the packets is the bottleneck. So the benchmarks are not that meaningful for the receive path. Paolo Abeni did some benchmarks of the local receive path for the RFC v2 version of this pachset, results can be found here: https://www.spinics.net/lists/netdev/msg551158.html I used my IPsec forwarding test setup for the performance measurements: ------------ ------------ -->| router 1 |-------->| router 2 |-- | ------------ ------------ | | | | -------------------- | --------|Spirent Testcenter|<---------- -------------------- net-next (September 7th 2019): Single stream UDP frame size 1460 Bytes: 1.161.000 fps (13.5 Gbps). ---------------------------------------------------------------------- net-next (September 7th 2019) + standard UDP GRO/GSO (not implemented in this patchset): Single stream UDP frame size 1460 Bytes: 1.801.000 fps (21 Gbps). ---------------------------------------------------------------------- net-next (September 7th 2019) + fraglist UDP GRO/GSO: Single stream UDP frame size 1460 Bytes: 2.860.000 fps (33.4 Gbps). ======================================================================= net-next (January 23th 2020): Single stream UDP frame size 1460 Bytes: 919.000 fps (10.73 Gbps). ---------------------------------------------------------------------- net-next (January 23th 2020) + fraglist UDP GRO/GSO: Single stream UDP frame size 1460 Bytes: 2.430.000 fps (28.38 Gbps). ----------------------------------------------------------------------- Changes from RFC v1: - Add IPv6 support. - Split patchset to enable UDP GRO by default before adding fraglist GRO support. - Mark fraglist GRO packets as CHECKSUM_NONE. - Take a refcount on the first segment skb when doing fraglist segmentation. With this we can use the same error handling path as with standard segmentation. Changes from RFC v2: - Add a netdev feature flag to configure listifyed GRO. - Fix UDP GRO enabling for IPv6. - Fix a rcu_read_lock() imbalance. - Fix error path in skb_segment_list(). Changes from RFC v3: - Rename NETIF_F_GRO_LIST to NETIF_F_GRO_FRAGLIST and add NETIF_F_GSO_FRAGLIST. - Move introduction of SKB_GSO_FRAGLIST to patch 2. - Use udpv6_encap_needed_key instead of udp_encap_needed_key in IPv6. - Move some missplaced code from patch 5 to patch 1 where it belongs to. Changes from RFC v4: - Drop the 'UDP: enable GRO by default' patch for now. Standard UDP GRO is not changed with this patchset. - Rebase to net-next current. Changes fom v1 (December 18th): - Do a full __copy_skb_header instead of tryng to find the really needed subset header fields. Thisa can be done later. - Mark all fraglist GRO packets with CHECKSUM_UNNECESSARY. - Rebase to net-next current. Changes fom v2 (January 24th): - Do the CHECKSUM_UNNECESSARY setting from IPv4 for IPv6 too. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-27udp: Support UDP fraglist GRO/GSO.Steffen Klassert3-26/+107
This patch extends UDP GRO to support fraglist GRO/GSO by using the previously introduced infrastructure. If the feature is enabled, all UDP packets are going to fraglist GRO (local input and forward). After validating the csum, we mark ip_summed as CHECKSUM_UNNECESSARY for fraglist GRO packets to make sure that the csum is not touched. Signed-off-by: Steffen Klassert <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: Support GRO/GSO fraglist chaining.Steffen Klassert4-2/+97
This patch adds the core functions to chain/unchain GSO skbs at the frag_list pointer. This also adds a new GSO type SKB_GSO_FRAGLIST and a is_flist flag to napi_gro_cb which indicates that this flow will be GROed by fraglist chaining. Signed-off-by: Steffen Klassert <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: Add a netdev software feature set that defaults to off.Steffen Klassert2-1/+4
The previous patch added the NETIF_F_GRO_FRAGLIST feature. This is a software feature that should default to off. Current software features default to on, so add a new feature set that defaults to off. Signed-off-by: Steffen Klassert <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: Add fraglist GRO/GSO feature flagsSteffen Klassert4-1/+9
This adds new Fraglist GRO/GSO feature flags. They will be used to configure fraglist GRO/GSO what will be implemented with some followup paches. Signed-off-by: Steffen Klassert <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27mvneta driver disallow XDP program on hardware buffer managementSven Auhagen1-0/+6
Recently XDP Support was added to the mvneta driver for software buffer management only. It is still possible to attach an XDP program if hardware buffer management is used. It is not doing anything at that point. The patch disallows attaching XDP programs to mvneta if hardware buffer management is used. I am sorry about that. It is my first submission and I am having some troubles with the format of my emails. v4 -> v5: - Remove extra tabs v3 -> v4: - Please ignore v3 I accidentally submitted my other patch with git-send-mail and v4 is correct v2 -> v3: - My mailserver corrupted the patch resubmission with git-send-email v1 -> v2: - Fixing the patches indentation Signed-off-by: Sven Auhagen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27rxrpc: Fix use-after-free in rxrpc_receive_data()David Howells1-5/+7
The subpacket scanning loop in rxrpc_receive_data() references the subpacket count in the private data part of the sk_buff in the loop termination condition. However, when the final subpacket is pasted into the ring buffer, the function is no longer has a ref on the sk_buff and should not be looking at sp->* any more. This point is actually marked in the code when skb is cleared (but sp is not - which is an error). Fix this by caching sp->nr_subpackets in a local variable and using that instead. Also clear 'sp' to catch accesses after that point. This can show up as an oops in rxrpc_get_skb() if sp->nr_subpackets gets trashed by the sk_buff getting freed and reused in the meantime. Fixes: e2de6c404898 ("rxrpc: Use info in skbuff instead of reparsing a jumbo packet") Signed-off-by: David Howells <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net_sched: ematch: reject invalid TCF_EM_SIMPLEEric Dumazet1-0/+3
It is possible for malicious userspace to set TCF_EM_SIMPLE bit even for matches that should not have this bit set. This can fool two places using tcf_em_is_simple() 1) tcf_em_tree_destroy() -> memory leak of em->data if ops->destroy() is NULL 2) tcf_em_tree_dump() wrongly report/leak 4 low-order bytes of a kernel pointer. BUG: memory leak unreferenced object 0xffff888121850a40 (size 32): comm "syz-executor927", pid 7193, jiffies 4294941655 (age 19.840s) hex dump (first 32 bytes): 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000f67036ea>] kmemleak_alloc_recursive include/linux/kmemleak.h:43 [inline] [<00000000f67036ea>] slab_post_alloc_hook mm/slab.h:586 [inline] [<00000000f67036ea>] slab_alloc mm/slab.c:3320 [inline] [<00000000f67036ea>] __do_kmalloc mm/slab.c:3654 [inline] [<00000000f67036ea>] __kmalloc_track_caller+0x165/0x300 mm/slab.c:3671 [<00000000fab0cc8e>] kmemdup+0x27/0x60 mm/util.c:127 [<00000000d9992e0a>] kmemdup include/linux/string.h:453 [inline] [<00000000d9992e0a>] em_nbyte_change+0x5b/0x90 net/sched/em_nbyte.c:32 [<000000007e04f711>] tcf_em_validate net/sched/ematch.c:241 [inline] [<000000007e04f711>] tcf_em_tree_validate net/sched/ematch.c:359 [inline] [<000000007e04f711>] tcf_em_tree_validate+0x332/0x46f net/sched/ematch.c:300 [<000000007a769204>] basic_set_parms net/sched/cls_basic.c:157 [inline] [<000000007a769204>] basic_change+0x1d7/0x5f0 net/sched/cls_basic.c:219 [<00000000e57a5997>] tc_new_tfilter+0x566/0xf70 net/sched/cls_api.c:2104 [<0000000074b68559>] rtnetlink_rcv_msg+0x3b2/0x4b0 net/core/rtnetlink.c:5415 [<00000000b7fe53fb>] netlink_rcv_skb+0x61/0x170 net/netlink/af_netlink.c:2477 [<00000000e83a40d0>] rtnetlink_rcv+0x1d/0x30 net/core/rtnetlink.c:5442 [<00000000d62ba933>] netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] [<00000000d62ba933>] netlink_unicast+0x223/0x310 net/netlink/af_netlink.c:1328 [<0000000088070f72>] netlink_sendmsg+0x2c0/0x570 net/netlink/af_netlink.c:1917 [<00000000f70b15ea>] sock_sendmsg_nosec net/socket.c:639 [inline] [<00000000f70b15ea>] sock_sendmsg+0x54/0x70 net/socket.c:659 [<00000000ef95a9be>] ____sys_sendmsg+0x2d0/0x300 net/socket.c:2330 [<00000000b650f1ab>] ___sys_sendmsg+0x8a/0xd0 net/socket.c:2384 [<0000000055bfa74a>] __sys_sendmsg+0x80/0xf0 net/socket.c:2417 [<000000002abac183>] __do_sys_sendmsg net/socket.c:2426 [inline] [<000000002abac183>] __se_sys_sendmsg net/socket.c:2424 [inline] [<000000002abac183>] __x64_sys_sendmsg+0x23/0x30 net/socket.c:2424 Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <[email protected]> Reported-by: [email protected] Cc: Cong Wang <[email protected]> Acked-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net: include struct nhmsg size in nh nlmsg sizeStephen Worley1-1/+3
Include the size of struct nhmsg size when calculating how much of a payload to allocate in a new netlink nexthop notification message. Without this, we will fail to fill the skbuff at certain nexthop group sizes. You can reproduce the failure with the following iproute2 commands: ip link add dummy1 type dummy ip link add dummy2 type dummy ip link add dummy3 type dummy ip link add dummy4 type dummy ip link add dummy5 type dummy ip link add dummy6 type dummy ip link add dummy7 type dummy ip link add dummy8 type dummy ip link add dummy9 type dummy ip link add dummy10 type dummy ip link add dummy11 type dummy ip link add dummy12 type dummy ip link add dummy13 type dummy ip link add dummy14 type dummy ip link add dummy15 type dummy ip link add dummy16 type dummy ip link add dummy17 type dummy ip link add dummy18 type dummy ip link add dummy19 type dummy ip ro add 1.1.1.1/32 dev dummy1 ip ro add 1.1.1.2/32 dev dummy2 ip ro add 1.1.1.3/32 dev dummy3 ip ro add 1.1.1.4/32 dev dummy4 ip ro add 1.1.1.5/32 dev dummy5 ip ro add 1.1.1.6/32 dev dummy6 ip ro add 1.1.1.7/32 dev dummy7 ip ro add 1.1.1.8/32 dev dummy8 ip ro add 1.1.1.9/32 dev dummy9 ip ro add 1.1.1.10/32 dev dummy10 ip ro add 1.1.1.11/32 dev dummy11 ip ro add 1.1.1.12/32 dev dummy12 ip ro add 1.1.1.13/32 dev dummy13 ip ro add 1.1.1.14/32 dev dummy14 ip ro add 1.1.1.15/32 dev dummy15 ip ro add 1.1.1.16/32 dev dummy16 ip ro add 1.1.1.17/32 dev dummy17 ip ro add 1.1.1.18/32 dev dummy18 ip ro add 1.1.1.19/32 dev dummy19 ip next add id 1 via 1.1.1.1 dev dummy1 ip next add id 2 via 1.1.1.2 dev dummy2 ip next add id 3 via 1.1.1.3 dev dummy3 ip next add id 4 via 1.1.1.4 dev dummy4 ip next add id 5 via 1.1.1.5 dev dummy5 ip next add id 6 via 1.1.1.6 dev dummy6 ip next add id 7 via 1.1.1.7 dev dummy7 ip next add id 8 via 1.1.1.8 dev dummy8 ip next add id 9 via 1.1.1.9 dev dummy9 ip next add id 10 via 1.1.1.10 dev dummy10 ip next add id 11 via 1.1.1.11 dev dummy11 ip next add id 12 via 1.1.1.12 dev dummy12 ip next add id 13 via 1.1.1.13 dev dummy13 ip next add id 14 via 1.1.1.14 dev dummy14 ip next add id 15 via 1.1.1.15 dev dummy15 ip next add id 16 via 1.1.1.16 dev dummy16 ip next add id 17 via 1.1.1.17 dev dummy17 ip next add id 18 via 1.1.1.18 dev dummy18 ip next add id 19 via 1.1.1.19 dev dummy19 ip next add id 1111 group 1/2/3/4/5/6/7/8/9/10/11/12/13/14/15/16/17/18/19 ip next del id 1111 Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: Stephen Worley <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-01-27net_sched: walk through all child classes in tc_bind_tclass()Cong Wang1-11/+30
In a complex TC class hierarchy like this: tc qdisc add dev eth0 root handle 1:0 cbq bandwidth 100Mbit \ avpkt 1000 cell 8 tc class add dev eth0 parent 1:0 classid 1:1 cbq bandwidth 100Mbit \ rate 6Mbit weight 0.6Mbit prio 8 allot 1514 cell 8 maxburst 20 \ avpkt 1000 bounded tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ sport 80 0xffff flowid 1:3 tc filter add dev eth0 parent 1:0 protocol ip prio 1 u32 match ip \ sport 25 0xffff flowid 1:4 tc class add dev eth0 parent 1:1 classid 1:3 cbq bandwidth 100Mbit \ rate 5Mbit weight 0.5Mbit prio 5 allot 1514 cell 8 maxburst 20 \ avpkt 1000 tc class add dev eth0 parent 1:1 classid 1:4 cbq bandwidth 100Mbit \ rate 3Mbit weight 0.3Mbit prio 5 allot 1514 cell 8 maxburst 20 \ avpkt 1000 where filters are installed on qdisc 1:0, so we can't merely search from class 1:1 when creating class 1:3 and class 1:4. We have to walk through all the child classes of the direct parent qdisc. Otherwise we would miss filters those need reverse binding. Fixes: 07d79fc7d94e ("net_sched: add reverse binding for tc class") Cc: Jamal Hadi Salim <[email protected]> Cc: Jiri Pirko <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>