aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet
AgeCommit message (Collapse)AuthorFilesLines
2022-12-25treewide: Convert del_timer*() to timer_shutdown*()Steven Rostedt (Google)3-5/+5
Due to several bugs caused by timers being re-armed after they are shutdown and just before they are freed, a new state of timers was added called "shutdown". After a timer is set to this state, then it can no longer be re-armed. The following script was run to find all the trivial locations where del_timer() or del_timer_sync() is called in the same function that the object holding the timer is freed. It also ignores any locations where the timer->function is modified between the del_timer*() and the free(), as that is not considered a "trivial" case. This was created by using a coccinelle script and the following commands: $ cat timer.cocci @@ expression ptr, slab; identifier timer, rfield; @@ ( - del_timer(&ptr->timer); + timer_shutdown(&ptr->timer); | - del_timer_sync(&ptr->timer); + timer_shutdown_sync(&ptr->timer); ) ... when strict when != ptr->timer ( kfree_rcu(ptr, rfield); | kmem_cache_free(slab, ptr); | kfree(ptr); ) $ spatch timer.cocci . > /tmp/t.patch $ patch -p1 < /tmp/t.patch Link: https://lore.kernel.org/lkml/20221123201306.823305113@linutronix.de/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Pavel Machek <pavel@ucw.cz> [ LED ] Acked-by: Kalle Valo <kvalo@kernel.org> [ wireless ] Acked-by: Paolo Abeni <pabeni@redhat.com> [ networking ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-21Merge tag 'net-6.2-rc1' of ↵Linus Torvalds14-74/+216
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, netfilter and can. Current release - regressions: - bpf: synchronize dispatcher update with bpf_dispatcher_xdp_func - rxrpc: - fix security setting propagation - fix null-deref in rxrpc_unuse_local() - fix switched parameters in peer tracing Current release - new code bugs: - rxrpc: - fix I/O thread startup getting skipped - fix locking issues in rxrpc_put_peer_locked() - fix I/O thread stop - fix uninitialised variable in rxperf server - fix the return value of rxrpc_new_incoming_call() - microchip: vcap: fix initialization of value and mask - nfp: fix unaligned io read of capabilities word Previous releases - regressions: - stop in-kernel socket users from corrupting socket's task_frag - stream: purge sk_error_queue in sk_stream_kill_queues() - openvswitch: fix flow lookup to use unmasked key - dsa: mv88e6xxx: avoid reg_lock deadlock in mv88e6xxx_setup_port() - devlink: - hold region lock when flushing snapshots - protect devlink dump by the instance lock Previous releases - always broken: - bpf: - prevent leak of lsm program after failed attach - resolve fext program type when checking map compatibility - skbuff: account for tail adjustment during pull operations - macsec: fix net device access prior to holding a lock - bonding: switch back when high prio link up - netfilter: flowtable: really fix NAT IPv6 offload - enetc: avoid buffer leaks on xdp_do_redirect() failure - unix: fix race in SOCK_SEQPACKET's unix_dgram_sendmsg() - dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq" * tag 'net-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits) net: fec: check the return value of build_skb() net: simplify sk_page_frag Treewide: Stop corrupting socket's task_frag net: Introduce sk_use_task_frag in struct sock. mctp: Remove device type check at unregister net: dsa: microchip: remove IRQF_TRIGGER_FALLING in request_threaded_irq can: kvaser_usb: hydra: help gcc-13 to figure out cmd_len can: flexcan: avoid unbalanced pm_runtime_enable warning Documentation: devlink: add missing toc entry for etas_es58x devlink doc mctp: serial: Fix starting value for frame check sequence nfp: fix unaligned io read of capabilities word net: stream: purge sk_error_queue in sk_stream_kill_queues() myri10ge: Fix an error handling path in myri10ge_probe() net: microchip: vcap: Fix initialization of value and mask rxrpc: Fix the return value of rxrpc_new_incoming_call() rxrpc: rxperf: Fix uninitialised variable rxrpc: Fix I/O thread stop rxrpc: Fix switched parameters in peer tracing rxrpc: Fix locking issues in rxrpc_put_peer_locked() rxrpc: Fix I/O thread startup getting skipped ...
2022-12-20net: fec: check the return value of build_skb()Wei Fang1-0/+8
The build_skb might return a null pointer but there is no check on the return value in the fec_enet_rx_queue(). So a null pointer dereference might occur. To avoid this, we check the return value of build_skb. If the return value is a null pointer, the driver will recycle the page and update the statistic of ndev. Then jump to rx_processing_done to clear the status flags of the BD so that the hardware can recycle the BD. Fixes: 95698ff6177b ("net: fec: using page pool to manage RX buffers") Signed-off-by: Wei Fang <wei.fang@nxp.com> Reviewed-by: Shenwei Wang <Shenwei.wang@nxp.com> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20221219022755.1047573-1-wei.fang@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-19nfp: fix unaligned io read of capabilities wordHuanhuan Wang1-1/+1
The address of 32-bit extend capability is not qword aligned, and may cause exception in some arch. Fixes: 484963ce9f1e ("nfp: extend capability and control words") Signed-off-by: Huanhuan Wang <huanhuan.wang@corigine.com> Reviewed-by: Niklas Söderlund <niklas.soderlund@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19myri10ge: Fix an error handling path in myri10ge_probe()Christophe JAILLET1-0/+1
Some memory allocated in myri10ge_probe_slices() is not released in the error handling path of myri10ge_probe(). Add the corresponding kfree(), as already done in the remove function. Fixes: 0dcffac1a329 ("myri10ge: add multislices support") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-19net: microchip: vcap: Fix initialization of value and maskHoratiu Vultur1-0/+2
Fix the following smatch warning: smatch warnings: drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c:103 vcap_debugfs_show_rule_keyfield() error: uninitialized symbol 'value'. drivers/net/ethernet/microchip/vcap/vcap_api_debugfs.c:106 vcap_debugfs_show_rule_keyfield() error: uninitialized symbol 'mask'. In case the vcap field was VCAP_FIELD_U128 and the key was different than IP6_S/DIP then the value and mask were not initialized, therefore initialize them. Fixes: 610c32b2ce66 ("net: microchip: vcap: Add vcap_get_rule") Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-18Merge branch '1GbE' of ↵David S. Miller4-40/+188
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-12-15 (igc) Muhammad Husaini Zulkifli says: This patch series fixes bugs for the Time-Sensitive Networking(TSN) Qbv Scheduling features. An overview of each patch series is given below: Patch 1: Using a first flag bit to schedule a packet to the next cycle if packet cannot fit in current Qbv cycle. Patch 2: Enable strict cycle for Qbv scheduling. Patch 3: Prevent user to set basetime less than zero during tc config. Patch 4: Allow the basetime enrollment with zero value. Patch 5: Calculate the new end time value to exclude the time interval that exceed the cycle time as user can specify the cycle time in tc config. Patch 6: Resolve the HW bugs where the gate is not fully closed. --- This contains the net patches from this original pull request: https://lore.kernel.org/netdev/20221205212414.3197525-1-anthony.l.nguyen@intel.com/ ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-16net: ethernet: ti: am65-cpsw: fix CONFIG_PM #ifdefArnd Bergmann1-3/+1
The #ifdef check is incorrect and leads to a warning: drivers/net/ethernet/ti/am65-cpsw-nuss.c:1679:13: error: 'am65_cpsw_nuss_remove_rx_chns' defined but not used [-Werror=unused-function] 1679 | static void am65_cpsw_nuss_remove_rx_chns(void *data) It's better to remove the #ifdef here and use the modern SYSTEM_SLEEP_PM_OPS() macro instead. Fixes: 24bc19b05f1f ("net: ethernet: ti: am65-cpsw: Add suspend/resume support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://lore.kernel.org/r/20221215163918.611609-1-arnd@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-15igc: Set Qbv start_time and end_time to end_time if not being configured in GCLTan Tee Min1-1/+13
The default setting of end_time minus start_time is whole 1 second. Thus, if it's not being configured in any GCL entry then it will be staying at original 1 second. This patch is changing the start_time and end_time to be end_time as if setting zero will be having weird HW behavior where the gate will not be fully closed. Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15igc: recalculate Qbv end_time by considering cycle timeTan Tee Min1-0/+15
Qbv users can specify a cycle time that is not equal to the total GCL intervals. Hence, recalculation is necessary here to exclude the time interval that exceeds the cycle time. As those GCL which exceeds the cycle time will be truncated. According to IEEE Std. 802.1Q-2018 section 8.6.9.2, once the end of the list is reached, it will switch to the END_OF_CYCLE state and leave the gates in the same state until the next cycle is started. Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15igc: allow BaseTime 0 enrollment for QbvTan Tee Min3-1/+4
Introduce qbv_enable flag in igc_adapter struct to store the Qbv on/off. So this allow the BaseTime to enroll with zero value. Fixes: 61572d5f8f91 ("igc: Simplify TSN flags handling") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Tan Tee Min <tee.min.tan@linux.intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15igc: Add checking for basetime less than zeroMuhammad Husaini Zulkifli1-0/+3
Using the tc qdisc command, the user can set basetime to any value. Checking should be done on the driver's side to prevent registering basetime values that are less than zero. Fixes: ec50a9d437f0 ("igc: Add support for taprio offloading") Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15igc: Use strict cycles for Qbv schedulingVinicius Costa Gomes1-9/+2
Configuring strict cycle mode in the controller forces more well behaved transmissions when taprio is offloaded. When set this strict_cycle and strict_end, transmission is not enabled if the whole packet cannot be completed before end of the Qbv cycle. Fixes: 82faa9b79950 ("igc: Add support for ETF offloading") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15igc: Enhance Qbv scheduling by using first flag bitVinicius Costa Gomes3-29/+151
The I225 hardware has a limitation that packets can only be scheduled in the [0, cycle-time] interval. So, scheduling a packet to the start of the next cycle doesn't usually work. To overcome this, we use the Transmit Descriptor first flag to indicates that a packet should be the first packet (from a queue) in a cycle according to the section 7.5.2.9.3.4 The First Packet on Each QBV Cycle in Intel Discrete I225/6 User Manual. But this only works if there was any packet from that queue during the current cycle, to avoid this issue, we issue an empty packet if that's not the case. Also require one more descriptor to be available, to take into account the empty packet that might be issued. Test Setup: Talker: Use l2_tai to generate the launchtime into packet load. Listener: Use timedump.c to compute the delta between packet arrival and LaunchTime packet payload. Test Result: Before: 1666000610127300000,1666000610127300096,96,621273 1666000610127400000,1666000610127400192,192,621274 1666000610127500000,1666000610127500032,32,621275 1666000610127600000,1666000610127600128,128,621276 1666000610127700000,1666000610127700224,224,621277 1666000610127800000,1666000610127800064,64,621278 1666000610127900000,1666000610127900160,160,621279 1666000610128000000,1666000610128000000,0,621280 1666000610128100000,1666000610128100096,96,621281 1666000610128200000,1666000610128200192,192,621282 1666000610128300000,1666000610128300032,32,621283 1666000610128400000,1666000610128301056,-98944,621284 1666000610128500000,1666000610128302080,-197920,621285 1666000610128600000,1666000610128302848,-297152,621286 1666000610128700000,1666000610128303872,-396128,621287 1666000610128800000,1666000610128304896,-495104,621288 1666000610128900000,1666000610128305664,-594336,621289 1666000610129000000,1666000610128306688,-693312,621290 1666000610129100000,1666000610128307712,-792288,621291 1666000610129200000,1666000610128308480,-891520,621292 1666000610129300000,1666000610128309504,-990496,621293 1666000610129400000,1666000610128310528,-1089472,621294 1666000610129500000,1666000610128311296,-1188704,621295 1666000610129600000,1666000610128312320,-1287680,621296 1666000610129700000,1666000610128313344,-1386656,621297 1666000610129800000,1666000610128314112,-1485888,621298 1666000610129900000,1666000610128315136,-1584864,621299 1666000610130000000,1666000610128316160,-1683840,621300 1666000610130100000,1666000610128316928,-1783072,621301 1666000610130200000,1666000610128317952,-1882048,621302 1666000610130300000,1666000610128318976,-1981024,621303 1666000610130400000,1666000610128319744,-2080256,621304 1666000610130500000,1666000610128320768,-2179232,621305 1666000610130600000,1666000610128321792,-2278208,621306 1666000610130700000,1666000610128322816,-2377184,621307 1666000610130800000,1666000610128323584,-2476416,621308 1666000610130900000,1666000610128324608,-2575392,621309 1666000610131000000,1666000610128325632,-2674368,621310 1666000610131100000,1666000610128326400,-2773600,621311 1666000610131200000,1666000610128327424,-2872576,621312 1666000610131300000,1666000610128328448,-2971552,621313 1666000610131400000,1666000610128329216,-3070784,621314 1666000610131500000,1666000610131500032,32,621315 1666000610131600000,1666000610131600128,128,621316 1666000610131700000,1666000610131700224,224,621317 After: 1666073510646200000,1666073510646200064,64,2676462 1666073510646300000,1666073510646300160,160,2676463 1666073510646400000,1666073510646400256,256,2676464 1666073510646500000,1666073510646500096,96,2676465 1666073510646600000,1666073510646600192,192,2676466 1666073510646700000,1666073510646700032,32,2676467 1666073510646800000,1666073510646800128,128,2676468 1666073510646900000,1666073510646900224,224,2676469 1666073510647000000,1666073510647000064,64,2676470 1666073510647100000,1666073510647100160,160,2676471 1666073510647200000,1666073510647200256,256,2676472 1666073510647300000,1666073510647300096,96,2676473 1666073510647400000,1666073510647400192,192,2676474 1666073510647500000,1666073510647500032,32,2676475 1666073510647600000,1666073510647600128,128,2676476 1666073510647700000,1666073510647700224,224,2676477 1666073510647800000,1666073510647800064,64,2676478 1666073510647900000,1666073510647900160,160,2676479 1666073510648000000,1666073510648000000,0,2676480 1666073510648100000,1666073510648100096,96,2676481 1666073510648200000,1666073510648200192,192,2676482 1666073510648300000,1666073510648300032,32,2676483 1666073510648400000,1666073510648400128,128,2676484 1666073510648500000,1666073510648500224,224,2676485 1666073510648600000,1666073510648600064,64,2676486 1666073510648700000,1666073510648700160,160,2676487 1666073510648800000,1666073510648800000,0,2676488 1666073510648900000,1666073510648900096,96,2676489 1666073510649000000,1666073510649000192,192,2676490 1666073510649100000,1666073510649100032,32,2676491 1666073510649200000,1666073510649200128,128,2676492 1666073510649300000,1666073510649300224,224,2676493 1666073510649400000,1666073510649400064,64,2676494 1666073510649500000,1666073510649500160,160,2676495 1666073510649600000,1666073510649600000,0,2676496 1666073510649700000,1666073510649700096,96,2676497 1666073510649800000,1666073510649800192,192,2676498 1666073510649900000,1666073510649900032,32,2676499 1666073510650000000,1666073510650000128,128,2676500 Fixes: 82faa9b79950 ("igc: Add support for ETF offloading") Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Co-developed-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Signed-off-by: Aravindhan Gunasekaran <aravindhan.gunasekaran@intel.com> Co-developed-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Muhammad Husaini Zulkifli <muhammad.husaini.zulkifli@intel.com> Signed-off-by: Malli C <mallikarjuna.chilakala@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-12-15ravb: Fix "failed to switch device to config mode" message during unbindBiju Das1-1/+1
This patch fixes the error "ravb 11c20000.ethernet eth0: failed to switch device to config mode" during unbind. We are doing register access after pm_runtime_put_sync(). We usually do cleanup in reverse order of init. Currently in remove(), the "pm_runtime_put_sync" is not in reverse order. Probe reset_control_deassert(rstc); pm_runtime_enable(&pdev->dev); pm_runtime_get_sync(&pdev->dev); remove pm_runtime_put_sync(&pdev->dev); unregister_netdev(ndev); .. ravb_mdio_release(priv); pm_runtime_disable(&pdev->dev); Consider the call to unregister_netdev() unregister_netdev->unregister_netdevice_queue->rollback_registered_many that calls the below functions which access the registers after pm_runtime_put_sync() 1) ravb_get_stats 2) ravb_close Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Cc: stable@vger.kernel.org Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221214105118.2495313-1-biju.das.jz@bp.renesas.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-15net: stmmac: fix errno when create_singlethread_workqueue() failsGaosheng Cui1-0/+1
We should set the return value to -ENOMEM explicitly when create_singlethread_workqueue() fails in stmmac_dvr_probe(), otherwise we'll lose the error value. Fixes: a137f3f27f92 ("net: stmmac: fix possible memory leak in stmmac_dvr_probe()") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221214080117.3514615-1-cuigaosheng1@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-15r6040: Fix kmemleak in probe and removeLi Zetao1-1/+4
There is a memory leaks reported by kmemleak: unreferenced object 0xffff888116111000 (size 2048): comm "modprobe", pid 817, jiffies 4294759745 (age 76.502s) hex dump (first 32 bytes): 00 c4 0a 04 81 88 ff ff 08 10 11 16 81 88 ff ff ................ 08 10 11 16 81 88 ff ff 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff815bcd82>] kmalloc_trace+0x22/0x60 [<ffffffff827e20ee>] phy_device_create+0x4e/0x90 [<ffffffff827e6072>] get_phy_device+0xd2/0x220 [<ffffffff827e7844>] mdiobus_scan+0xa4/0x2e0 [<ffffffff827e8be2>] __mdiobus_register+0x482/0x8b0 [<ffffffffa01f5d24>] r6040_init_one+0x714/0xd2c [r6040] ... The problem occurs in probe process as follows: r6040_init_one: mdiobus_register mdiobus_scan <- alloc and register phy_device, the reference count of phy_device is 3 r6040_mii_probe phy_connect <- connect to the first phy_device, so the reference count of the first phy_device is 4, others are 3 register_netdev <- fault inject succeeded, goto error handling path // error handling path err_out_mdio_unregister: mdiobus_unregister(lp->mii_bus); err_out_mdio: mdiobus_free(lp->mii_bus); <- the reference count of the first phy_device is 1, it is not released and other phy_devices are released // similarly, the remove process also has the same problem The root cause is traced to the phy_device is not disconnected when removes one r6040 device in r6040_remove_one() or on error handling path after r6040_mii probed successfully. In r6040_mii_probe(), a net ethernet device is connected to the first PHY device of mii_bus, in order to notify the connected driver when the link status changes, which is the default behavior of the PHY infrastructure to handle everything. Therefore the phy_device should be disconnected when removes one r6040 device or on error handling path. Fix it by adding phy_disconnect() when removes one r6040 device or on error handling path after r6040_mii probed successfully. Fixes: 3831861b4ad8 ("r6040: implement phylib") Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221213125614.927754-1-lizetao1@huawei.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-14net: enetc: avoid buffer leaks on xdp_do_redirect() failureVladimir Oltean1-27/+8
Before enetc_clean_rx_ring_xdp() calls xdp_do_redirect(), each software BD in the RX ring between index orig_i and i can have one of 2 refcount values on its page. We are the owner of the current buffer that is being processed, so the refcount will be at least 1. If the current owner of the buffer at the diametrically opposed index in the RX ring (i.o.w, the other half of this page) has not yet called kfree(), this page's refcount could even be 2. enetc_page_reusable() in enetc_flip_rx_buff() tests for the page refcount against 1, and [ if it's 2 ] does not attempt to reuse it. But if enetc_flip_rx_buff() is put after the xdp_do_redirect() call, the page refcount can have one of 3 values. It can also be 0, if there is no owner of the other page half, and xdp_do_redirect() for this buffer ran so far that it triggered a flush of the devmap/cpumap bulk queue, and the consumers of those bulk queues also freed the buffer, all by the time xdp_do_redirect() returns the execution back to enetc. This is the reason why enetc_flip_rx_buff() is called before xdp_do_redirect(), but there is a big flaw with that reasoning: enetc_flip_rx_buff() will set rx_swbd->page = NULL on both sides of the enetc_page_reusable() branch, and if xdp_do_redirect() returns an error, we call enetc_xdp_free(), which does not deal gracefully with that. In fact, what happens is quite special. The page refcounts start as 1. enetc_flip_rx_buff() figures they're reusable, transfers these rx_swbd->page pointers to a different rx_swbd in enetc_reuse_page(), and bumps the refcount to 2. When xdp_do_redirect() later returns an error, we call the no-op enetc_xdp_free(), but we still haven't lost the reference to that page. A copy of it is still at rx_ring->next_to_alloc, but that has refcount 2 (and there are no concurrent owners of it in flight, to drop the refcount). What really kills the system is when we'll flip the rx_swbd->page the second time around. With an updated refcount of 2, the page will not be reusable and we'll really leak it. Then enetc_new_page() will have to allocate more pages, which will then eventually leak again on further errors from xdp_do_redirect(). The problem, summarized, is that we zeroize rx_swbd->page before we're completely done with it, and this makes it impossible for the error path to do something with it. Since the packet is potentially multi-buffer and therefore the rx_swbd->page is potentially an array, manual passing of the old pointers between enetc_flip_rx_buff() and enetc_xdp_free() is a bit difficult. For the sake of going with a simple solution, we accept the possibility of racing with xdp_do_redirect(), and we move the flip procedure to execute only on the redirect success path. By racing, I mean that the page may be deemed as not reusable by enetc (having a refcount of 0), but there will be no leak in that case, either. Once we accept that, we have something better to do with buffers on XDP_REDIRECT failure. Since we haven't performed half-page flipping yet, we won't, either (and this way, we can avoid enetc_xdp_free() completely, which gives the entire page to the slab allocator). Instead, we'll call enetc_xdp_drop(), which will recycle this half of the buffer back to the RX ring. Fixes: 9d2b68cc108d ("net: enetc: add support for XDP_REDIRECT") Suggested-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20221213001908.2347046-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-14Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds1-2/+1
Pull rdma updates from Jason Gunthorpe: "Usual size of updates, a new driver, and most of the bulk focusing on rxe: - Usual typos, style, and language updates - Driver updates for mlx5, irdma, siw, rts, srp, hfi1, hns, erdma, mlx4, srp - Lots of RXE updates: * Improve reply error handling for bad MR operations * Code tidying * Debug printing uses common loggers * Remove half implemented RD related stuff * Support IBA's recently defined Atomic Write and Flush operations - erdma support for atomic operations - New driver 'mana' for Ethernet HW available in Azure VMs. This driver only supports DPDK" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (122 commits) IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces RDMA: Add missed netdev_put() for the netdevice_tracker RDMA/rxe: Enable RDMA FLUSH capability for rxe device RDMA/cm: Make QP FLUSHABLE for supported device RDMA/rxe: Implement flush completion RDMA/rxe: Implement flush execution in responder side RDMA/rxe: Implement RC RDMA FLUSH service in requester side RDMA/rxe: Extend rxe packet format to support flush RDMA/rxe: Allow registering persistent flag for pmem MR only RDMA/rxe: Extend rxe user ABI to support flush RDMA: Extend RDMA kernel verbs ABI to support flush RDMA: Extend RDMA user ABI to support flush RDMA/rxe: Fix incorrect responder length checking RDMA/rxe: Fix oops with zero length reads RDMA/mlx5: Remove not-used IB_FLOW_SPEC_IB define RDMA/hns: Fix XRC caps on HIP08 RDMA/hns: Fix error code of CMD RDMA/hns: Fix page size cap from firmware RDMA/hns: Fix PBL page MTR find RDMA/hns: Fix AH attr queried by query_qp ...
2022-12-13igb: Initialize mailbox message for VF resetTony Nguyen1-1/+1
When a MAC address is not assigned to the VF, that portion of the message sent to the VF is not set. The memory, however, is allocated from the stack meaning that information may be leaked to the VM. Initialize the message buffer to 0 so that no information is passed to the VM in this case. Fixes: 6ddbc4cf1f4d ("igb: Indicate failure on vf reset for empty mac address") Reported-by: Akihiko Odaki <akihiko.odaki@daynix.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Link: https://lore.kernel.org/r/20221212190031.3983342-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-13Merge tag 'net-next-6.2' of ↵Linus Torvalds545-12984/+47522
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Allow live renaming when an interface is up - Add retpoline wrappers for tc, improving considerably the performances of complex queue discipline configurations - Add inet drop monitor support - A few GRO performance improvements - Add infrastructure for atomic dev stats, addressing long standing data races - De-duplicate common code between OVS and conntrack offloading infrastructure - A bunch of UBSAN_BOUNDS/FORTIFY_SOURCE improvements - Netfilter: introduce packet parser for tunneled packets - Replace IPVS timer-based estimators with kthreads to scale up the workload with the number of available CPUs - Add the helper support for connection-tracking OVS offload BPF: - Support for user defined BPF objects: the use case is to allocate own objects, build own object hierarchies and use the building blocks to build own data structures flexibly, for example, linked lists in BPF - Make cgroup local storage available to non-cgroup attached BPF programs - Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers - A relevant bunch of BPF verifier fixes and improvements - Veristat tool improvements to support custom filtering, sorting, and replay of results - Add LLVM disassembler as default library for dumping JITed code - Lots of new BPF documentation for various BPF maps - Add bpf_rcu_read_{,un}lock() support for sleepable programs - Add RCU grace period chaining to BPF to wait for the completion of access from both sleepable and non-sleepable BPF programs - Add support storing struct task_struct objects as kptrs in maps - Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer values - Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions Protocols: - TCP: implement Protective Load Balancing across switch links - TCP: allow dynamically disabling TCP-MD5 static key, reverting back to fast[er]-path - UDP: Introduce optional per-netns hash lookup table - IPv6: simplify and cleanup sockets disposal - Netlink: support different type policies for each generic netlink operation - MPTCP: add MSG_FASTOPEN and FastOpen listener side support - MPTCP: add netlink notification support for listener sockets events - SCTP: add VRF support, allowing sctp sockets binding to VRF devices - Add bridging MAC Authentication Bypass (MAB) support - Extensions for Ethernet VPN bridging implementation to better support multicast scenarios - More work for Wi-Fi 7 support, comprising conversion of all the existing drivers to internal TX queue usage - IPSec: introduce a new offload type (packet offload) allowing complete header processing and crypto offloading - IPSec: extended ack support for more descriptive XFRM error reporting - RXRPC: increase SACK table size and move processing into a per-local endpoint kernel thread, reducing considerably the required locking - IEEE 802154: synchronous send frame and extended filtering support, initial support for scanning available 15.4 networks - Tun: bump the link speed from 10Mbps to 10Gbps - Tun/VirtioNet: implement UDP segmentation offload support Driver API: - PHY/SFP: improve power level switching between standard level 1 and the higher power levels - New API for netdev <-> devlink_port linkage - PTP: convert existing drivers to new frequency adjustment implementation - DSA: add support for rx offloading - Autoload DSA tagging driver when dynamically changing protocol - Add new PCP and APPTRUST attributes to Data Center Bridging - Add configuration support for 800Gbps link speed - Add devlink port function attribute to enable/disable RoCE and migratable - Extend devlink-rate to support strict prioriry and weighted fair queuing - Add devlink support to directly reading from region memory - New device tree helper to fetch MAC address from nvmem - New big TCP helper to simplify temporary header stripping New hardware / drivers: - Ethernet: - Marvel Octeon CNF95N and CN10KB Ethernet Switches - Marvel Prestera AC5X Ethernet Switch - WangXun 10 Gigabit NIC - Motorcomm yt8521 Gigabit Ethernet - Microchip ksz9563 Gigabit Ethernet Switch - Microsoft Azure Network Adapter - Linux Automation 10Base-T1L adapter - PHY: - Aquantia AQR112 and AQR412 - Motorcomm YT8531S - PTP: - Orolia ART-CARD - WiFi: - MediaTek Wi-Fi 7 (802.11be) devices - RealTek rtw8821cu, rtw8822bu, rtw8822cu and rtw8723du USB devices - Bluetooth: - Broadcom BCM4377/4378/4387 Bluetooth chipsets - Realtek RTL8852BE and RTL8723DS - Cypress.CYW4373A0 WiFi + Bluetooth combo device Drivers: - CAN: - gs_usb: bus error reporting support - kvaser_usb: listen only and bus error reporting support - Ethernet NICs: - Intel (100G): - extend action skbedit to RX queue mapping - implement devlink-rate support - support direct read from memory - nVidia/Mellanox (mlx5): - SW steering improvements, increasing rules update rate - Support for enhanced events compression - extend H/W offload packet manipulation capabilities - implement IPSec packet offload mode - nVidia/Mellanox (mlx4): - better big TCP support - Netronome Ethernet NICs (nfp): - IPsec offload support - add support for multicast filter - Broadcom: - RSS and PTP support improvements - AMD/SolarFlare: - netlink extened ack improvements - add basic flower matches to offload, and related stats - Virtual NICs: - ibmvnic: introduce affinity hint support - small / embedded: - FreeScale fec: add initial XDP support - Marvel mv643xx_eth: support MII/GMII/RGMII modes for Kirkwood - TI am65-cpsw: add suspend/resume support - Mediatek MT7986: add RX wireless wthernet dispatch support - Realtek 8169: enable GRO software interrupt coalescing per default - Ethernet high-speed switches: - Microchip (sparx5): - add support for Sparx5 TC/flower H/W offload via VCAP - Mellanox mlxsw: - add 802.1X and MAC Authentication Bypass offload support - add ip6gre support - Embedded Ethernet switches: - Mediatek (mtk_eth_soc): - improve PCS implementation, add DSA untag support - enable flow offload support - Renesas: - add rswitch R-Car Gen4 gPTP support - Microchip (lan966x): - add full XDP support - add TC H/W offload via VCAP - enable PTP on bridge interfaces - Microchip (ksz8): - add MTU support for KSZ8 series - Qualcomm 802.11ax WiFi (ath11k): - support configuring channel dwell time during scan - MediaTek WiFi (mt76): - enable Wireless Ethernet Dispatch (WED) offload support - add ack signal support - enable coredump support - remain_on_channel support - Intel WiFi (iwlwifi): - enable Wi-Fi 7 Extremely High Throughput (EHT) PHY capabilities - 320 MHz channels support - RealTek WiFi (rtw89): - new dynamic header firmware format support - wake-over-WLAN support" * tag 'net-next-6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2002 commits) ipvs: fix type warning in do_div() on 32 bit net: lan966x: Remove a useless test in lan966x_ptp_add_trap() net: ipa: add IPA v4.7 support dt-bindings: net: qcom,ipa: Add SM6350 compatible bnxt: Use generic HBH removal helper in tx path IPv6/GRO: generic helper to remove temporary HBH/jumbo header in driver selftests: forwarding: Add bridge MDB test selftests: forwarding: Rename bridge_mdb test bridge: mcast: Support replacement of MDB port group entries bridge: mcast: Allow user space to specify MDB entry routing protocol bridge: mcast: Allow user space to add (*, G) with a source list and filter mode bridge: mcast: Add support for (*, G) with a source list and filter mode bridge: mcast: Avoid arming group timer when (S, G) corresponds to a source bridge: mcast: Add a flag for user installed source entries bridge: mcast: Expose __br_multicast_del_group_src() bridge: mcast: Expose br_multicast_new_group_src() bridge: mcast: Add a centralized error path bridge: mcast: Place netlink policy before validation functions bridge: mcast: Split (*, G) and (S, G) addition into different functions bridge: mcast: Do not derive entry type from its filter mode ...
2022-12-13Merge tag 'dma-mapping-6.2-2022-12-13' of ↵Linus Torvalds1-4/+2
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - reduce the swiotlb buffer size on allocation failure (Alexey Kardashevskiy) - clean up passing of bogus GFP flags to the dma-coherent allocator (Christoph Hellwig) * tag 'dma-mapping-6.2-2022-12-13' of git://git.infradead.org/users/hch/dma-mapping: dma-mapping: reject __GFP_COMP in dma_alloc_attrs ALSA: memalloc: don't pass bogus GFP_ flags to dma_alloc_* s390/ism: don't pass bogus GFP_ flags to dma_alloc_coherent cnic: don't pass bogus GFP_ flags to dma_alloc_coherent RDMA/qib: don't pass bogus GFP_ flags to dma_alloc_coherent RDMA/hfi1: don't pass bogus GFP_ flags to dma_alloc_coherent media: videobuf-dma-contig: use dma_mmap_coherent swiotlb: reduce the swiotlb buffer size on allocation failure
2022-12-13Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netPaolo Abeni15-41/+58
Merge in the left-over fixes before the net-next pull-request. net/mptcp/subflow.c d3295fee3c75 ("mptcp: use proper req destructor for IPv6") 36b122baf6a8 ("mptcp: add subflow_v(4,6)_send_synack()") Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-12-12net: lan966x: Remove a useless test in lan966x_ptp_add_trap()Christophe JAILLET1-2/+0
vcap_alloc_rule() can't return NULL. So remove some dead-code Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/27992ffcee47fc865ce87274d6dfcffe7a1e69e0.1670873784.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12Merge tag 'random-6.2-rc1-for-linus' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: - Replace prandom_u32_max() and various open-coded variants of it, there is now a new family of functions that uses fast rejection sampling to choose properly uniformly random numbers within an interval: get_random_u32_below(ceil) - [0, ceil) get_random_u32_above(floor) - (floor, U32_MAX] get_random_u32_inclusive(floor, ceil) - [floor, ceil] Coccinelle was used to convert all current users of prandom_u32_max(), as well as many open-coded patterns, resulting in improvements throughout the tree. I'll have a "late" 6.1-rc1 pull for you that removes the now unused prandom_u32_max() function, just in case any other trees add a new use case of it that needs to converted. According to linux-next, there may be two trivial cases of prandom_u32_max() reintroductions that are fixable with a 's/.../.../'. So I'll have for you a final conversion patch doing that alongside the removal patch during the second week. This is a treewide change that touches many files throughout. - More consistent use of get_random_canary(). - Updates to comments, documentation, tests, headers, and simplification in configuration. - The arch_get_random*_early() abstraction was only used by arm64 and wasn't entirely useful, so this has been replaced by code that works in all relevant contexts. - The kernel will use and manage random seeds in non-volatile EFI variables, refreshing a variable with a fresh seed when the RNG is initialized. The RNG GUID namespace is then hidden from efivarfs to prevent accidental leakage. These changes are split into random.c infrastructure code used in the EFI subsystem, in this pull request, and related support inside of EFISTUB, in Ard's EFI tree. These are co-dependent for full functionality, but the order of merging doesn't matter. - Part of the infrastructure added for the EFI support is also used for an improvement to the way vsprintf initializes its siphash key, replacing an sleep loop wart. - The hardware RNG framework now always calls its correct random.c input function, add_hwgenerator_randomness(), rather than sometimes going through helpers better suited for other cases. - The add_latent_entropy() function has long been called from the fork handler, but is a no-op when the latent entropy gcc plugin isn't used, which is fine for the purposes of latent entropy. But it was missing out on the cycle counter that was also being mixed in beside the latent entropy variable. So now, if the latent entropy gcc plugin isn't enabled, add_latent_entropy() will expand to a call to add_device_randomness(NULL, 0), which adds a cycle counter, without the absent latent entropy variable. - The RNG is now reseeded from a delayed worker, rather than on demand when used. Always running from a worker allows it to make use of the CPU RNG on platforms like S390x, whose instructions are too slow to do so from interrupts. It also has the effect of adding in new inputs more frequently with more regularity, amounting to a long term transcript of random values. Plus, it helps a bit with the upcoming vDSO implementation (which isn't yet ready for 6.2). - The jitter entropy algorithm now tries to execute on many different CPUs, round-robining, in hopes of hitting even more memory latencies and other unpredictable effects. It also will mix in a cycle counter when the entropy timer fires, in addition to being mixed in from the main loop, to account more explicitly for fluctuations in that timer firing. And the state it touches is now kept within the same cache line, so that it's assured that the different execution contexts will cause latencies. * tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits) random: include <linux/once.h> in the right header random: align entropy_timer_state to cache line random: mix in cycle counter when jitter timer fires random: spread out jitter callback to different CPUs random: remove extraneous period and add a missing one in comments efi: random: refresh non-volatile random seed when RNG is initialized vsprintf: initialize siphash key using notifier random: add back async readiness notifier random: reseed in delayed work rather than on-demand random: always mix cycle counter in add_latent_entropy() hw_random: use add_hwgenerator_randomness() for early entropy random: modernize documentation comment on get_random_bytes() random: adjust comment to account for removed function random: remove early archrandom abstraction random: use random.trust_{bootloader,cpu} command line option only stackprotector: actually use get_random_canary() stackprotector: move get_random_canary() into stackprotector.h treewide: use get_random_u32_inclusive() when possible treewide: use get_random_u32_{above,below}() instead of manual loop treewide: use get_random_u32_below() instead of deprecated function ...
2022-12-12bnxt: Use generic HBH removal helper in tx pathCoco Li1-1/+25
Eric Dumazet implemented Big TCP that allowed bigger TSO/GRO packet sizes for IPv6 traffic. See patch series: 'commit 89527be8d8d6 ("net: add IFLA_TSO_{MAX_SIZE|SEGS} attributes")' This reduces the number of packets traversing the networking stack and should usually improves performance. However, it also inserts a temporary Hop-by-hop IPv6 extension header. Using the HBH header removal method in the previous patch, the extra header be removed in bnxt drivers to allow it to send big TCP packets (bigger TSO packets) as well. Tested: Compiled locally To further test functional correctness, update the GSO/GRO limit on the physical NIC: ip link set eth0 gso_max_size 181000 ip link set eth0 gro_max_size 181000 Note that if there are bonding or ipvan devices on top of the physical NIC, their GSO sizes need to be updated as well. Then, IPv6/TCP packets with sizes larger than 64k can be observed. Signed-off-by: Coco Li <lixiaoyan@google.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Michael Chan <michael.chan@broadcom.com> Link: https://lore.kernel.org/r/20221210041646.3587757-2-lixiaoyan@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12qlcnic: Clean up some inconsistent indentingJiapeng Chong1-1/+1
No functional modification involved. drivers/net/ethernet/qlogic/qlcnic/qlcnic_ethtool.c:714 qlcnic_validate_ring_count() warn: inconsistent indenting. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3419 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20221212055813.91154-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12i40e: allow toggling loopback mode via ndo_set_features callbackTirthendu Sarkar4-4/+61
Add support for NETIF_F_LOOPBACK. This feature can be set via: $ ethtool -K eth0 loopback <on|off> This sets the MAC Tx->Rx loopback. This feature is used for the xsk selftests, and might have other uses too. Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Tested-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Link: https://lore.kernel.org/r/20221209185553.2520088-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12i40e: Fix the inability to attach XDP program on downed interfaceBartosz Staszewski1-12/+24
Whenever trying to load XDP prog on downed interface, function i40e_xdp was passing vsi->rx_buf_len field to i40e_xdp_setup() which was equal 0. i40e_open() calls i40e_vsi_configure_rx() which configures that field, but that only happens when interface is up. When it is down, i40e_open() is not being called, thus vsi->rx_buf_len is not set. Solution for this is calculate buffer length in newly created function - i40e_calculate_vsi_rx_buf_len() that return actual buffer length. Buffer length is being calculated based on the same rules applied previously in i40e_vsi_configure_rx() function. Fixes: 613142b0bb88 ("i40e: Log error for oversized MTU on device") Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions") Signed-off-by: Bartosz Staszewski <bartoszx.staszewski@intel.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Shwetha Nagaraju <Shwetha.nagaraju@intel.com> Reviewed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Saeed Mahameed <saeed@kernel.com> Link: https://lore.kernel.org/r/20221209185411.2519898-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12stmmac: fix potential division by 0Piergiorgio Beruto2-2/+3
When the MAC is connected to a 10 Mb/s PHY and the PTP clock is derived from the MAC reference clock (default), the clk_ptp_rate becomes too small and the calculated sub second increment becomes 0 when computed by the stmmac_config_sub_second_increment() function within stmmac_init_tstamp_counter(). Therefore, the subsequent div_u64 in stmmac_init_tstamp_counter() operation triggers a divide by 0 exception as shown below. [ 95.062067] socfpga-dwmac ff700000.ethernet eth0: Register MEM_TYPE_PAGE_POOL RxQ-0 [ 95.076440] socfpga-dwmac ff700000.ethernet eth0: PHY [stmmac-0:08] driver [NCN26000] (irq=49) [ 95.095964] dwmac1000: Master AXI performs any burst length [ 95.101588] socfpga-dwmac ff700000.ethernet eth0: No Safety Features support found [ 95.109428] Division by zero in kernel. [ 95.113447] CPU: 0 PID: 239 Comm: ifconfig Not tainted 6.1.0-rc7-centurion3-1.0.3.0-01574-gb624218205b7-dirty #77 [ 95.123686] Hardware name: Altera SOCFPGA [ 95.127695] unwind_backtrace from show_stack+0x10/0x14 [ 95.132938] show_stack from dump_stack_lvl+0x40/0x4c [ 95.137992] dump_stack_lvl from Ldiv0+0x8/0x10 [ 95.142527] Ldiv0 from __aeabi_uidivmod+0x8/0x18 [ 95.147232] __aeabi_uidivmod from div_u64_rem+0x1c/0x40 [ 95.152552] div_u64_rem from stmmac_init_tstamp_counter+0xd0/0x164 [ 95.158826] stmmac_init_tstamp_counter from stmmac_hw_setup+0x430/0xf00 [ 95.165533] stmmac_hw_setup from __stmmac_open+0x214/0x2d4 [ 95.171117] __stmmac_open from stmmac_open+0x30/0x44 [ 95.176182] stmmac_open from __dev_open+0x11c/0x134 [ 95.181172] __dev_open from __dev_change_flags+0x168/0x17c [ 95.186750] __dev_change_flags from dev_change_flags+0x14/0x50 [ 95.192662] dev_change_flags from devinet_ioctl+0x2b4/0x604 [ 95.198321] devinet_ioctl from inet_ioctl+0x1ec/0x214 [ 95.203462] inet_ioctl from sock_ioctl+0x14c/0x3c4 [ 95.208354] sock_ioctl from vfs_ioctl+0x20/0x38 [ 95.212984] vfs_ioctl from sys_ioctl+0x250/0x844 [ 95.217691] sys_ioctl from ret_fast_syscall+0x0/0x4c [ 95.222743] Exception stack(0xd0ee1fa8 to 0xd0ee1ff0) [ 95.227790] 1fa0: 00574c4f be9aeca4 00000003 00008914 be9aeca4 be9aec50 [ 95.235945] 1fc0: 00574c4f be9aeca4 0059f078 00000036 be9aee8c be9aef7a 00000015 00000000 [ 95.244096] 1fe0: 005a01f0 be9aec38 004d7484 b6e67d74 Signed-off-by: Piergiorgio Beruto <piergiorgio.beruto@gmail.com> Fixes: 91a2559c1dc5 ("net: stmmac: Fix sub-second increment") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/de4c64ccac9084952c56a06a8171d738604c4770.1670678513.git.piergiorgio.beruto@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12octeontx2-af: cn10k: mcs: Fix a resource leak in the probe and remove functionsChristophe JAILLET1-1/+5
In mcs_register_interrupts(), a call to request_irq() is not balanced by a corresponding free_irq(), neither in the error handling path, nor in the remove function. Add the missing calls. Fixes: 6c635f78c474 ("octeontx2-af: cn10k: mcs: Handle MCS block interrupts") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Link: https://lore.kernel.org/r/69f153db5152a141069f990206e7389f961d41ec.1670693669.git.christophe.jaillet@wanadoo.fr Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12net: hns3: use strscpy() to instead of strncpy()Xu Panda1-4/+2
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL terminated strings. Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Link: https://lore.kernel.org/r/202212091538591375035@zte.com.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12net: ethernet: ti: am65-cpsw: Fix PM runtime leakage in ↵Roger Quadros1-4/+9
am65_cpsw_nuss_ndo_slave_open() Ensure pm_runtime_put() is issued in error path. Reported-by: Jakub Kicinski <kuba@kernel.org> Fixes: 93a76530316a ("net: ethernet: ti: introduce am65x/j721e gigabit eth subsystem driver") Signed-off-by: Roger Quadros <rogerq@kernel.org> Reviewed-by: Saeed Mahameed <saeed@kernel.org> Link: https://lore.kernel.org/r/20221208105534.63709-1-rogerq@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-12net: amd-xgbe: Check only the minimum speed for active/passive cablesTom Lendacky1-12/+2
There are cables that exist that can support speeds in excess of 10GbE. The driver, however, restricts the EEPROM advertised nominal bitrate to a specific range, which can prevent usage of cables that can support, for example, up to 25GbE. Rather than checking that an active or passive cable supports a specific range, only check for a minimum supported speed. Fixes: abf0a1c2b26a ("amd-xgbe: Add support for SFP+ modules") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12net: amd-xgbe: Fix logic around active and passive cablesTom Lendacky1-3/+6
SFP+ active and passive cables are copper cables with fixed SFP+ end connectors. Due to a misinterpretation of this, SFP+ active cables could end up not being recognized, causing the driver to fail to establish a connection. Introduce a new enum in SFP+ cable types, XGBE_SFP_CABLE_FIBER, that is the default cable type, and handle active and passive cables when they are specifically detected. Fixes: abf0a1c2b26a ("amd-xgbe: Add support for SFP+ modules") Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12myri10ge: use strscpy() to instead of strncpy()Xu Panda1-2/+1
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL terminated strings. Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12liquidio: use strscpy() to instead of strncpy()Xu Panda1-4/+3
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL terminated strings. Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12hns: use strscpy() to instead of strncpy()Xu Panda1-7/+4
The implementation of strscpy() is more robust and safer. That's now the recommended way to copy NUL terminated strings. Signed-off-by: Xu Panda <xu.panda@zte.com.cn> Signed-off-by: Yang Yang <yang.yang29@zte.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12net: amd: lance: don't call dev_kfree_skb() under spin_lock_irqsave()Yang Yingliang2-2/+2
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with hardware interrupts being disabled. It should use dev_kfree_skb_irq() or dev_consume_skb_irq() instead. The difference between them is free reason, dev_kfree_skb_irq() means the SKB is dropped in error and dev_consume_skb_irq() means the SKB is consumed in normal. In these two cases, dev_kfree_skb() is called consume the xmited SKB, so replace it with dev_consume_skb_irq(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12net: ethernet: dnet: don't call dev_kfree_skb() under spin_lock_irqsave()Yang Yingliang1-2/+2
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with hardware interrupts being disabled. In this case, the lock is used to protected 'bp', so we can move dev_kfree_skb() after the spin_unlock_irqrestore(). Fixes: 4796417417a6 ("dnet: Dave DNET ethernet controller driver (updated)") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12net: emaclite: don't call dev_kfree_skb() under spin_lock_irqsave()Yang Yingliang1-1/+1
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with hardware interrupts being disabled. It should use dev_kfree_skb_irq() or dev_consume_skb_irq() instead. The difference between them is free reason, dev_kfree_skb_irq() means the SKB is dropped in error and dev_consume_skb_irq() means the SKB is consumed in normal. In this case, dev_kfree_skb() is called in xemaclite_tx_timeout() to drop the SKB, when tx timeout, so replace it with dev_kfree_skb_irq(). Fixes: bb81b2ddfa19 ("net: add Xilinx emac lite device driver") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12net: apple: bmac: don't call dev_kfree_skb() under spin_lock_irqsave()Yang Yingliang1-1/+1
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with hardware interrupts being disabled. It should use dev_kfree_skb_irq() or dev_consume_skb_irq() instead. The difference between them is free reason, dev_kfree_skb_irq() means the SKB is dropped in error and dev_consume_skb_irq() means the SKB is consumed in normal. In this case, dev_kfree_skb() is called in bmac_tx_timeout() to drop the SKB, when tx timeout, so replace it with dev_kfree_skb_irq(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12net: apple: mace: don't call dev_kfree_skb() under spin_lock_irqsave()Yang Yingliang1-1/+1
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with hardware interrupts being disabled. It should use dev_kfree_skb_irq() or dev_consume_skb_irq() instead. The difference between them is free reason, dev_kfree_skb_irq() means the SKB is dropped in error and dev_consume_skb_irq() means the SKB is consumed in normal. In this case, dev_kfree_skb() is called in mace_tx_timeout() to drop the SKB, when tx timeout, so replace it with dev_kfree_skb_irq(). Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12ethernet: s2io: don't call dev_kfree_skb() under spin_lock_irqsave()Yang Yingliang1-1/+1
It is not allowed to call kfree_skb() or consume_skb() from hardware interrupt context or with hardware interrupts being disabled. It should use dev_kfree_skb_irq() or dev_consume_skb_irq() instead. The difference between them is free reason, dev_kfree_skb_irq() means the SKB is dropped in error and dev_consume_skb_irq() means the SKB is consumed in normal. In this case, dev_kfree_skb() is called in free_tx_buffers() to drop the SKBs in tx buffers, when the card is down, so replace it with dev_kfree_skb_irq() here. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-12net: stmmac: Add check for taprio basetime configurationMichael Sit Wei Hong1-0/+3
Adds a boundary check to prevent negative basetime input from user while configuring taprio. Signed-off-by: Michael Sit Wei Hong <michael.wei.hong.sit@intel.com> Signed-off-by: Lai Peter Jun Ann <jun.ann.lai@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2022-12-09Merge tag 'ipsec-next-2022-12-09' of ↵Jakub Kicinski19-445/+1537
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next 2022-12-09 1) Add xfrm packet offload core API. From Leon Romanovsky. 2) Add xfrm packet offload support for mlx5. From Leon Romanovsky and Raed Salem. 3) Fix a typto in a error message. From Colin Ian King. * tag 'ipsec-next-2022-12-09' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next: (38 commits) xfrm: Fix spelling mistake "oflload" -> "offload" net/mlx5e: Open mlx5 driver to accept IPsec packet offload net/mlx5e: Handle ESN update events net/mlx5e: Handle hardware IPsec limits events net/mlx5e: Update IPsec soft and hard limits net/mlx5e: Store all XFRM SAs in Xarray net/mlx5e: Provide intermediate pointer to access IPsec struct net/mlx5e: Skip IPsec encryption for TX path without matching policy net/mlx5e: Add statistics for Rx/Tx IPsec offloaded flows net/mlx5e: Improve IPsec flow steering autogroup net/mlx5e: Configure IPsec packet offload flow steering net/mlx5e: Use same coding pattern for Rx and Tx flows net/mlx5e: Add XFRM policy offload logic net/mlx5e: Create IPsec policy offload tables net/mlx5e: Generalize creation of default IPsec miss group and rule net/mlx5e: Group IPsec miss handles into separate struct net/mlx5e: Make clear what IPsec rx_err does net/mlx5e: Flatten the IPsec RX add rule path net/mlx5e: Refactor FTE setup code to be more clear net/mlx5e: Move IPsec flow table creation to separate function ... ==================== Link: https://lore.kernel.org/r/20221209093310.4018731-1-steffen.klassert@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-09skbuff: Introduce slab_build_skb()Kees Cook2-2/+2
syzkaller reported: BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294 Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295 For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to build_skb(). When build_skb() is passed a frag_size of 0, it means the buffer came from kmalloc. In these cases, ksize() is used to find its actual size, but since the allocation may not have been made to that size, actually perform the krealloc() call so that all the associated buffer size checking will be correctly notified (and use the "new" pointer so that compiler hinting works correctly). Split this logic out into a new interface, slab_build_skb(), but leave the original 0 checking for now to catch any stragglers. Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function") Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: pepsipu <soopthegoop@gmail.com> Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com Cc: Vlastimil Babka <vbabka@suse.cz> Cc: kasan-dev <kasan-dev@googlegroups.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: ast@kernel.org Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Hao Luo <haoluo@google.com> Cc: Jesper Dangaard Brouer <hawk@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: jolsa@kernel.org Cc: KP Singh <kpsingh@kernel.org> Cc: martin.lau@linux.dev Cc: Stanislav Fomichev <sdf@google.com> Cc: song@kernel.org Cc: Yonghong Song <yhs@fb.com> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221208060256.give.994-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-09net: bcmgenet: Remove the unused functionJiapeng Chong1-18/+0
The function dmadesc_get_addr() is defined in the bcmgenet.c file, but not called elsewhere, so remove this unused function. drivers/net/ethernet/broadcom/genet/bcmgenet.c:120:26: warning: unused function 'dmadesc_get_addr'. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3401 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://lore.kernel.org/r/20221209033723.32452-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-09Merge tag 'mlx5-updates-2022-12-08' of ↵Jakub Kicinski31-193/+1295
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2022-12-08 1) Support range match action in SW steering Yevgeny Kliteynik says: ======================= The following patch series adds support for a range match action in SW Steering. SW steering is able to match only on the exact values of the packet fields, as requested by the user: the user provides mask for the fields that are of interest, and the exact values to be matched on when the traffic is handled. The following patch series add new type of action - Range Match, where the user provides a field to be matched on and a range of values (min to max) that will be considered as hit. There are several new notions that were implemented in order to support Range Match: - MATCH_RANGES Steering Table Entry (STE): the new STE type that allows matching the packets' fields on the range of values instead of a specific value. - Match Definer: this is a general FW object that defines which fields in the packet will be referenced by the mask and tag of each STE. Match definer ID is part of STE fields, and it defines how the HW needs to interpret the STE's mask/tag values. Till now SW steering used the definers that were managed by FW and implemented the STE layout as described by the HW spec. Now that we're adding a new type of STE, SW steering needs to also be able to define this new STE's layout, and this is do ======================= 2) From OZ add support for meter mtu offload 2.1: Refactor the code to allow both metering and range post actions as a pre-step for adding police mtu offload support. 2.2: Instantiate mtu green/red flow tables with a single match-all rule. Add the green/red actions to the hit/miss table accordingly 2.3: Initialize the meter object with the TC police mtu parameter. Use the hardware range match action feature. 3) From MaorD, support routes with more than 2 nexthops in multipath 4) Michael and Or, improve and extend vport representor counters. * tag 'mlx5-updates-2022-12-08' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5: Expose steering dropped packets counter net/mlx5: Refactor and expand rep vport stat group net/mlx5e: multipath, support routes with more than 2 nexthops net/mlx5e: TC, add support for meter mtu offload net/mlx5e: meter, add mtu post meter tables net/mlx5e: meter, refactor to allow multiple post meter tables net/mlx5: DR, Add support for range match action net/mlx5: DR, Add function that tells if STE miss addr has been initialized net/mlx5: DR, Some refactoring of miss address handling net/mlx5: DR, Manage definers with refcounts net/mlx5: DR, Handle FT action in a separate function net/mlx5: DR, Rework is_fw_table function net/mlx5: DR, Add functions to create/destroy MATCH_DEFINER general object net/mlx5: fs, add match on ranges API net/mlx5: mlx5_ifc updates for MATCH_DEFINER general object ==================== Link: https://lore.kernel.org/r/20221209001420.142794-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-12-09Merge branch '100GbE' of ↵Jakub Kicinski5-475/+463
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2022-12-08 (ice) Jacob Keller says: This series of patches primarily consists of changes to fix some corner cases that can cause Tx timestamp failures. The issues were discovered and reported by Siddaraju DH and primarily affect E822 hardware, though this series also includes some improvements that affect E810 hardware as well. The primary issue is regarding the way that E822 determines when to generate timestamp interrupts. If the driver reads timestamp indexes which do not have a valid timestamp, the E822 interrupt tracking logic can get stuck. This is due to the way that E822 hardware tracks timestamp index reads internally. I was previously unaware of this behavior as it is significantly different in E810 hardware. Most of the fixes target refactors to ensure that the ice driver does not read timestamp indexes which are not valid on E822 hardware. This is done by using the Tx timestamp ready bitmap register from the PHY. This register indicates what timestamp indexes have outstanding timestamps waiting to be captured. Care must be taken in all cases where we read the timestamp registers, and thus all flows which might have read these registers are refactored. The ice_ptp_tx_tstamp function is modified to consolidate as much of the logic relating to these registers as possible. It now handles discarding stale timestamps which are old or which occurred after a PHC time update. This replaces previously standalone thread functions like the periodic work function and the ice_ptp_flush_tx_tracker function. In addition, some minor cleanups noticed while writing these refactors are included. The remaining patches refactor the E822 implementation to remove the "bypass" mode for timestamps. The E822 hardware has the ability to provide a more precise timestamp by making use of measurements of the precise way that packets flow through the hardware pipeline. These measurements are known as "Vernier" calibration. The "bypass" mode disables many of these measurements in favor of a faster start up time for Tx and Rx timestamping. Instead, once these measurements were captured, the driver tries to reconfigure the PHY to enable the vernier calibrations. Unfortunately this recalibration does not work. Testing indicates that the PHY simply remains in bypass mode without the increased timestamp precision. Remove the attempt at recalibration and always use vernier mode. This has one disadvantage that Tx and Rx timestamps cannot begin until after at least one packet of that type goes through the hardware pipeline. Because of this, further refactor the driver to separate Tx and Rx vernier calibration. Complete the Tx and Rx independently, enabling the appropriate type of timestamp as soon as the relevant packet has traversed the hardware pipeline. This was reported by Milena Olech. Note that although these might be considered "bug fixes", the required changes in order to appropriately resolve these issues is large. Thus it does not feel suitable to send this series to net. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue: ice: reschedule ice_ptp_wait_for_offset_valid during reset ice: make Tx and Rx vernier offset calibration independent ice: only check set bits in ice_ptp_flush_tx_tracker ice: handle flushing stale Tx timestamps in ice_ptp_tx_tstamp ice: cleanup allocations in ice_ptp_alloc_tx_tracker ice: protect init and calibrating check in ice_ptp_request_ts ice: synchronize the misc IRQ when tearing down Tx tracker ice: check Tx timestamp memory register for ready timestamps ice: handle discarding old Tx requests in ice_ptp_tx_tstamp ice: always call ice_ptp_link_change and make it void ice: fix misuse of "link err" with "link status" ice: Reset TS memory for all quads ice: Remove the E822 vernier "bypass" logic ice: Use more generic names for ice_ptp_tx fields ==================== Link: https://lore.kernel.org/r/20221208213932.1274143-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>