aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-04-18mm: kmsan: handle alloc failures in kmsan_vmap_pages_range_noflush()Alexander Potapenko3-19/+34
As reported by Dipanjan Das, when KMSAN is used together with kernel fault injection (or, generally, even without the latter), calls to kcalloc() or __vmap_pages_range_noflush() may fail, leaving the metadata mappings for the virtual mapping in an inconsistent state. When these metadata mappings are accessed later, the kernel crashes. To address the problem, we return a non-zero error code from kmsan_vmap_pages_range_noflush() in the case of any allocation/mapping failure inside it, and make vmap_pages_range_noflush() return an error if KMSAN fails to allocate the metadata. This patch also removes KMSAN_WARN_ON() from vmap_pages_range_noflush(), as these allocation failures are not fatal anymore. Link: https://lkml.kernel.org/r/[email protected] Fixes: b073d7f8aee4 ("mm: kmsan: maintain KMSAN metadata for page operations") Signed-off-by: Alexander Potapenko <[email protected]> Reported-by: Dipanjan Das <[email protected]> Link: https://lore.kernel.org/linux-mm/CANX2M5ZRrRA64k0hOif02TjmY9kbbO2aCBPyq79es34RXZ=cAw@mail.gmail.com/ Reviewed-by: Marco Elver <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Uladzislau Rezki (Sony) <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18tools/Makefile: do missed s/vm/mm/SeongJae Park1-7/+7
Commit 799fb82aa132 ("tools/vm: rename tools/vm to tools/mm") missed renaming 'vm' in 'tools/Makefile' to 'mm'. As a result, 'make clean' under 'tools/' directory fails as below: $ make -C tools clean DESCEND vm make[1]: Entering directory '/linux/tools/vm' make[1]: *** No rule to make target 'clean'. Stop. make[1]: Leaving directory '/linux/tools/vm' make: *** [Makefile:173: vm_clean] Error 2 make: Leaving directory '/linux/tools' Do the missed rename. Link: https://lkml.kernel.org/r/[email protected] Fixes: 799fb82aa132 ("tools/vm: rename tools/vm to tools/mm") Signed-off-by: SeongJae Park <[email protected]> Reported-by: Ricardo Pardini <[email protected]> Link: https://lore.kernel.org/linux-mm/[email protected]/ Tested-by: Ricardo Pardini <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18mm: fix memory leak on mm_init error handlingMathieu Desnoyers1-0/+1
commit f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter") introduces a memory leak by missing a call to destroy_context() when a percpu_counter fails to allocate. Before introducing the per-cpu counter allocations, init_new_context() was the last call that could fail in mm_init(), and thus there was no need to ever invoke destroy_context() in the error paths. Adding the following percpu counter allocations adds error paths after init_new_context(), which means its associated destroy_context() needs to be called when percpu counters fail to allocate. Link: https://lkml.kernel.org/r/[email protected] Fixes: f1a7941243c1 ("mm: convert mm's rss stats into percpu_counter") Signed-off-by: Mathieu Desnoyers <[email protected]> Acked-by: Shakeel Butt <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlockTetsuo Handa1-0/+16
syzbot is reporting circular locking dependency which involves zonelist_update_seq seqlock [1], for this lock is checked by memory allocation requests which do not need to be retried. One deadlock scenario is kmalloc(GFP_ATOMIC) from an interrupt handler. CPU0 ---- __build_all_zonelists() { write_seqlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount odd // e.g. timer interrupt handler runs at this moment some_timer_func() { kmalloc(GFP_ATOMIC) { __alloc_pages_slowpath() { read_seqbegin(&zonelist_update_seq) { // spins forever because zonelist_update_seq.seqcount is odd } } } } // e.g. timer interrupt handler finishes write_sequnlock(&zonelist_update_seq); // makes zonelist_update_seq.seqcount even } This deadlock scenario can be easily eliminated by not calling read_seqbegin(&zonelist_update_seq) from !__GFP_DIRECT_RECLAIM allocation requests, for retry is applicable to only __GFP_DIRECT_RECLAIM allocation requests. But Michal Hocko does not know whether we should go with this approach. Another deadlock scenario which syzbot is reporting is a race between kmalloc(GFP_ATOMIC) from tty_insert_flip_string_and_push_buffer() with port->lock held and printk() from __build_all_zonelists() with zonelist_update_seq held. CPU0 CPU1 ---- ---- pty_write() { tty_insert_flip_string_and_push_buffer() { __build_all_zonelists() { write_seqlock(&zonelist_update_seq); build_zonelists() { printk() { vprintk() { vprintk_default() { vprintk_emit() { console_unlock() { console_flush_all() { console_emit_next_record() { con->write() = serial8250_console_write() { spin_lock_irqsave(&port->lock, flags); tty_insert_flip_string() { tty_insert_flip_string_fixed_flag() { __tty_buffer_request_room() { tty_buffer_alloc() { kmalloc(GFP_ATOMIC | __GFP_NOWARN) { __alloc_pages_slowpath() { zonelist_iter_begin() { read_seqbegin(&zonelist_update_seq); // spins forever because zonelist_update_seq.seqcount is odd spin_lock_irqsave(&port->lock, flags); // spins forever because port->lock is held } } } } } } } } spin_unlock_irqrestore(&port->lock, flags); // message is printed to console spin_unlock_irqrestore(&port->lock, flags); } } } } } } } } } write_sequnlock(&zonelist_update_seq); } } } This deadlock scenario can be eliminated by preventing interrupt context from calling kmalloc(GFP_ATOMIC) and preventing printk() from calling console_flush_all() while zonelist_update_seq.seqcount is odd. Since Petr Mladek thinks that __build_all_zonelists() can become a candidate for deferring printk() [2], let's address this problem by disabling local interrupts in order to avoid kmalloc(GFP_ATOMIC) and disabling synchronous printk() in order to avoid console_flush_all() . As a side effect of minimizing duration of zonelist_update_seq.seqcount being odd by disabling synchronous printk(), latency at read_seqbegin(&zonelist_update_seq) for both !__GFP_DIRECT_RECLAIM and __GFP_DIRECT_RECLAIM allocation requests will be reduced. Although, from lockdep perspective, not calling read_seqbegin(&zonelist_update_seq) (i.e. do not record unnecessary locking dependency) from interrupt context is still preferable, even if we don't allow calling kmalloc(GFP_ATOMIC) inside write_seqlock(&zonelist_update_seq)/write_sequnlock(&zonelist_update_seq) section... Link: https://lkml.kernel.org/r/[email protected] Fixes: 3d36424b3b58 ("mm/page_alloc: fix race condition between build_all_zonelists and page allocation") Link: https://lkml.kernel.org/r/ZCrs+1cDqPWTDFNM@alley [2] Reported-by: syzbot <[email protected]> Link: https://syzkaller.appspot.com/bug?extid=223c7461c58c58a4cb10 [1] Signed-off-by: Tetsuo Handa <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Petr Mladek <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Ilpo Järvinen <[email protected]> Cc: John Ogness <[email protected]> Cc: Patrick Daly <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()Ondrej Mosnacek1-29/+40
Linux Security Modules (LSMs) that implement the "capable" hook will usually emit an access denial message to the audit log whenever they "block" the current task from using the given capability based on their security policy. The occurrence of a denial is used as an indication that the given task has attempted an operation that requires the given access permission, so the callers of functions that perform LSM permission checks must take care to avoid calling them too early (before it is decided if the permission is actually needed to perform the requested operation). The __sys_setres[ug]id() functions violate this convention by first calling ns_capable_setid() and only then checking if the operation requires the capability or not. It means that any caller that has the capability granted by DAC (task's capability set) but not by MAC (LSMs) will generate a "denied" audit record, even if is doing an operation for which the capability is not required. Fix this by reordering the checks such that ns_capable_setid() is checked last and -EPERM is returned immediately if it returns false. While there, also do two small optimizations: * move the capability check before prepare_creds() and * bail out early in case of a no-op. Link: https://lkml.kernel.org/r/[email protected] Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Ondrej Mosnacek <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2023-04-18veth: take into account peer device for NETDEV_XDP_ACT_NDO_XMIT xdp_features ↵Lorenzo Bianconi1-6/+11
flag For veth pairs, NETDEV_XDP_ACT_NDO_XMIT is supported by the current device if the peer one is running a XDP program or if it has GRO enabled. Fix the xdp_features flags reporting considering peer device and not current one for NETDEV_XDP_ACT_NDO_XMIT. Fixes: fccca038f300 ("veth: take into account device reconfiguration for xdp_features flag") Signed-off-by: Lorenzo Bianconi <[email protected]> Link: https://lore.kernel.org/r/4f1ca6f6f6b42ae125bfdb5c7782217c83968b2e.1681767806.git.lorenzo@kernel.org Signed-off-by: Alexei Starovoitov <[email protected]>
2023-04-18Merge tag 'mmc-v6.3-rc3' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC host: - sdhci_am654: Fix support for UHS-I SDR12 and SDR25 speed modes MEMSTICK: - Fix memory leak if card device never gets registered" * tag 'mmc-v6.3-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: memstick: fix memory leak if card device is never registered mmc: sdhci_am654: Set HIGH_SPEED_ENA for SDR12 and SDR25
2023-04-18Merge tag 'arm-fixes-6.3-3' of ↵Linus Torvalds37-98/+88
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull ARM SoC fixes from Arnd Bergmann: "There are a number of updates for devicetree files for Qualcomm, Rockchips, and NXP i.MX platforms, addressing mistakes in the DT contents: - Wrong GPIO polarity on some boards - Lower SD card interface speed for better stability - Incorrect power supply, clock, pmic, cache properties - Disable broken hbr3 on sc7280-herobrine - Devicetree warning fixes The only other changes are: - A regression fix for the Amlogic performance monitoring unit driver, along with two related DT changes. - imx_v6_v7_defconfig enables PCI support again. - Trivial fixes for tee, optee and psci firmware drivers, addressing compiler warning and error output" * tag 'arm-fixes-6.3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (32 commits) firmware/psci: demote suspend-mode warning to info level arm64: dts: qcom: sc7280: remove hbr3 support on herobrine boards ARM: imx_v6_v7_defconfig: Fix unintentional disablement of PCI arm64: dts: rockchip: correct panel supplies on some rk3326 boards arm64: dts: rockchip: use just "port" in panel on RockPro64 arm64: dts: rockchip: use just "port" in panel on Pinebook Pro ARM: dts: imx6ull-colibri: Remove unnecessary #address-cells/#size-cells ARM: dts: imx7d-remarkable2: Remove unnecessary #address-cells/#size-cells arm64: dts: imx8mp-verdin: correct off-on-delay arm64: dts: imx8mm-verdin: correct off-on-delay arm64: dts: imx8mm-evk: correct pmic clock source arm64: dts: qcom: sc8280xp-pmics: fix pon compatible and registers arm64: dts: rockchip: Remove non-existing pwm-delay-us property arm64: dts: rockchip: Add clk_rtc_32k to Anbernic xx3 Devices tee: Pass a pointer to virt_to_page() perf/amlogic: adjust register offsets arm64: dts: meson-g12-common: resolve conflict between canvas & pmu arm64: dts: meson-g12-common: specify full DMC range arm64: dts: imx8mp: fix address length for LCDIF2 riscv: dts: canaan: drop invalid spi-max-frequency ...
2023-04-18LoongArch: module: set section addresses to 0x0Huacai Chen1-4/+4
These got*, plt* and .text.ftrace_trampoline sections specified for LoongArch have non-zero addressses. Non-zero section addresses in a relocatable ELF would confuse GDB when it tries to compute the section offsets and it ends up printing wrong symbol addresses. Therefore, set them to zero, which mirrors the change in commit 5d8591bc0fbaeb6ded ("arm64 module: set plt* section addresses to 0x0"). Cc: [email protected] Reviewed-by: Guo Ren <[email protected]> Signed-off-by: Chong Qiao <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Mark 3 symbol exports as non-GPLHuacai Chen2-3/+3
vm_map_base, empty_zero_page and invalid_pmd_table could be accessed widely by some out-of-tree non-GPL but important file systems or drivers (e.g. OpenZFS). Let's use EXPORT_SYMBOL() instead of EXPORT_SYMBOL_GPL() to export them, so as to avoid build errors. 1, Details about vm_map_base: This is a LoongArch-specific symbol and may be referenced through macros PCI_IOBASE, VMALLOC_START and VMALLOC_END. 2, Details about empty_zero_page: As it stands today, only 3 architectures export empty_zero_page as a GPL symbol: IA64, LoongArch and MIPS. LoongArch gets the GPL export by inheriting from MIPS, and the MIPS export was first introduced in commit 497d2adcbf50b ("[MIPS] Export empty_zero_page for sake of the ext4 module."). The IA64 export was similar: commit a7d57ecf4216e ("[IA64] Export three symbols for module use") did so for kvm. In both IA64 and MIPS, the export of empty_zero_page was done for satisfying some in-kernel component built as module (kvm and ext4 respectively), and given its reasonably low-level nature, GPL is a reasonable choice. But looking at the bigger picture it is evident most other architectures do not regard it as GPL, so in effect the symbol probably should not be treated as such, in favor of consistency. 3, Details about invalid_pmd_table: Keep consistency with invalid_pte_table and make it be possible by some modules. Cc: [email protected] Reviewed-by: WANG Xuerui <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Enable PG when wakeup from suspendHuacai Chen1-0/+4
Some firmwares don't enable PG when wakeup from suspend, so do it in kernel. This can improve code compatibility for boot kernel. Signed-off-by: Baoqi Zhang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Fix _CONST64_(x) as unsignedQing Zhang1-2/+2
Addresses should all be of unsigned type to avoid unnecessary conversions. Signed-off-by: Qing Zhang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Fix build error if CONFIG_SUSPEND is not setHuacai Chen1-0/+3
We can see the following build error on LoongArch if CONFIG_SUSPEND is not set: ld: drivers/acpi/sleep.o: in function 'acpi_pm_prepare': sleep.c:(.text+0x2b8): undefined reference to 'loongarch_wakeup_start' Here is the call trace: acpi_pm_prepare() __acpi_pm_prepare() acpi_sleep_prepare() acpi_get_wakeup_address() loongarch_wakeup_start() Root cause: loongarch_wakeup_start() is defined in arch/loongarch/power/ suspend_asm.S which is only built under CONFIG_SUSPEND. In order to fix the build error, just let acpi_get_wakeup_address() return 0 if CONFIG_ SUSPEND is not set. Fixes: 366bb35a8e48 ("LoongArch: Add suspend (ACPI S3) support") Reviewed-by: WANG Xuerui <[email protected]> Reported-by: Randy Dunlap <[email protected]> Link: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Tiezhu Yang <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Fix probing of the CRC32 featureHuacai Chen5-21/+30
Not all LoongArch processors support CRC32 instructions. This feature is indicated by CPUCFG1.CRC32 (Bit25) but it is wrongly defined in the previous versions of the ISA manual (and so does in loongarch.h). The CRC32 feature is set unconditionally now, so fix it. BTW, expose the CRC32 feature in /proc/cpuinfo. Cc: [email protected] Signed-off-by: Huacai Chen <[email protected]>
2023-04-18LoongArch: Make WriteCombine configurable for ioremap()Huacai Chen5-1/+47
LoongArch maintains cache coherency in hardware, but when paired with LS7A chipsets the WUC attribute (Weak-ordered UnCached, which is similar to WriteCombine) is out of the scope of cache coherency machanism for PCIe devices (this is a PCIe protocol violation, which may be fixed in newer chipsets). This means WUC can only used for write-only memory regions now, so this option is disabled by default, making WUC silently fallback to SUC for ioremap(). You can enable this option if the kernel is ensured to run on hardware without this bug. Kernel parameter writecombine=on/off can be used to override the Kconfig option. Cc: [email protected] Suggested-by: WANG Xuerui <[email protected]> Reviewed-by: WANG Xuerui <[email protected]> Signed-off-by: Huacai Chen <[email protected]>
2023-04-18mlxfw: fix null-ptr-deref in mlxfw_mfa2_tlv_next()Nikita Zhandarovich1-0/+2
Function mlxfw_mfa2_tlv_multi_get() returns NULL if 'tlv' in question does not pass checks in mlxfw_mfa2_tlv_payload_get(). This behaviour may lead to NULL pointer dereference in 'multi->total_len'. Fix this issue by testing mlxfw_mfa2_tlv_multi_get()'s return value against NULL. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Fixes: 410ed13cae39 ("Add the mlxfw module for Mellanox firmware flash process") Co-developed-by: Natalia Petrova <[email protected]> Signed-off-by: Nikita Zhandarovich <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18Merge branch 'add-ethernet-driver-for-starfive-jh7110-soc'Paolo Abeni7-5/+350
Samin Guo says: ==================== Add Ethernet driver for StarFive JH7110 SoC This series adds ethernet support for the StarFive JH7110 RISC-V SoC, which includes a dwmac-5.20 MAC driver (from Synopsys DesignWare). This series has been tested and works fine on VisionFive-2 v1.2A and v1.3B SBC boards. For more information and support, you can visit RVspace wiki[1]. You can simply review or test the patches at the link [2]. This patchset should be applied after the patchset [3] [4]. [1]: https://wiki.rvspace.org/ [2]: https://github.com/saminGuo/linux/tree/vf2-6.3rc4-gmac-net-next [3]: https://patchwork.kernel.org/project/linux-riscv/cover/[email protected] [4]: https://patchwork.kernel.org/project/linux-riscv/cover/[email protected] ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18net: stmmac: dwmac-starfive: Add phy interface settingsSamin Guo1-0/+48
dwmac supports multiple modess. When working under rmii and rgmii, you need to set different phy interfaces. According to the dwmac document, when working in rmii, it needs to be set to 0x4, and rgmii needs to be set to 0x1. The phy interface needs to be set in syscon, the format is as follows: starfive,syscon: <&syscon, offset, shift> Tested-by: Tommaso Merciai <[email protected]> Signed-off-by: Samin Guo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18net: stmmac: Add glue layer for StarFive JH7110 SoCSamin Guo4-0/+137
This adds StarFive dwmac driver support on the StarFive JH7110 SoC. Tested-by: Tommaso Merciai <[email protected]> Co-developed-by: Emil Renner Berthing <[email protected]> Signed-off-by: Emil Renner Berthing <[email protected]> Signed-off-by: Samin Guo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18dt-bindings: net: Add support StarFive dwmacYanhong Wang3-0/+151
Add documentation to describe StarFive dwmac driver(GMAC). Signed-off-by: Yanhong Wang <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Samin Guo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18dt-bindings: net: snps,dwmac: Add 'ahb' reset/reset-nameSamin Guo1-4/+8
According to: stmmac_platform.c: stmmac_probe_config_dt stmmac_main.c: stmmac_dvr_probe dwmac controller may require one (stmmaceth) or two (stmmaceth+ahb) reset signals, and the maxItems of resets/reset-names is going to be 2. The gmac of Starfive Jh7110 SOC must have two resets. it uses snps,dwmac-5.20 IP. Reviewed-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Samin Guo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18net: stmmac: platform: Add snps,dwmac-5.20 IP compatible stringEmil Renner Berthing1-1/+2
Add "snps,dwmac-5.20" compatible string for 5.20 version that can avoid to define some platform data in the glue layer. Tested-by: Tommaso Merciai <[email protected]> Signed-off-by: Emil Renner Berthing <[email protected]> Signed-off-by: Samin Guo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18dt-bindings: net: snps,dwmac: Add dwmac-5.20 versionEmil Renner Berthing1-0/+4
Add dwmac-5.20 IP version to snps.dwmac.yaml Acked-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Emil Renner Berthing <[email protected]> Signed-off-by: Samin Guo <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18Merge branch 'r8169-use-new-macros-from-netdev_queues-h'Paolo Abeni2-38/+23
Heiner Kallweit says: ==================== r8169: use new macros from netdev_queues.h Add one missing subqueue version of the macros, and use the new macros in r8169 to simplify the code. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18r8169: use new macro netif_subqueue_completed_wake in the tx cleanup pathHeiner Kallweit1-12/+4
Use new net core macro netif_subqueue_completed_wake to simplify the code of the tx cleanup path. Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18r8169: use new macro netif_subqueue_maybe_stop in rtl8169_start_xmitHeiner Kallweit1-28/+11
Use new net core macro netif_subqueue_maybe_stop in the start_xmit path to simplify the code. Whilst at it, set the tx queue start threshold to twice the stop threshold. Before values were the same, resulting in stopping/starting the queue more often than needed. Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18net: add macro netif_subqueue_completed_wakeHeiner Kallweit1-0/+10
Add netif_subqueue_completed_wake, complementing the subqueue versions netif_subqueue_try_stop and netif_subqueue_maybe_stop. Reviewed-by: Jacob Keller <[email protected]> Signed-off-by: Heiner Kallweit <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18Merge branch 'bnxt_en-bug-fixes'Paolo Abeni2-10/+11
Michael Chan says: ==================== bnxt_en: Bug fixes This small series contains 2 fixes. The first one fixes the PTP initialization logic on older chips to avoid logging a warning. The second one fixes a potenial NULL pointer dereference in the driver's aux bus unload path. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18bnxt_en: Fix a possible NULL pointer dereference in unload pathKalesh AP1-9/+10
In the driver unload path, the driver currently checks the valid BNXT_FLAG_ROCE_CAP flag in bnxt_rdma_aux_device_uninit() before proceeding. This is flawed because the flag may not be set initially during driver load. It may be set later after the NVRAM setting is changed followed by a firmware reset. Relying on the BNXT_FLAG_ROCE_CAP flag may crash in bnxt_rdma_aux_device_uninit() if the aux device was never initialized: BUG: unable to handle kernel NULL pointer dereference at 0000000000000000 PGD 8ae6aa067 P4D 0 Oops: 0000 [#1] SMP NOPTI CPU: 39 PID: 42558 Comm: rmmod Kdump: loaded Tainted: G OE --------- - - 4.18.0-348.el8.x86_64 #1 Hardware name: Dell Inc. PowerEdge R750/0WT8Y6, BIOS 1.5.4 12/17/2021 RIP: 0010:device_del+0x1b/0x410 Code: 89 a5 50 03 00 00 4c 89 a5 58 03 00 00 eb 89 0f 1f 44 00 00 41 56 41 55 41 54 4c 8d a7 80 00 00 00 55 53 48 89 fb 48 83 ec 18 <48> 8b 2f 4c 89 e7 65 48 8b 04 25 28 00 00 00 48 89 44 24 10 31 c0 RSP: 0018:ff7f82bf469a7dc8 EFLAGS: 00010292 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000206 RDI: 0000000000000000 RBP: ff31b7cd114b0ac0 R08: 0000000000000000 R09: ffffffff935c3400 R10: ff31b7cd45bc3440 R11: 0000000000000001 R12: 0000000000000080 R13: ffffffffc1069f40 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fc9903ce740(0000) GS:ff31b7d4ffac0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000992fee004 CR4: 0000000000773ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: bnxt_rdma_aux_device_uninit+0x1f/0x30 [bnxt_en] bnxt_remove_one+0x2f/0x1f0 [bnxt_en] pci_device_remove+0x3b/0xc0 device_release_driver_internal+0x103/0x1f0 driver_detach+0x54/0x88 bus_remove_driver+0x77/0xc9 pci_unregister_driver+0x2d/0xb0 bnxt_exit+0x16/0x2c [bnxt_en] __x64_sys_delete_module+0x139/0x280 do_syscall_64+0x5b/0x1a0 entry_SYSCALL_64_after_hwframe+0x65/0xca RIP: 0033:0x7fc98f3af71b Fix this by modifying the check inside bnxt_rdma_aux_device_uninit() to check for bp->aux_priv instead. We also need to make some changes in bnxt_rdma_aux_device_init() to make sure that bp->aux_priv is set only when the aux device is fully initialized. Fixes: d80d88b0dfff ("bnxt_en: Add auxiliary driver support") Reviewed-by: Ajit Khaparde <[email protected]> Signed-off-by: Kalesh AP <[email protected]> Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18bnxt_en: Do not initialize PTP on older P3/P4 chipsMichael Chan1-1/+1
The driver does not support PTP on these older chips and it is assuming that firmware on these older chips will not return the PORT_MAC_PTP_QCFG_RESP_FLAGS_HWRM_ACCESS flag in __bnxt_hwrm_ptp_qcfg(), causing the function to abort quietly. But newer firmware now sets this flag and so __bnxt_hwrm_ptp_qcfg() will proceed further. Eventually it will fail in bnxt_ptp_init() -> bnxt_map_ptp_regs() because there is no code to support the older chips. The driver will then complain: "PTP initialization failed.\n" Fix it so that we abort quietly earlier without going through the unnecessary steps and alarming the user with the warning log. Fixes: ae5c42f0b92c ("bnxt_en: Get PTP hardware capability from firmware") Signed-off-by: Michael Chan <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18netfilter: nf_tables: tighten netlink attribute requirements for catch-all ↵Pablo Neira Ayuso1-1/+2
elements If NFT_SET_ELEM_CATCHALL is set on, then userspace provides no set element key. Otherwise, bail out with -EINVAL. Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2023-04-18cxgb4: fix use after free bugs caused by circular dependency problemDuoming Zhou1-1/+1
The flower_stats_timer can schedule flower_stats_work and flower_stats_work can also arm the flower_stats_timer. The process is shown below: ----------- timer schedules work ------------ ch_flower_stats_cb() //timer handler schedule_work(&adap->flower_stats_work); ----------- work arms timer ------------ ch_flower_stats_handler() //workqueue callback function mod_timer(&adap->flower_stats_timer, ...); When the cxgb4 device is detaching, the timer and workqueue could still be rearmed. The process is shown below: (cleanup routine) | (timer and workqueue routine) remove_one() | free_some_resources() | ch_flower_stats_cb() //timer cxgb4_cleanup_tc_flower() | schedule_work() del_timer_sync() | | ch_flower_stats_handler() //workqueue | mod_timer() cancel_work_sync() | kfree(adapter) //FREE | ch_flower_stats_cb() //timer | adap->flower_stats_work //USE This patch changes del_timer_sync() to timer_shutdown_sync(), which could prevent rearming of the timer from the workqueue. Fixes: e0f911c81e93 ("cxgb4: fetch stats for offloaded tc flower flows") Signed-off-by: Duoming Zhou <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-04-18netfilter: nf_tables: validate catch-all set elementsPablo Neira Ayuso3-38/+66
catch-all set element might jump/goto to chain that uses expressions that require validation. Fixes: aaa31047a6d2 ("netfilter: nftables: add catch-all set element support") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2023-04-17Merge branch 'ocelot-felix-driver-support-for-preemptible-traffic-classes'Jakub Kicinski5-25/+199
Vladimir Oltean says: ==================== Ocelot/Felix driver support for preemptible traffic classes The series "Add tc-mqprio and tc-taprio support for preemptible traffic classes" from: https://lore.kernel.org/netdev/[email protected]/ was eventually submitted in a form without the support for the Ocelot/Felix switch driver. This patch set picks up that work again, and presents a fairly modified form compared to the original. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net: mscc: ocelot: add support for preemptible traffic classesVladimir Oltean5-3/+74
In order to not transmit (preemptible) frames which will be received by the link partner as corrupted (because it doesn't support FP), the hardware requires the driver to program the QSYS_PREEMPTION_CFG_P_QUEUES register only after the MAC Merge layer becomes active (verification succeeds, or was disabled). There are some cases when FP is known (through experimentation) to be broken. Give priority to FP over cut-through switching, and disable FP for known broken link modes. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net: dsa: felix: act upon the mqprio qopt in taprio offloadVladimir Oltean1-5/+17
The mqprio queue configuration can appear either through TC_SETUP_QDISC_MQPRIO or through TC_SETUP_QDISC_TAPRIO. Make sure both are treated in the same way. Code does nothing new for now (except for rejecting multiple TXQs per TC, which is a useless concept with DSA switches). Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ferenc Fejes <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net: mscc: ocelot: add support for mqprio offloadVladimir Oltean3-0/+63
This doesn't apply anything to hardware and in general doesn't do anything that the software variant doesn't do, except for checking that there isn't more than 1 TXQ per TC (TXQs for a DSA switch are a dubious concept anyway). The reason we add this is to be able to parse one more field added to struct tc_mqprio_qopt_offload, namely preemptible_tcs. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ferenc Fejes <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net: mscc: ocelot: don't rely on cached verify_status in ocelot_port_get_mm()Vladimir Oltean1-0/+1
ocelot_mm_update_port_status() updates mm->verify_status, but when the verification state of a port changes, an IRQ isn't emitted, but rather, only when the verification state reaches one of the final states (like DISABLED, FAILED, SUCCEEDED) - things that would affect mm->tx_active, which is what the IRQ *is* actually emitted for. That is to say, user space may miss reports of an intermediary MAC Merge verification state (like from INITIAL to VERIFYING), unless there was an IRQ notifying the driver of the change in mm->tx_active as well. This is not a huge deal, but for reliable reporting to user space, let's call ocelot_mm_update_port_status() synchronously from ocelot_port_get_mm(), which makes user space see the current MM status. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net: mscc: ocelot: optimize ocelot_mm_irq()Vladimir Oltean2-2/+29
The MAC Merge IRQ of all ports is shared with the PTP TX timestamp IRQ of all ports, which means that currently, when a PTP TX timestamp is generated, felix_irq_handler() also polls for the MAC Merge layer status of all ports, looking for changes. This makes the kernel do more work, and under certain circumstances may make ptp4l require a tx_timestamp_timeout argument higher than before. Changes to the MAC Merge layer status are only to be expected under certain conditions - its TX direction needs to be enabled - so we can check early if that is the case, and omit register access otherwise. Make ocelot_mm_update_port_status() skip register access if mm->tx_enabled is unset, and also call it once more, outside IRQ context, from ocelot_port_set_mm(), when mm->tx_enabled transitions from true to false, because an IRQ is also expected in that case. Also, a port may have its MAC Merge layer enabled but it may not have generated the interrupt. In that case, there's no point in writing to DEV_MM_STATUS to acknowledge that IRQ. We can reduce the number of register writes per port with MM enabled by keeping an "ack" variable which writes the "write-one-to-clear" bits. Those are 3 in number: PRMPT_ACTIVE_STICKY, UNEXP_RX_PFRM_STICKY and UNEXP_TX_PFRM_STICKY. The other fields in DEV_MM_STATUS are read-only and it doesn't matter what is written to them, so writing zero is just fine. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net: mscc: ocelot: remove struct ocelot_mm_state :: lockVladimir Oltean2-13/+8
Unfortunately, the workarounds for the hardware bugs make it pointless to keep fine-grained locking for the MAC Merge state of each port. Our vsc9959_cut_through_fwd() implementation requires ocelot->fwd_domain_lock to be held, in order to serialize with changes to the bridging domains and to port speed changes (which affect which ports can be cut-through). Simultaneously, the traffic classes which can be cut-through cannot be preemptible at the same time, and this will depend on the MAC Merge layer state (which changes from threaded interrupt context). Since vsc9959_cut_through_fwd() would have to hold the mm->lock of all ports for a correct and race-free implementation with respect to ocelot_mm_irq(), in practice it means that any time a port's mm->lock is held, it would potentially block holders of ocelot->fwd_domain_lock. In the interest of simple locking rules, make all MAC Merge layer state changes (and preemptible traffic class changes) be serialized by the ocelot->fwd_domain_lock. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net: mscc: ocelot: export a single ocelot_mm_irq()Vladimir Oltean3-7/+12
When the switch emits an IRQ, we don't know what caused it, and we iterate through all ports to check the MAC Merge status. Move that iteration inside the ocelot lib; we will change the locking in a future change and it would be good to encapsulate that lock completely within the ocelot lib. Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17Merge branch 'xdp-rx-hwts-metadata-for-stmmac-driver'Jakub Kicinski2-10/+77
Song Yoong Siang says: ==================== XDP Rx HWTS metadata for stmmac driver Implemented XDP receive hardware timestamp metadata for stmmac driver. This patchset is tested with tools/testing/selftests/bpf/xdp_hw_metadata. Below are the test steps and results. Command on DUT: sudo ./xdp_hw_metadata <interface name> Command on Link Partner: echo -n xdp | nc -u -q1 <destination IPv4 addr> 9091 echo -n skb | nc -u -q1 <destination IPv4 addr> 9092 Result for port 9091: poll: 1 (0) skip=1 fail=0 redir=1 xsk_ring_cons__peek: 1 0x55f69f65f6d0: rx_desc[0]->addr=100000000008000 addr=8100 comp_addr=8000 rx_timestamp: 1677762069053692631 No rx_hash err=-95 0x55f69f65f6d0: complete idx=8 addr=8000 Result for port 9092: poll: 1 (0) skip=2 fail=0 redir=1 found skb hwtstamp = 1677762071.937207680 ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net: stmmac: add Rx HWTS metadata to XDP ZC receive pktSong Yoong Siang1-0/+22
Add receive hardware timestamp metadata support via kfunc to XDP Zero Copy receive packets. Signed-off-by: Song Yoong Siang <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net: stmmac: add Rx HWTS metadata to XDP receive pktSong Yoong Siang2-1/+42
Add receive hardware timestamp metadata support via kfunc to XDP receive packets. Suggested-by: Stanislav Fomichev <[email protected]> Signed-off-by: Song Yoong Siang <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net: stmmac: introduce wrapper for struct xdp_buffSong Yoong Siang2-9/+13
Introduce struct stmmac_xdp_buff as a preparation to support XDP Rx metadata via kfuncs. Signed-off-by: Song Yoong Siang <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17Merge branch 'support-tunnel-mode-in-mlx5-ipsec-packet-offload'Jakub Kicinski7-47/+481
Leon Romanovsky says: ==================== Support tunnel mode in mlx5 IPsec packet offload This series extends mlx5 to support tunnel mode in its IPsec packet offload implementation. v0: https://lore.kernel.org/all/[email protected] ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net/mlx5e: Accept tunnel mode for IPsec packet offloadLeon Romanovsky1-7/+8
Open mlx5 driver to accept IPsec tunnel mode. Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net/mlx5e: Create IPsec table with tunnel support only when encap is disabledLeon Romanovsky3-3/+39
Current hardware doesn't support double encapsulation which is happening when IPsec packet offload tunnel mode is configured together with eswitch encap option. Any user attempt to add new SA/policy after he/she sets encap mode, will generate the following FW syndrome: mlx5_core 0000:08:00.0: mlx5_cmd_out_err:803:(pid 1904): CREATE_FLOW_TABLE(0x930) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0xa43321), err(-22) Make sure that we block encap changes before creating flow steering tables. This is applicable only for packet offload in tunnel mode, while packet offload in transport mode and crypto offload, don't have such limitation as they don't perform encapsulation. Reviewed-by: Raed Salem <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net/mlx5: Allow blocking encap changes in eswitchLeon Romanovsky2-0/+62
Existing eswitch encap option enables header encapsulation. Unfortunately currently available hardware isn't able to perform double encapsulation, which can happen once IPsec packet offload tunnel mode is used together with encap mode set to BASIC. So as a solution for misconfiguration, provide an option to block encap changes, which will be used for IPsec packet offload. Reviewed-by: Emeel Hakim <[email protected]> Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-04-17net/mlx5e: Listen to ARP events to update IPsec L2 headers in tunnel modeLeon Romanovsky2-7/+130
In IPsec packet offload mode all header manipulations are performed by hardware, which is responsible to add/remove L2 header with source and destinations MACs. CX-7 devices don't support offload of in-kernel routing functionality, as such HW needs external help to fill other side MAC as it isn't available for HW. As a solution, let's listen to neigh ARP updates and reconfigure IPsec rules on the fly once new MAC data information arrives. Signed-off-by: Leon Romanovsky <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Sridhar Samudrala <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>