aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-01-25Merge tag 'vfs-6.8-rc2.netfs' of ↵Linus Torvalds12-42/+79
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull netfs fixes from Christian Brauner: "This contains various fixes for the netfs work merged earlier this cycle: afs: - Fix locking imbalance in afs_proc_addr_prefs_show() - Remove afs_dynroot_d_revalidate() which is redundant - Fix error handling during lookup - Hide sillyrenames from userspace. This fixes a race between silly-rename files being created/removed and userspace iterating over directory entries - Don't use unnecessary folio_*() functions cifs: - Don't use unnecessary folio_*() functions cachefiles: - erofs: Fix Null dereference when cachefiles are not doing ondemand-mode - Update mailing list netfs library: - Add Jeff Layton as reviewer - Update mailing list - Fix a error checking in netfs_perform_write() - fscache: Check error before dereferencing - Don't use unnecessary folio_*() functions" * tag 'vfs-6.8-rc2.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: afs: Fix missing/incorrect unlocking of RCU read lock afs: Remove afs_dynroot_d_revalidate() as it is redundant afs: Fix error handling with lookup via FS.InlineBulkStatus afs: Hide silly-rename files from userspace cachefiles, erofs: Fix NULL deref in when cachefiles is not doing ondemand-mode netfs: Fix a NULL vs IS_ERR() check in netfs_perform_write() netfs, fscache: Prevent Oops in fscache_put_cache() cifs: Don't use certain unnecessary folio_*() functions afs: Don't use certain unnecessary folio_*() functions netfs: Don't use certain unnecessary folio_*() functions netfs: Add Jeff Layton as reviewer netfs, cachefiles: Change mailing list
2024-01-25Merge tag 'nfsd-6.8-1' of ↵Linus Torvalds2-13/+17
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux Pull nfsd fixes from Chuck Lever: - Fix in-kernel RPC UDP transport - Fix NFSv4.0 RELEASE_LOCKOWNER * tag 'nfsd-6.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: nfsd: fix RELEASE_LOCKOWNER SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()
2024-01-25Merge tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linuxLinus Torvalds2-3/+34
Pull RCU fix from Neeraj Upadhyay: "This fixes RCU grace period stalls, which are observed when an outgoing CPU's quiescent state reporting results in wakeup of one of the grace period kthreads, to complete the grace period. If those kthreads have SCHED_FIFO policy, the wake up can indirectly arm the RT bandwith timer to the local offline CPU. Earlier migration of the hrtimers from the CPU introduced in commit 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") results in this timer getting ignored. If the RCU grace period kthreads are waiting for RT bandwidth to be available, they may never be actually scheduled, resulting in RCU stall warnings" * tag 'urgent-rcu.2024.01.24a' of https://github.com/neeraju/linux: rcu: Defer RCU kthreads wakeup when CPU is dying
2024-01-25Merge tag 'samsung-fixes-6.8' of ↵Arnd Bergmann2-1/+2
https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/fixes Samsung fixes for v6.8 1. Google GS101: Correct the input clock names to CMU MISC clock controller to match received review. The review was initially missed and CMU MISC clock controller bindings, driver and DTS was merged into v6.8-rc1 with different names. Nothing was released so far, so the bindings and driver can be still corrected to match review. 2. Samsung Galaxy Tab3: Fix display by using correct vclk polarity in display node. * tag 'samsung-fixes-6.8' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux: ARM: dts: exynos4212-tab3: add samsung,invert-vclk flag to fimd arm64: dts: exynos: gs101: comply with the new cmu_misc clock names Link: https://lore.kernel.org/r/20240125082400.163935-1-krzysztof.kozlowski@linaro.org Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-01-25Merge tag 'ffa-fixes-6.8' of ↵Arnd Bergmann1-28/+57
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes Arm FF-A fixes for v6.8 Quite a few fixes addressing issues around missing RW lock initialisation in ffa_setup_partitions(), missing check for xa_load() return value, use of xa_insert instead of xa_store to flag case of duplicate insertion. It also simplifies ffa_partitions_cleanup() with xa_for_each() and xa_erase() instead of xa_extract() and kfree(). Finally it includes fixes around handling of partitions setup failures during initialisation. * tag 'ffa-fixes-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: arm_ffa: Handle partitions setup failures firmware: arm_ffa: Use xa_insert() and check for result firmware: arm_ffa: Simplify ffa_partitions_cleanup() firmware: arm_ffa: Check xa_load() return value firmware: arm_ffa: Add missing rwlock_init() for the driver partition firmware: arm_ffa: Add missing rwlock_init() in ffa_setup_partitions() Link: https://lore.kernel.org/r/20240122161652.3551159-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-01-25Merge tag 'scmi-fixes-6.8' of ↵Arnd Bergmann6-13/+50
git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into arm/fixes Arm SCMI fixes for v6.8 Few fixes addressing the below issues: 1. A spurious IRQ related to the late reply can get wrongly associated with the new enqueued request resulting in misinterpretation of data in shared memory. This race-condition can be detected by looking at the channel status bits which the platform must set to the channel free before triggering the completion IRQ. Adding a consistency check to validate such condition will fix the issue. 2. Incorrect use of asm-generic/bug.h instead of generic linux/bUg.h 3. xa_store() can't check for possible duplication insertion, use xa_insert() instead 4. Fix the SCMI clock protocol version in the v3.2 SCMI specification 5. Incorrect upgrade of highest supported clock protocol version from v2.0 to v3.0 * tag 'scmi-fixes-6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: firmware: arm_scmi: Fix the clock protocol supported version firmware: arm_scmi: Fix the clock protocol version for v3.2 firmware: arm_scmi: Use xa_insert() when saving raw queues firmware: arm_scmi: Use xa_insert() to store opps firmware: arm_scmi: Replace asm-generic/bug.h with linux/bug.h firmware: arm_scmi: Check mailbox/SMT channel for consistency Link: https://lore.kernel.org/r/20240122161640.3551085-1-sudeep.holla@arm.com Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2024-01-25arm64: dts: Fix TPM schema violationsLukas Wunner12-12/+12
Since commit 26c9d152ebf3 ("dt-bindings: tpm: Consolidate TCG TIS bindings"), several issues are reported by "make dtbs_check" for arm64 devicetrees: The compatible property needs to contain the chip's name in addition to the generic "tcg,tpm_tis-spi" and the nodename needs to be "tpm@0" rather than "cr50@0": tpm@1: compatible: ['tcg,tpm_tis-spi'] is too short from schema $id: http://devicetree.org/schemas/tpm/tcg,tpm_tis-spi.yaml# cr50@0: $nodename:0: 'cr50@0' does not match '^tpm(@[0-9a-f]+)?$' from schema $id: http://devicetree.org/schemas/tpm/google,cr50.yaml# Fix these schema violations. phyGATE-Tauri uses an Infineon SLB9670: https://lore.kernel.org/all/ab45c82485fa272f74adf560cbb58ee60cc42689.camel@phytec.de/ Gateworks Venice uses an Atmel ATTPM20P: https://trac.gateworks.com/wiki/tpm Signed-off-by: Lukas Wunner <lukas@wunner.de> Acked-by: Heiko Stuebner <heiko@sntech.de> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
2024-01-25ARM: dts: Fix TPM schema violationsLukas Wunner7-10/+10
Since commit 26c9d152ebf3 ("dt-bindings: tpm: Consolidate TCG TIS bindings"), several issues are reported by "make dtbs_check" for ARM devicetrees: The nodename needs to be "tpm@0" rather than "tpmdev@0" and the compatible property needs to contain the chip's name in addition to the generic "tcg,tpm_tis-spi" or "tcg,tpm-tis-i2c": tpmdev@0: $nodename:0: 'tpmdev@0' does not match '^tpm(@[0-9a-f]+)?$' from schema $id: http://devicetree.org/schemas/tpm/tcg,tpm_tis-spi.yaml# tpm@2e: compatible: 'oneOf' conditional failed, one must be fixed: ['tcg,tpm-tis-i2c'] is too short from schema $id: http://devicetree.org/schemas/tpm/tcg,tpm-tis-i2c.yaml# Fix these schema violations. Aspeed Facebook BMCs use an Infineon SLB9670: https://lore.kernel.org/all/ZZSmMJ%2F%2Fl972Qbxu@fedora/ https://lore.kernel.org/all/ZZT4%2Fw2eVzMhtsPx@fedora/ https://lore.kernel.org/all/ZZTS0p1hdAchIbKp@heinlein.vulture-banana.ts.net/ Aspeed Tacoma uses a Nuvoton NPCT75X per commit 39d8a73c53a2 ("ARM: dts: aspeed: tacoma: Add TPM"). phyGATE-Tauri uses an Infineon SLB9670: https://lore.kernel.org/all/ab45c82485fa272f74adf560cbb58ee60cc42689.camel@phytec.de/ A single schema violation remains in am335x-moxa-uc-2100-common.dtsi because it is unknown which chip is used on the board. The devicetree's author has been asked for clarification but has not responded so far: https://lore.kernel.org/all/20231220090910.GA32182@wunner.de/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Reviewed-by: Patrick Williams <patrick@stwcx.xyz> Reviewed-by: Tao Ren <rentao.bupt@gmail.com> Reviewed-by: Bruno Thomsen <bruno.thomsen@gmail.com>
2024-01-25ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllersLennert Buytenhek2-6/+24
With one of the on-board ASM1061 AHCI controllers (1b21:0612) on an ASUSTeK Pro WS WRX80E-SAGE SE WIFI mainboard, a controller hang was observed that was immediately preceded by the following kernel messages: ahci 0000:28:00.0: Using 64-bit DMA addresses ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00000 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00300 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00380 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00400 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00680 flags=0x0000] ahci 0000:28:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0035 address=0x7fffff00700 flags=0x0000] The first message is produced by code in drivers/iommu/dma-iommu.c which is accompanied by the following comment that seems to apply: /* * Try to use all the 32-bit PCI addresses first. The original SAC vs. * DAC reasoning loses relevance with PCIe, but enough hardware and * firmware bugs are still lurking out there that it's safest not to * venture into the 64-bit space until necessary. * * If your device goes wrong after seeing the notice then likely either * its driver is not setting DMA masks accurately, the hardware has * some inherent bug in handling >32-bit addresses, or not all the * expected address bits are wired up between the device and the IOMMU. */ Asking the ASM1061 on a discrete PCIe card to DMA from I/O virtual address 0xffffffff00000000 produces the following I/O page faults: vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000000 flags=0x0010] vfio-pci 0000:07:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0021 address=0x7ff00000500 flags=0x0010] Note that the upper 21 bits of the logged DMA address are zero. (When asking a different PCIe device in the same PCIe slot to DMA to the same I/O virtual address, we do see all the upper 32 bits of the DMA address as 1, so this is not an issue with the chipset or IOMMU configuration on the test system.) Also, hacking libahci to always set the upper 21 bits of all DMA addresses to 1 produces no discernible effect on the behavior of the ASM1061, and mkfs/mount/scrub/etc work as without this hack. This all strongly suggests that the ASM1061 has a 43 bit DMA address limit, and this commit therefore adds a quirk to deal with this limit. This issue probably applies to (some of) the other supported ASMedia parts as well, but we limit it to the PCI IDs known to refer to ASM1061 parts, as that's the only part we know for sure to be affected by this issue at this point. Link: https://lore.kernel.org/linux-ide/ZaZ2PIpEId-rl6jv@wantstofly.org/ Signed-off-by: Lennert Buytenhek <kernel@wantstofly.org> [cassel: drop date from error messages in commit log] Signed-off-by: Niklas Cassel <cassel@kernel.org>
2024-01-25x86/CPU/AMD: Add more models to X86_FEATURE_ZEN5Mario Limonciello1-0/+3
Add model ranges starting at 0x20, 0x40 and 0x70 to the synthetic feature flag X86_FEATURE_ZEN5. Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20240124220749.2983-1-mario.limonciello@amd.com
2024-01-25Merge branch 'tsnep-xdp-fixes'Paolo Abeni1-2/+15
Gerhard Engleder says: ==================== tsnep: XDP fixes Found two driver specific problems during XDP and XSK testing. ==================== Link: https://lore.kernel.org/r/20240123200918.61219-1-gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ringGerhard Engleder1-0/+13
The fill ring of the XDP socket may contain not enough buffers to completey fill the RX queue during socket creation. In this case the flag XDP_RING_NEED_WAKEUP is not set as this flag is only set if the RX queue is not completely filled during polling. Set XDP_RING_NEED_WAKEUP flag also if RX queue is not completely filled during XDP socket creation. Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support") Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25tsnep: Remove FCS for XDP data pathGerhard Engleder1-2/+2
The RX data buffer includes the FCS. The FCS is already stripped for the normal data path. But for the XDP data path the FCS is included and acts like additional/useless data. Remove the FCS from the RX data buffer also for XDP. Fixes: 65b28c810035 ("tsnep: Add XDP RX support") Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support") Signed-off-by: Gerhard Engleder <gerhard@engleder-embedded.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25Merge tag 'mlx5-fixes-2024-01-24' of ↵Paolo Abeni20-41/+99
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2024-01-24 This series provides bug fixes to mlx5 driver. Please pull and let me know if there is any problem. * tag 'mlx5-fixes-2024-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: net/mlx5e: fix a potential double-free in fs_any_create_groups net/mlx5e: fix a double-free in arfs_create_groups net/mlx5e: Ignore IPsec replay window values on sender side net/mlx5e: Allow software parsing when IPsec crypto is enabled net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO net/mlx5: DR, Can't go to uplink vport on RX rule net/mlx5: DR, Use the right GVMI number for drop action net/mlx5: Bridge, fix multicast packets sent to uplink net/mlx5: Fix a WARN upon a callback command failure net/mlx5e: Fix peer flow lists handling net/mlx5e: Fix inconsistent hairpin RQT sizes net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context net/mlx5: Fix query of sd_group field net/mlx5e: Use the correct lag ports number when creating TISes ==================== Link: https://lore.kernel.org/r/20240124081855.115410-1-saeed@kernel.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25Merge tag 'for-netdev' of ↵Paolo Abeni13-91/+190
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-01-25 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 2 day(s) which contain a total of 13 files changed, 190 insertions(+), 91 deletions(-). The main changes are: 1) Fix bpf_xdp_adjust_tail() in context of XSK zero-copy drivers which support XDP multi-buffer. The former triggered a NULL pointer dereference upon shrinking, from Maciej Fijalkowski & Tirthendu Sarkar. 2) Fix a bug in riscv64 BPF JIT which emitted a wrong prologue and epilogue for struct_ops programs, from Pu Lehui. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue i40e: set xdp_rxq_info::frag_size xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers ice: remove redundant xdp_rxq_info registration i40e: handle multi-buffer packets that are shrunk by xdp prog ice: work on pre-XDP prog frag count xsk: fix usage of multi-buffer BPF helpers for ZC XDP xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags xsk: recycle buffer in case Rx queue was full riscv, bpf: Fix unpredictable kernel crash about RV64 struct_ops ==================== Link: https://lore.kernel.org/r/20240125084416.10876-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25net: fec: fix the unhandled context fault from smmuShenwei Wang1-0/+2
When repeatedly changing the interface link speed using the command below: ethtool -s eth0 speed 100 duplex full ethtool -s eth0 speed 1000 duplex full The following errors may sometimes be reported by the ARM SMMU driver: [ 5395.035364] fec 5b040000.ethernet eth0: Link is Down [ 5395.039255] arm-smmu 51400000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000000, fsynr=0x100001, cbfrsynra=0x852, cb=2 [ 5398.108460] fec 5b040000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off It is identified that the FEC driver does not properly stop the TX queue during the link speed transitions, and this results in the invalid virtual I/O address translations from the SMMU and causes the context faults. Fixes: dbc64a8ea231 ("net: fec: move calls to quiesce/resume packet processing out of fec_restart()") Signed-off-by: Shenwei Wang <shenwei.wang@nxp.com> Link: https://lore.kernel.org/r/20240123165141.2008104-1-shenwei.wang@nxp.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25accel/ivpu: Improve recovery and reset supportJacek Lawrynowicz8-48/+70
- Synchronize job submission with reset/recovery using reset_lock - Always print recovery reason and call diagnose_failure() - Don't allow for autosupend during recovery - Prevent immediate autosuspend after reset/recovery - Prevent force_recovery for issuing TDR when device is suspended - Reset VPU instead triggering recovery after changing debugfs params Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240122120945.1150728-4-jacek.lawrynowicz@linux.intel.com
2024-01-25accel/ivpu: Improve stability of ivpu_submit_ioctl()Jacek Lawrynowicz2-78/+62
- Wake up the device as late as possible - Remove job reference counting in order to simplify the code - Don't put jobs that are not fully submitted on submitted_jobs_xa in order to avoid potential races with reset/recovery Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240122120945.1150728-3-jacek.lawrynowicz@linux.intel.com
2024-01-25accel/ivpu: Fix dev open/close races with unbindJacek Lawrynowicz6-65/+85
- Add context_list_lock to synchronize user context addition/removal - Use drm_dev_enter() to prevent unbinding the device during ivpu_open() and vpu address allocation Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com> Reviewed-by: Wachowski, Karol <karol.wachowski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240122120945.1150728-2-jacek.lawrynowicz@linux.intel.com
2024-01-25tick/sched: Preserve number of idle sleeps across CPU hotplug eventsTim Chen1-0/+5
Commit 71fee48f ("tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug") preserved total idle sleep time and iowait sleeptime across CPU hotplug events. Similar reasoning applies to the number of idle calls and idle sleeps to get the proper average of sleep time per idle invocation. Preserve those fields too. Fixes: 71fee48f ("tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug") Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240122233534.3094238-1-tim.c.chen@linux.intel.com
2024-01-25selftests: bonding: do not test arp/ns target with mode balance-alb/tlbHangbin Liu1-4/+4
The prio_arp/ns tests hard code the mode to active-backup. At the same time, The balance-alb/tlb modes do not support arp/ns target. So remove the prio_arp/ns tests from the loop and only test active-backup mode. Fixes: 481b56e0391e ("selftests: bonding: re-format bond option tests") Reported-by: Jay Vosburgh <jay.vosburgh@canonical.com> Closes: https://lore.kernel.org/netdev/17415.1705965957@famine/ Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Link: https://lore.kernel.org/r/20240123075917.1576360-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-25drm/i915/psr: Only allow PSR in LPSP mode on HSW non-ULTVille Syrjälä1-2/+12
On HSW non-ULT (or at least on Dell Latitude E6540) external displays start to flicker when we enable PSR on the eDP. We observe a much higher SR and PC6 residency than should be possible with an external display, and indeen much higher than what we observe with eDP disabled and only the external display enabled. Looks like the hardware is somehow ignoring the fact that the external display is active during PSR. I wasn't able to redproduce this on my HSW ULT machine, or BDW. So either there's something specific about this particular laptop (eg. some unknown firmware thing) or the issue is limited to just non-ULT HSW systems. All known registers that could affect this look perfectly reasonable on the affected machine. As a workaround let's unmask the LPSP event to prevent PSR entry except while in LPSP mode (only pipe A + eDP active). This will prevent PSR entry entirely when multiple pipes are active. The one slight downside is that we now also prevent PSR entry when driving eDP with pipe B or C, but I think that's a reasonable tradeoff to avoid having to implement a more complex workaround. Cc: stable@vger.kernel.org Fixes: 783d8b80871f ("drm/i915/psr: Re-enable PSR1 on hsw/bdw") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10092 Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240118212131.31868-1-ville.syrjala@linux.intel.com Reviewed-by: Jouni Högander <jouni.hogander@intel.com> (cherry picked from commit 94501c3ca6400e463ff6cc0c9cf4a2feb6a9205d) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
2024-01-25clocksource: Skip watchdog check for large watchdog intervalsJiri Wiesner1-1/+24
There have been reports of the watchdog marking clocksources unstable on machines with 8 NUMA nodes: clocksource: timekeeping watchdog on CPU373: Marking clocksource 'tsc' as unstable because the skew is too large: clocksource: 'hpet' wd_nsec: 14523447520 clocksource: 'tsc' cs_nsec: 14524115132 The measured clocksource skew - the absolute difference between cs_nsec and wd_nsec - was 668 microseconds: cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612 The kernel used 200 microseconds for the uncertainty_margin of both the clocksource and watchdog, resulting in a threshold of 400 microseconds (the md variable). Both the cs_nsec and the wd_nsec value indicate that the readout interval was circa 14.5 seconds. The observed behaviour is that watchdog checks failed for large readout intervals on 8 NUMA node machines. This indicates that the size of the skew was directly proportinal to the length of the readout interval on those machines. The measured clocksource skew, 668 microseconds, was evaluated against a threshold (the md variable) that is suited for readout intervals of roughly WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5 second. The intention of 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold") was to tighten the threshold for evaluating skew and set the lower bound for the uncertainty_margin of clocksources to twice WATCHDOG_MAX_SKEW. Later in c37e85c135ce ("clocksource: Loosen clocksource watchdog constraints"), the WATCHDOG_MAX_SKEW constant was increased to 125 microseconds to fit the limit of NTP, which is able to use a clocksource that suffers from up to 500 microseconds of skew per second. Both the TSC and the HPET use default uncertainty_margin. When the readout interval gets stretched the default uncertainty_margin is no longer a suitable lower bound for evaluating skew - it imposes a limit that is far stricter than the skew with which NTP can deal. The root causes of the skew being directly proportinal to the length of the readout interval are: * the inaccuracy of the shift/mult pairs of clocksources and the watchdog * the conversion to nanoseconds is imprecise for large readout intervals Prevent this by skipping the current watchdog check if the readout interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin (of the TSC and HPET) corresponds to a limit on clocksource skew of 250 ppm (microseconds of skew per second). To keep the limit imposed by NTP (500 microseconds of skew per second) for all possible readout intervals, the margins would have to be scaled so that the threshold value is proportional to the length of the actual readout interval. As for why the readout interval may get stretched: Since the watchdog is executed in softirq context the expiration of the watchdog timer can get severely delayed on account of a ksoftirqd thread not getting to run in a timely manner. Surely, a system with such belated softirq execution is not working well and the scheduling issue should be looked into but the clocksource watchdog should be able to deal with it accordingly. Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold") Suggested-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Jiri Wiesner <jwiesner@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Feng Tang <feng.tang@intel.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240122172350.GA740@incl
2024-01-24md: fix a suspicious RCU usage warningMikulas Patocka1-1/+1
RCU protection was removed in the commit 2d32777d60de ("raid1: remove rcu protection to access rdev from conf"). However, the code in fix_read_error does rcu_dereference outside rcu_read_lock - this triggers the following warning. The warning is triggered by a LVM2 test shell/integrity-caching.sh. This commit removes rcu_dereference. ============================= WARNING: suspicious RCU usage 6.7.0 #2 Not tainted ----------------------------- drivers/md/raid1.c:2265 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 no locks held by mdX_raid1/1859. stack backtrace: CPU: 2 PID: 1859 Comm: mdX_raid1 Not tainted 6.7.0 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x60/0x70 lockdep_rcu_suspicious+0x153/0x1b0 raid1d+0x1732/0x1750 [raid1] ? lock_acquire+0x9f/0x270 ? finish_wait+0x3d/0x80 ? md_thread+0xf7/0x130 [md_mod] ? lock_release+0xaa/0x230 ? md_register_thread+0xd0/0xd0 [md_mod] md_thread+0xa0/0x130 [md_mod] ? housekeeping_test_cpu+0x30/0x30 kthread+0xdc/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x28/0x40 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork_asm+0x11/0x20 </TASK> Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Fixes: ca294b34aaf3 ("md/raid1: support read error check") Reviewed-by: Yu Kuai <yukuai3@huawei.com> Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/r/51539879-e1ca-fde3-b8b4-8934ddedcbc@redhat.com
2024-01-25ksmbd: fix global oob in ksmbd_nl_policyLin Ma2-3/+4
Similar to a reported issue (check the commit b33fb5b801c6 ("net: qualcomm: rmnet: fix global oob in rmnet_policy"), my local fuzzer finds another global out-of-bounds read for policy ksmbd_nl_policy. See bug trace below: ================================================================== BUG: KASAN: global-out-of-bounds in validate_nla lib/nlattr.c:386 [inline] BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600 Read of size 1 at addr ffffffff8f24b100 by task syz-executor.1/62810 CPU: 0 PID: 62810 Comm: syz-executor.1 Tainted: G N 6.1.0 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8b/0xb3 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x172/0x475 mm/kasan/report.c:395 kasan_report+0xbb/0x1c0 mm/kasan/report.c:495 validate_nla lib/nlattr.c:386 [inline] __nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600 __nla_parse+0x3e/0x50 lib/nlattr.c:697 __nlmsg_parse include/net/netlink.h:748 [inline] genl_family_rcv_msg_attrs_parse.constprop.0+0x1b0/0x290 net/netlink/genetlink.c:565 genl_family_rcv_msg_doit+0xda/0x330 net/netlink/genetlink.c:734 genl_family_rcv_msg net/netlink/genetlink.c:833 [inline] genl_rcv_msg+0x441/0x780 net/netlink/genetlink.c:850 netlink_rcv_skb+0x14f/0x410 net/netlink/af_netlink.c:2540 genl_rcv+0x24/0x40 net/netlink/genetlink.c:861 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x54e/0x800 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x930/0xe50 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0x154/0x190 net/socket.c:734 ____sys_sendmsg+0x6df/0x840 net/socket.c:2482 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536 __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fdd66a8f359 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdd65e00168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fdd66bbcf80 RCX: 00007fdd66a8f359 RDX: 0000000000000000 RSI: 0000000020000500 RDI: 0000000000000003 RBP: 00007fdd66ada493 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc84b81aff R14: 00007fdd65e00300 R15: 0000000000022000 </TASK> The buggy address belongs to the variable: ksmbd_nl_policy+0x100/0xa80 The buggy address belongs to the physical page: page:0000000034f47940 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1ccc4b flags: 0x200000000001000(reserved|node=0|zone=2) raw: 0200000000001000 ffffea00073312c8 ffffea00073312c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffffffff8f24b000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff8f24b080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffff8f24b100: f9 f9 f9 f9 00 00 f9 f9 f9 f9 f9 f9 00 00 07 f9 ^ ffffffff8f24b180: f9 f9 f9 f9 00 05 f9 f9 f9 f9 f9 f9 00 00 00 05 ffffffff8f24b200: f9 f9 f9 f9 00 00 03 f9 f9 f9 f9 f9 00 00 04 f9 ================================================================== To fix it, add a placeholder named __KSMBD_EVENT_MAX and let KSMBD_EVENT_MAX to be its original value - 1 according to what other netlink families do. Also change two sites that refer the KSMBD_EVENT_MAX to correct value. Cc: stable@vger.kernel.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Lin Ma <linma@zju.edu.cn> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2024-01-24Merge tag 'nf-24-01-24' of ↵Jakub Kicinski12-31/+121
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) Update nf_tables kdoc to keep it in sync with the code, from George Guo. 2) Handle NETDEV_UNREGISTER event for inet/ingress basechain. 3) Reject configuration that cause nft_limit to overflow, from Florian Westphal. 4) Restrict anonymous set/map names to 16 bytes, from Florian Westphal. 5) Disallow to encode queue number and error in verdicts. This reverts a patch which seems to have introduced an early attempt to support for nfqueue maps, which is these days supported via nft_queue expression. 6) Sanitize family via .validate for expressions that explicitly refer to NF_INET_* hooks. * tag 'nf-24-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_tables: validate NFPROTO_* family netfilter: nf_tables: reject QUEUE/DROP verdict parameters netfilter: nf_tables: restrict anonymous set and map names to 16 bytes netfilter: nft_limit: reject configurations that cause integer overflow netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain netfilter: nf_tables: cleanup documentation ==================== Link: https://lore.kernel.org/r/20240124191248.75463-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24cxl/region:Fix overflow issue in alloc_hpa()Quanquan Cao1-2/+2
Creating a region with 16 memory devices caused a problem. The div_u64_rem function, used for dividing an unsigned 64-bit number by a 32-bit one, faced an issue when SZ_256M * p->interleave_ways. The result surpassed the maximum limit of the 32-bit divisor (4G), leading to an overflow and a remainder of 0. note: At this point, p->interleave_ways is 16, meaning 16 * 256M = 4G To fix this issue, I replaced the div_u64_rem function with div64_u64_rem and adjusted the type of the remainder. Signed-off-by: Quanquan Cao <caoqq@fujitsu.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Fixes: 23a22cd1c98b ("cxl/region: Allocate HPA capacity to regions") Cc: <stable@vger.kernel.org> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2024-01-25Merge tag 'exynos-drm-fixes-for-v6.8-rc2' of ↵Dave Airlie4-9/+9
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes Several fixups - Minor fix in `drm/exynos: gsc: gsc_runtime_resume` . The patch ensures `clk_disable_unprepare()` is called on the first element of `ctx->clocks` array. This issue was identified by the Linux Verification Center. - Fix excessive stack usage in `fimd_win_set_pixfmt()` in `drm/exynos` . The issue, highlighted by gcc, involved an unnecessary on-stack copy of the large `exynos_drm_plane` structure, now replaced with a pointer. - Fix an incorrect type issue in `exynos_drm_fimd.c` module . Addresses an incorrect type issue in `fimd_commit()` within the `exynos_drm_fimd.c` The problem was reported by the kernel test robot[1]. [1] https://lore.kernel.org/oe-kbuild-all/202312140930.Me9yWf8F-lkp@intel.com/ - Fix a typo in the dt-bindings for `samsung,exynos-mixer` . Changes 'regs' to the correct property name 'reg' in the dt-bindings documentation for `samsung,exynos-mixer` Signed-off-by: Dave Airlie <airlied@redhat.com> From: Inki Dae <inki.dae@samsung.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240122072407.39546-1-inki.dae@samsung.com
2024-01-25Merge tag 'drm-misc-next-fixes-2024-01-19' of ↵Dave Airlie2-9/+35
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A null pointer dereference fix for v3d and a protection fault fix for ttm. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/5zrphn2nhxnwillxlmo6ap3zh7qjt3jgydlm5sntuc4fzvwhpo@hznprx2bjyi7
2024-01-25Merge tag 'drm-intel-next-fixes-2024-01-19' of ↵Dave Airlie2-3/+1
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes - DSI sequence revert to fix GitLab #10071 and DP test-pattern fix - Drop -Wstringop-overflow (broken on GCC11) Signed-off-by: Dave Airlie <airlied@redhat.com> From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/ZaozNnAGhu6Ec6cb@jlahtine-mobl.ger.corp.intel.com
2024-01-24fjes: fix memleaks in fjes_hw_setupZhipeng Lu1-7/+30
In fjes_hw_setup, it allocates several memory and delay the deallocation to the fjes_hw_exit in fjes_probe through the following call chain: fjes_probe |-> fjes_hw_init |-> fjes_hw_setup |-> fjes_hw_exit However, when fjes_hw_setup fails, fjes_hw_exit won't be called and thus all the resources allocated in fjes_hw_setup will be leaked. In this patch, we free those resources in fjes_hw_setup and prevents such leaks. Fixes: 2fcbca687702 ("fjes: platform_driver's .probe and .remove routine") Signed-off-by: Zhipeng Lu <alexious@zju.edu.cn> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://lore.kernel.org/r/20240122172445.3841883-1-alexious@zju.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24Merge tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-clientLinus Torvalds1-13/+19
Pull ceph fixes from Ilya Dryomov: "A fix to avoid triggering an assert in some cases where RBD exclusive mappings are involved and a deprecated API cleanup" * tag 'ceph-for-6.8-rc2' of https://github.com/ceph/ceph-client: rbd: don't move requests to the running list on errors rbd: remove usage of the deprecated ida_simple_*() API
2024-01-24Merge tag 'integrity-v6.8-rc1' of ↵Linus Torvalds1-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity fix from Mimi Zohar: "Revert patch that required user-provided key data, since keys can be created from kernel-generated random numbers" * tag 'integrity-v6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: Revert "KEYS: encrypted: Add check for strsep"
2024-01-24Merge branch 'net-bpf_xdp_adjust_tail-and-intel-mbuf-fixes'Alexei Starovoitov12-89/+187
Maciej Fijalkowski says: ==================== net: bpf_xdp_adjust_tail() and Intel mbuf fixes Hey, after a break followed by dealing with sickness, here is a v6 that makes bpf_xdp_adjust_tail() actually usable for ZC drivers that support XDP multi-buffer. Since v4 I tried also using bpf_xdp_adjust_tail() with positive offset which exposed yet another issues, which can be observed by increased commit count when compared to v3. John, in the end I think we should remove handling MEM_TYPE_XSK_BUFF_POOL from __xdp_return(), but it is out of the scope for fixes set, IMHO. Thanks, Maciej v6: - add acks [Magnus] - fix spelling mistakes [Magnus] - avoid touching xdp_buff in xp_alloc_{reused,new_from_fq}() [Magnus] - s/shrink_data/bpf_xdp_shrink_data [Jakub] - remove __shrink_data() [Jakub] - check retvals from __xdp_rxq_info_reg() [Magnus] v5: - pick correct version of patch 5 [Simon] - elaborate a bit more on what patch 2 fixes v4: - do not clear frags flag when deleting tail; xsk_buff_pool now does that - skip some NULL tests for xsk_buff_get_tail [Martin, John] - address problems around registering xdp_rxq_info - fix bpf_xdp_frags_increase_tail() for ZC mbuf v3: - add acks - s/xsk_buff_tail_del/xsk_buff_del_tail - address i40e as well (thanks Tirthendu) v2: - fix !CONFIG_XDP_SOCKETS builds - add reviewed-by tag to patch 3 ==================== Link: https://lore.kernel.org/r/20240124191602.566724-1-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queueMaciej Fijalkowski1-0/+7
Now that i40e driver correctly sets up frag_size in xdp_rxq_info, let us make it work for ZC multi-buffer as well. i40e_ring::rx_buf_len for ZC is being set via xsk_pool_get_rx_frame_size() and this needs to be propagated up to xdp_rxq_info. Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-12-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24i40e: set xdp_rxq_info::frag_sizeMaciej Fijalkowski2-25/+24
i40e support XDP multi-buffer so it is supposed to use __xdp_rxq_info_reg() instead of xdp_rxq_info_reg() and set the frag_size. It can not be simply converted at existing callsite because rx_buf_len could be un-initialized, so let us register xdp_rxq_info within i40e_configure_rx_ring(), which happen to be called with already initialized rx_buf_len value. Commit 5180ff1364bc ("i40e: use int for i40e_status") converted 'err' to int, so two variables to deal with return codes are not needed within i40e_configure_rx_ring(). Remove 'ret' and use 'err' to handle status from xdp_rxq_info registration. Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-11-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOLMaciej Fijalkowski1-0/+2
XSK ZC Rx path calculates the size of data that will be posted to XSK Rx queue via subtracting xdp_buff::data_end from xdp_buff::data. In bpf_xdp_frags_increase_tail(), when underlying memory type of xdp_rxq_info is MEM_TYPE_XSK_BUFF_POOL, add offset to data_end in tail fragment, so that later on user space will be able to take into account the amount of bytes added by XDP program. Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-10-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24ice: update xdp_rxq_info::frag_size for ZC enabled Rx queueMaciej Fijalkowski1-14/+23
Now that ice driver correctly sets up frag_size in xdp_rxq_info, let us make it work for ZC multi-buffer as well. ice_rx_ring::rx_buf_len for ZC is being set via xsk_pool_get_rx_frame_size() and this needs to be propagated up to xdp_rxq_info. Use a bigger hammer and instead of unregistering only xdp_rxq_info's memory model, unregister it altogether and register it again and have xdp_rxq_info with correct frag_size value. Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-9-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24intel: xsk: initialize skb_frag_t::bv_offset in ZC driversMaciej Fijalkowski2-2/+4
Ice and i40e ZC drivers currently set offset of a frag within skb_shared_info to 0, which is incorrect. xdp_buffs that come from xsk_buff_pool always have 256 bytes of a headroom, so they need to be taken into account to retrieve xdp_buff::data via skb_frag_address(). Otherwise, bpf_xdp_frags_increase_tail() would be starting its job from xdp_buff::data_hard_start which would result in overwriting existing payload. Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-8-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24ice: remove redundant xdp_rxq_info registrationMaciej Fijalkowski1-5/+0
xdp_rxq_info struct can be registered by drivers via two functions - xdp_rxq_info_reg() and __xdp_rxq_info_reg(). The latter one allows drivers that support XDP multi-buffer to set up xdp_rxq_info::frag_size which in turn will make it possible to grow the packet via bpf_xdp_adjust_tail() BPF helper. Currently, ice registers xdp_rxq_info in two spots: 1) ice_setup_rx_ring() // via xdp_rxq_info_reg(), BUG 2) ice_vsi_cfg_rxq() // via __xdp_rxq_info_reg(), OK Cited commit under fixes tag took care of setting up frag_size and updated registration scheme in 2) but it did not help as 1) is called before 2) and as shown above it uses old registration function. This means that 2) sees that xdp_rxq_info is already registered and never calls __xdp_rxq_info_reg() which leaves us with xdp_rxq_info::frag_size being set to 0. To fix this misbehavior, simply remove xdp_rxq_info_reg() call from ice_setup_rx_ring(). Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-7-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24i40e: handle multi-buffer packets that are shrunk by xdp progTirthendu Sarkar1-17/+23
XDP programs can shrink packets by calling the bpf_xdp_adjust_tail() helper function. For multi-buffer packets this may lead to reduction of frag count stored in skb_shared_info area of the xdp_buff struct. This results in issues with the current handling of XDP_PASS and XDP_DROP cases. For XDP_PASS, currently skb is being built using frag count of xdp_buffer before it was processed by XDP prog and thus will result in an inconsistent skb when frag count gets reduced by XDP prog. To fix this, get correct frag count while building the skb instead of using pre-obtained frag count. For XDP_DROP, current page recycling logic will not reuse the page but instead will adjust the pagecnt_bias so that the page can be freed. This again results in inconsistent behavior as the page refcnt has already been changed by the helper while freeing the frag(s) as part of shrinking the packet. To fix this, only adjust pagecnt_bias for buffers that are stillpart of the packet post-xdp prog run. Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx") Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Tirthendu Sarkar <tirthendu.sarkar@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-6-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24ice: work on pre-XDP prog frag countMaciej Fijalkowski3-14/+32
Fix an OOM panic in XDP_DRV mode when a XDP program shrinks a multi-buffer packet by 4k bytes and then redirects it to an AF_XDP socket. Since support for handling multi-buffer frames was added to XDP, usage of bpf_xdp_adjust_tail() helper within XDP program can free the page that given fragment occupies and in turn decrease the fragment count within skb_shared_info that is embedded in xdp_buff struct. In current ice driver codebase, it can become problematic when page recycling logic decides not to reuse the page. In such case, __page_frag_cache_drain() is used with ice_rx_buf::pagecnt_bias that was not adjusted after refcount of page was changed by XDP prog which in turn does not drain the refcount to 0 and page is never freed. To address this, let us store the count of frags before the XDP program was executed on Rx ring struct. This will be used to compare with current frag count from skb_shared_info embedded in xdp_buff. A smaller value in the latter indicates that XDP prog freed frag(s). Then, for given delta decrement pagecnt_bias for XDP_DROP verdict. While at it, let us also handle the EOP frag within ice_set_rx_bufs_act() to make our life easier, so all of the adjustments needed to be applied against freed frags are performed in the single place. Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-5-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24xsk: fix usage of multi-buffer BPF helpers for ZC XDPMaciej Fijalkowski2-6/+62
Currently when packet is shrunk via bpf_xdp_adjust_tail() and memory type is set to MEM_TYPE_XSK_BUFF_POOL, null ptr dereference happens: [1136314.192256] BUG: kernel NULL pointer dereference, address: 0000000000000034 [1136314.203943] #PF: supervisor read access in kernel mode [1136314.213768] #PF: error_code(0x0000) - not-present page [1136314.223550] PGD 0 P4D 0 [1136314.230684] Oops: 0000 [#1] PREEMPT SMP NOPTI [1136314.239621] CPU: 8 PID: 54203 Comm: xdpsock Not tainted 6.6.0+ #257 [1136314.250469] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [1136314.265615] RIP: 0010:__xdp_return+0x6c/0x210 [1136314.274653] Code: ad 00 48 8b 47 08 49 89 f8 a8 01 0f 85 9b 01 00 00 0f 1f 44 00 00 f0 41 ff 48 34 75 32 4c 89 c7 e9 79 cd 80 ff 83 fe 03 75 17 <f6> 41 34 01 0f 85 02 01 00 00 48 89 cf e9 22 cc 1e 00 e9 3d d2 86 [1136314.302907] RSP: 0018:ffffc900089f8db0 EFLAGS: 00010246 [1136314.312967] RAX: ffffc9003168aed0 RBX: ffff8881c3300000 RCX: 0000000000000000 [1136314.324953] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffc9003168c000 [1136314.336929] RBP: 0000000000000ae0 R08: 0000000000000002 R09: 0000000000010000 [1136314.348844] R10: ffffc9000e495000 R11: 0000000000000040 R12: 0000000000000001 [1136314.360706] R13: 0000000000000524 R14: ffffc9003168aec0 R15: 0000000000000001 [1136314.373298] FS: 00007f8df8bbcb80(0000) GS:ffff8897e0e00000(0000) knlGS:0000000000000000 [1136314.386105] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1136314.396532] CR2: 0000000000000034 CR3: 00000001aa912002 CR4: 00000000007706f0 [1136314.408377] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1136314.420173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1136314.431890] PKRU: 55555554 [1136314.439143] Call Trace: [1136314.446058] <IRQ> [1136314.452465] ? __die+0x20/0x70 [1136314.459881] ? page_fault_oops+0x15b/0x440 [1136314.468305] ? exc_page_fault+0x6a/0x150 [1136314.476491] ? asm_exc_page_fault+0x22/0x30 [1136314.484927] ? __xdp_return+0x6c/0x210 [1136314.492863] bpf_xdp_adjust_tail+0x155/0x1d0 [1136314.501269] bpf_prog_ccc47ae29d3b6570_xdp_sock_prog+0x15/0x60 [1136314.511263] ice_clean_rx_irq_zc+0x206/0xc60 [ice] [1136314.520222] ? ice_xmit_zc+0x6e/0x150 [ice] [1136314.528506] ice_napi_poll+0x467/0x670 [ice] [1136314.536858] ? ttwu_do_activate.constprop.0+0x8f/0x1a0 [1136314.546010] __napi_poll+0x29/0x1b0 [1136314.553462] net_rx_action+0x133/0x270 [1136314.561619] __do_softirq+0xbe/0x28e [1136314.569303] do_softirq+0x3f/0x60 This comes from __xdp_return() call with xdp_buff argument passed as NULL which is supposed to be consumed by xsk_buff_free() call. To address this properly, in ZC case, a node that represents the frag being removed has to be pulled out of xskb_list. Introduce appropriate xsk helpers to do such node operation and use them accordingly within bpf_xdp_adjust_tail(). Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> # For the xsk header part Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-4-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24xsk: make xsk_buff_pool responsible for clearing xdp_buff::flagsMaciej Fijalkowski4-2/+2
XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is used by drivers to notify data path whether xdp_buff contains fragments or not. Data path looks up mentioned flag on first buffer that occupies the linear part of xdp_buff, so drivers only modify it there. This is sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on stack or it resides within struct representing driver's queue and fragments are carried via skb_frag_t structs. IOW, we are dealing with only one xdp_buff. ZC mode though relies on list of xdp_buff structs that is carried via xsk_buff_pool::xskb_list, so ZC data path has to make sure that fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise, xsk_buff_free() could misbehave if it would be executed against xdp_buff that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can take place when within supplied XDP program bpf_xdp_adjust_tail() is used with negative offset that would in turn release the tail fragment from multi-buffer frame. Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would result in releasing all the nodes from xskb_list that were produced by driver before XDP program execution, which is not what is intended - only tail fragment should be deleted from xskb_list and then it should be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never make it up to user space, so from AF_XDP application POV there would be no traffic running, however due to free_list getting constantly new nodes, driver will be able to feed HW Rx queue with recycled buffers. Bottom line is that instead of traffic being redirected to user space, it would be continuously dropped. To fix this, let us clear the mentioned flag on xsk_buff_pool side during xdp_buff initialization, which is what should have been done right from the start of XSK multi-buffer support. Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-3-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24xsk: recycle buffer in case Rx queue was fullMaciej Fijalkowski1-4/+8
Add missing xsk_buff_free() call when __xsk_rcv_zc() failed to produce descriptor to XSK Rx queue. Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20240124191602.566724-2-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-01-24Merge branch 'fix-module_description-for-net-p2'Jakub Kicinski19-0/+19
Breno Leitao says: ==================== Fix MODULE_DESCRIPTION() for net (p2) There are hundreds of network modules that misses MODULE_DESCRIPTION(), causing a warnning when compiling with W=1. Example: WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com90io.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/arc-rimi.o WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020.o This part2 of the patchset focus on the drivers/net/ethernet drivers. There are still some missing warnings in drivers/net/ethernet that will be fixed in an upcoming patchset. v1: https://lore.kernel.org/all/20240122184543.2501493-2-leitao@debian.org/ ==================== Link: https://lore.kernel.org/r/20240123190332.677489-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24net: fill in MODULE_DESCRIPTION()s for rvu_mboxBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Marvel RVU mbox driver. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240123190332.677489-11-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24net: fill in MODULE_DESCRIPTION()s for litexBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the LiteX Liteeth Ethernet device. Signed-off-by: Breno Leitao <leitao@debian.org> Acked-by: Gabriel Somlo <gsomlo@gmail.com> Link: https://lore.kernel.org/r/20240123190332.677489-10-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24net: fill in MODULE_DESCRIPTION()s for fsl_pq_mdioBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the Freescale PQ MDIO driver. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240123190332.677489-9-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-01-24net: fill in MODULE_DESCRIPTION()s for fecBreno Leitao1-0/+1
W=1 builds now warn if module is built without a MODULE_DESCRIPTION(). Add descriptions to the FEC (MPC8xx) Ethernet controller. Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://lore.kernel.org/r/20240123190332.677489-8-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>