aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-05-29drm/i915/gt: Disarm breadcrumbs if engines are already idleChris Wilson1-8/+7
The breadcrumbs use a GT wakeref for guarding the interrupt, but are disarmed during release of the engine wakeref. This leaves a hole where we may attach a breadcrumb just as the engine is parking (after it has parked its breadcrumbs), execute the irq worker with some signalers still attached, but never be woken again. That issue manifests itself in CI with IGT runner timeouts while tests are waiting indefinitely for release of all GT wakerefs. <6> [209.151778] i915: Running live_engine_pm_selftests/live_engine_busy_stats <7> [209.231628] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_5 <7> [209.231816] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_4 <7> [209.231944] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_3 <7> [209.232056] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_2 <7> [209.232166] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling DC_off <7> [209.232270] i915 0000:00:02.0: [drm:skl_enable_dc6 [i915]] Enabling DC6 <7> [209.232368] i915 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [i915]] Setting DC state from 00 to 02 <4> [299.356116] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT. ... <6> [299.356526] sysrq: Show State ... <6> [299.373964] task:i915_selftest state:D stack:11784 pid:5578 tgid:5578 ppid:873 flags:0x00004002 <6> [299.373967] Call Trace: <6> [299.373968] <TASK> <6> [299.373970] __schedule+0x3bb/0xda0 <6> [299.373974] schedule+0x41/0x110 <6> [299.373976] intel_wakeref_wait_for_idle+0x82/0x100 [i915] <6> [299.374083] ? __pfx_var_wake_function+0x10/0x10 <6> [299.374087] live_engine_busy_stats+0x9b/0x500 [i915] <6> [299.374173] __i915_subtests+0xbe/0x240 [i915] <6> [299.374277] ? __pfx___intel_gt_live_setup+0x10/0x10 [i915] <6> [299.374369] ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915] <6> [299.374456] intel_engine_live_selftests+0x1c/0x30 [i915] <6> [299.374547] __run_selftests+0xbb/0x190 [i915] <6> [299.374635] i915_live_selftests+0x4b/0x90 [i915] <6> [299.374717] i915_pci_probe+0x10d/0x210 [i915] At the end of the interrupt worker, if there are no more engines awake, disarm the breadcrumb and go to sleep. Fixes: 9d5612ca165a ("drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission") Closes: https://gitlab.freedesktop.org/drm/intel/issues/10026 Signed-off-by: Chris Wilson <[email protected]> Cc: Andrzej Hajda <[email protected]> Cc: <[email protected]> # v5.12+ Signed-off-by: Janusz Krzysztofik <[email protected]> Acked-by: Nirmoy Das <[email protected]> Reviewed-by: Andrzej Hajda <[email protected]> Reviewed-by: Andi Shyti <[email protected]> Signed-off-by: Andi Shyti <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit fbad43eccae5cb14594195c20113369aabaa22b5) Signed-off-by: Jani Nikula <[email protected]>
2024-05-29Revert "drm/i915: Remove extra multi-gt pm-references"Janusz Krzysztofik1-0/+18
This reverts commit 1f33dc0c1189efb9ae19c6fc22b64dd3e26261fb. There was a patch supposed to fix an issue of illegal attempts to free a still active i915 VMA object when parking a GT believed to be idle, reported by CI on 2-GT Meteor Lake. As a solution, an extra wakeref for a Primary GT was acquired from i915_gem_do_execbuffer() -- see commit f56fe3e91787 ("drm/i915: Fix a VMA UAF for multi-gt platform"). However, that fix occurred insufficient -- the issue was still reported by CI. That wakeref was released on exit from i915_gem_do_execbuffer(), then potentially before completion of the request and deactivation of its associated VMAs. Moreover, CI reports indicated that single-GT platforms also suffered sporadically from the same race. Since that issue was fixed by another commit f3c71b2ded5c ("drm/i915/vma: Fix UAF on destroy against retire race"), the changes introduced by that insufficient fix were dropped as no longer useful. However, that series resulted in another VMA UAF scenario now being triggered in CI. <4> [260.290809] ------------[ cut here ]------------ <4> [260.290988] list_del corruption. prev->next should be ffff888118c5d990, but was ffff888118c5a510. (prev=ffff888118c5a510) <4> [260.291004] WARNING: CPU: 2 PID: 1143 at lib/list_debug.c:62 __list_del_entry_valid_or_report+0xb7/0xe0 .. <4> [260.291055] CPU: 2 PID: 1143 Comm: kms_plane Not tainted 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1 <4> [260.291058] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024 <4> [260.291060] RIP: 0010:__list_del_entry_valid_or_report+0xb7/0xe0 ... <4> [260.291087] Call Trace: <4> [260.291089] <TASK> <4> [260.291124] i915_vma_reopen+0x43/0x80 [i915] <4> [260.291298] eb_lookup_vmas+0x9cb/0xcc0 [i915] <4> [260.291579] i915_gem_do_execbuffer+0xc9a/0x26d0 [i915] <4> [260.291883] i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915] ... <4> [260.292301] </TASK> ... <4> [260.292506] ---[ end trace 0000000000000000 ]--- <4> [260.292782] general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6ca3: 0000 [#1] PREEMPT SMP NOPTI <4> [260.303575] CPU: 2 PID: 1143 Comm: kms_plane Tainted: G W 6.9.0-rc2-CI_DRM_14524-ga25d180c6853+ #1 <4> [260.313851] Hardware name: Intel Corporation Meteor Lake Client Platform/MTL-P LP5x T3 RVP, BIOS MTLPFWI1.R00.3471.D91.2401310918 01/31/2024 <4> [260.326359] RIP: 0010:eb_validate_vmas+0x114/0xd80 [i915] ... <4> [260.428756] Call Trace: <4> [260.431192] <TASK> <4> [639.283393] i915_gem_do_execbuffer+0xd05/0x26d0 [i915] <4> [639.305245] i915_gem_execbuffer2_ioctl+0x123/0x2a0 [i915] ... <4> [639.411134] </TASK> ... <4> [639.449979] ---[ end trace 0000000000000000 ]--- We defer actually closing, unbinding and destroying a VMA until next idle point, or until the object is freed in the meantime. By postponing the unbind, we allow for the VMA to be reopened by the client, avoiding the work required to rebind the VMA. Starting from commit b0647a5e79b1 ("drm/i915: Avoid live-lock with i915_vma_parked()"), we assume that as long as a GT is held idle, no VMA would be reopened while we destroy them. That assumption is no longer true in multi-GT configurations, where a VMA we reopen may be handled by a GT different from the one that we already keep active via its engine while we set up an execbuf request. Restoring the extra GT0 PM wakeref removed from i915_gem_do_execbuffer() processing path seems to fix this issue. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10608 Signed-off-by: Janusz Krzysztofik <[email protected]> Cc: Rodrigo Vivi <[email protected]> Cc: Nirmoy Das <[email protected]> Reviewed-by: Nirmoy Das <[email protected]> Fixes: 1f33dc0c1189 ("drm/i915: Remove extra multi-gt pm-references") Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> (cherry picked from commit 749670a58d935303ad1ce529acc73f12de25832e) Signed-off-by: Jani Nikula <[email protected]>
2024-05-29scripts/make_fit: Drop fdt image entry compatible stringChen-Yu Tsai1-2/+1
According to the FIT image source file format document found in U-boot [1] and the split-out FIT image specification [2], under "'/images' node" -> "Conditionally mandatory property", the "compatible" property is described as "compatible method for loading image", i.e., not the compatible string embedded in the FDT or used for matching. Drop the compatible string from the fdt image entry node. While at it also fix up a typo in the document section of output_dtb. [1] U-boot source "doc/usage/fit/source_file_format.rst", or on the website: https://docs.u-boot.org/en/latest/usage/fit/source_file_format.html [2] https://github.com/open-source-firmware/flat-image-tree/blob/main/source/chapter2-source-file-format.rst Fixes: 7a23b027ec17 ("arm64: boot: Support Flat Image Tree") Signed-off-by: Chen-Yu Tsai <[email protected]> Reviewed-by: Simon Glass <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2024-05-29kbuild: remove a stale comment about cleaning in link-vmlinux.shMasahiro Yamada1-1/+0
Remove the left-over of commit 51eb95e2da41 ("kbuild: Don't remove link-vmlinux temporary files on exit/signal"). Signed-off-by: Masahiro Yamada <[email protected]>
2024-05-29kbuild: fix short log for AS in link-vmlinux.shMasahiro Yamada1-1/+1
In convention, short logs print the output file, not the input file. Let's change the suffix for 'AS' since it assembles *.S into *.o. [Before] LD .tmp_vmlinux.kallsyms1 NM .tmp_vmlinux.kallsyms1.syms KSYMS .tmp_vmlinux.kallsyms1.S AS .tmp_vmlinux.kallsyms1.S LD .tmp_vmlinux.kallsyms2 NM .tmp_vmlinux.kallsyms2.syms KSYMS .tmp_vmlinux.kallsyms2.S AS .tmp_vmlinux.kallsyms2.S LD vmlinux [After] LD .tmp_vmlinux.kallsyms1 NM .tmp_vmlinux.kallsyms1.syms KSYMS .tmp_vmlinux.kallsyms1.S AS .tmp_vmlinux.kallsyms1.o LD .tmp_vmlinux.kallsyms2 NM .tmp_vmlinux.kallsyms2.syms KSYMS .tmp_vmlinux.kallsyms2.S AS .tmp_vmlinux.kallsyms2.o LD vmlinux Signed-off-by: Masahiro Yamada <[email protected]>
2024-05-29kbuild: change scripts/mksysmap into sed scriptMasahiro Yamada2-14/+7
The previous commit removed the subshell execution from scripts/mksysmap, which is now simple enough to become a sed script. Signed-off-by: Masahiro Yamada <[email protected]>
2024-05-29kbuild: avoid unneeded kallsyms step 3Masahiro Yamada2-13/+4
Since commit 951bcae6c5a0 ("kallsyms: Avoid weak references for kallsyms symbols"), the kallsyms step 3 always occurs. You can compare the build logs. [Before 951bcae6c5a0] $ git checkout 951bcae6c5a0^ $ make defconfig all [ snip ] LD .tmp_vmlinux.kallsyms1 NM .tmp_vmlinux.kallsyms1.syms KSYMS .tmp_vmlinux.kallsyms1.S AS .tmp_vmlinux.kallsyms1.S LD .tmp_vmlinux.kallsyms2 NM .tmp_vmlinux.kallsyms2.syms KSYMS .tmp_vmlinux.kallsyms2.S AS .tmp_vmlinux.kallsyms2.S LD vmlinux [After 951bcae6c5a0] $ git checkout 951bcae6c5a0 $ make defconfig all [ snip ] LD .tmp_vmlinux.kallsyms1 NM .tmp_vmlinux.kallsyms1.syms KSYMS .tmp_vmlinux.kallsyms1.S AS .tmp_vmlinux.kallsyms1.S LD .tmp_vmlinux.kallsyms2 NM .tmp_vmlinux.kallsyms2.syms KSYMS .tmp_vmlinux.kallsyms2.S AS .tmp_vmlinux.kallsyms2.S LD .tmp_vmlinux.kallsyms3 # should not happen NM .tmp_vmlinux.kallsyms3.syms # should not happen KSYMS .tmp_vmlinux.kallsyms3.S # should not happen AS .tmp_vmlinux.kallsyms3.S # should not happen LD vmlinux The resulting vmlinux is correct, but it always requires an additional linking step. The symbols produced by kallsyms are excluded from kallsyms itself because they were previously missing in step 1. With those symbols excluded, the symbol lists matched between step 1 and step 2, eliminating the need for step 3. Now, this has a negative effect. Since 951bcae6c5a0, the PROVIDE() directives provide the fallback definitions, which are not trimmed from the sysbol list in step 1 because ${kallsymso_prev} is empty at this point. In step 2, ${kallsymso_prev} is set, and the kallsyms_* symbols are trimmed from the symbol list. Due to the table size difference between step 1 and step 2 (the former is larger due to the presence of kallsyms_*), step 3 is triggered. Now that the kallsyms_* symbols are always linked, let's stop omitting them from kallsyms. This avoids unnecessary step 3. Fixes: 951bcae6c5a0 ("kallsyms: Avoid weak references for kallsyms symbols") Signed-off-by: Masahiro Yamada <[email protected]>
2024-05-29kbuild: scripts/gdb: Replace missed $(srctree)/$(src) w/ $(src)Douglas Anderson1-1/+1
Recently we went through the source tree and replaced $(srctree)/$(src) w/ $(src). However, the gdb scripts Makefile had a hidden $(srctree)/$(src) that looked like this: $(abspath $(srctree))/$(src) Because we missed that then my installed kernel had symlinks that looked like this: __init__.py -> ${INSTALL_DIR}/$(INSTALL_DIR}/scripts/gdb/linux/__init__.py Let's also replace the midden $(abspath $(srctree))/$(src) with $(src). Now: __init__.py -> $(INSTALL_DIR}/scripts/gdb/linux/__init__.py Fixes: b1992c3772e6 ("kbuild: use $(src) instead of $(srctree)/$(src) for source directory") Signed-off-by: Douglas Anderson <[email protected]> Signed-off-by: Masahiro Yamada <[email protected]>
2024-05-29kconfig: remove redundant check in expr_join_or()Masahiro Yamada1-1/+1
The check for 'sym1 == sym2' is redundant here because it has already been done a few lines above: if (sym1 != sym2) return NULL; Signed-off-by: Masahiro Yamada <[email protected]>
2024-05-29kconfig: fix comparison to constant symbols, 'm', 'n'Masahiro Yamada1-2/+4
Currently, comparisons to 'm' or 'n' result in incorrect output. [Test Code] config MODULES def_bool y modules config A def_tristate m config B def_bool A > n CONFIG_B is unset, while CONFIG_B=y is expected. The reason for the issue is because Kconfig compares the tristate values as strings. Currently, the .type fields in the constant symbol definitions, symbol_{yes,mod,no} are unspecified, i.e., S_UNKNOWN. When expr_calc_value() evaluates 'A > n', it checks the types of 'A' and 'n' to determine how to compare them. The left-hand side, 'A', is a tristate symbol with a value of 'm', which corresponds to a numeric value of 1. (Internally, 'y', 'm', and 'n' are represented as 2, 1, and 0, respectively.) The right-hand side, 'n', has an unknown type, so it is treated as the string "n" during the comparison. expr_calc_value() compares two values numerically only when both can have numeric values. Otherwise, they are compared as strings. symbol numeric value ASCII code ------------------------------------- y 2 0x79 m 1 0x6d n 0 0x6e 'm' is greater than 'n' if compared numerically (since 1 is greater than 0), but smaller than 'n' if compared as strings (since the ASCII code 0x6d is smaller than 0x6e). Specifying .type=S_TRISTATE for symbol_{yes,mod,no} fixes the above test code. Doing so, however, would cause a regression to the following test code. [Test Code 2] config MODULES def_bool n modules config A def_tristate n config B def_bool A = m You would get CONFIG_B=y, while CONFIG_B should not be set. The reason is because sym_get_string_value() turns 'm' into 'n' when the module feature is disabled. Consequently, expr_calc_value() evaluates 'A = n' instead of 'A = m'. This oddity has been hidden because the type of 'm' was previously S_UNKNOWN instead of S_TRISTATE. sym_get_string_value() should not tweak the string because the tristate value has already been correctly calculated. There is no reason to return the string "n" where its tristate value is mod. Fixes: 31847b67bec0 ("kconfig: allow use of relations other than (in)equality") Signed-off-by: Masahiro Yamada <[email protected]>
2024-05-29kconfig: remove unused expr_is_no()Masahiro Yamada1-5/+0
This has not been used since commit e911503085ae ("Kconfig: Remove bad inference rules expr_eliminate_dups2()"). Signed-off-by: Masahiro Yamada <[email protected]>
2024-05-29drm/gem-shmem: Add import attachment warning to locked pin functionAdrián Larumbe1-0/+2
Commit ec144244a43f ("drm/gem-shmem: Acquire reservation lock in GEM pin/unpin callbacks") moved locking DRM object's dma reservation to drm_gem_shmem_object_pin, and made drm_gem_shmem_pin_locked public, so we need to make sure the not-imported check warning is also added to the latter. Cc: Thomas Zimmermann <[email protected]> Cc: Dmitry Osipenko <[email protected]> Cc: Boris Brezillon <[email protected]> Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-05-29drm/lima: Fix dma_resv deadlock at drm object pin timeAdrián Larumbe1-1/+1
Commit a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") moved locking the DRM object's dma reservation to drm_gem_pin(), but Lima's pin callback kept calling drm_gem_shmem_pin, which also tries to lock the same dma_resv, leading to a double lock situation. As was already done for Panfrost in the previous commit, fix it by replacing drm_gem_shmem_pin() with its locked variant. Cc: Thomas Zimmermann <[email protected]> Cc: Dmitry Osipenko <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Steven Price <[email protected]> Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> Tested-by: Val Packett <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-05-29drm/panfrost: Fix dma_resv deadlock at drm object pin timeAdrián Larumbe1-1/+1
When Panfrost must pin an object that is being prepared a dma-buf attachment for on behalf of another driver, the core drm gem object pinning code already takes a lock on the object's dma reservation. However, Panfrost GEM object's pinning callback would eventually try taking the lock on the same dma reservation when delegating pinning of the object onto the shmem subsystem, which led to a deadlock. This can be shown by enabling CONFIG_DEBUG_WW_MUTEX_SLOWPATH, which throws the following recursive locking situation: weston/3440 is trying to acquire lock: ffff000000e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_shmem_pin+0x34/0xb8 [drm_shmem_helper] but task is already holding lock: ffff000000e235a0 (reservation_ww_class_mutex){+.+.}-{3:3}, at: drm_gem_pin+0x2c/0x80 [drm] Fix it by replacing drm_gem_shmem_pin with its locked version, as the lock had already been taken by drm_gem_pin(). Cc: Thomas Zimmermann <[email protected]> Cc: Dmitry Osipenko <[email protected]> Cc: Boris Brezillon <[email protected]> Cc: Steven Price <[email protected]> Fixes: a78027847226 ("drm/gem: Acquire reservation lock in drm_gem_{pin/unpin}()") Signed-off-by: Adrián Larumbe <[email protected]> Reviewed-by: Boris Brezillon <[email protected]> Signed-off-by: Boris Brezillon <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-05-28net/sched: taprio: extend minimum interval restriction to entire cycle tooVladimir Oltean2-5/+27
It is possible for syzbot to side-step the restriction imposed by the blamed commit in the Fixes: tag, because the taprio UAPI permits a cycle-time different from (and potentially shorter than) the sum of entry intervals. We need one more restriction, which is that the cycle time itself must be larger than N * ETH_ZLEN bit times, where N is the number of schedule entries. This restriction needs to apply regardless of whether the cycle time came from the user or was the implicit, auto-calculated value, so we move the existing "cycle == 0" check outside the "if "(!new->cycle_time)" branch. This way covers both conditions and scenarios. Add a selftest which illustrates the issue triggered by syzbot. Fixes: b5b73b26b3ca ("taprio: Fix allowing too small intervals") Reported-by: [email protected] Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Vladimir Oltean <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-28net/sched: taprio: make q->picos_per_byte available to fill_sched_entry()Vladimir Oltean2-1/+25
In commit b5b73b26b3ca ("taprio: Fix allowing too small intervals"), a comparison of user input against length_to_duration(q, ETH_ZLEN) was introduced, to avoid RCU stalls due to frequent hrtimers. The implementation of length_to_duration() depends on q->picos_per_byte being set for the link speed. The blamed commit in the Fixes: tag has moved this too late, so the checks introduced above are ineffective. The q->picos_per_byte is zero at parse_taprio_schedule() -> parse_sched_list() -> parse_sched_entry() -> fill_sched_entry() time. Move the taprio_set_picos_per_byte() call as one of the first things in taprio_change(), before the bulk of the netlink attribute parsing is done. That's because it is needed there. Add a selftest to make sure the issue doesn't get reintroduced. Fixes: 09dbdf28f9f9 ("net/sched: taprio: fix calculation of maximum gate durations") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-05-28bcachefs: Don't return -EROFS from mount on inconsistency errorKent Overstreet1-2/+10
We were accidentally returning -EROFS during recovery on filesystem inconsistency - since this is what the journal returns on emergency shutdown. Signed-off-by: Kent Overstreet <[email protected]>
2024-05-29netfilter: nft_fib: allow from forward/input without iif selectorEric Garver1-5/+3
This removes the restriction of needing iif selector in the forward/input hooks for fib lookups when requested result is oif/oifname. Removing this restriction allows "loose" lookups from the forward hooks. Fixes: be8be04e5ddb ("netfilter: nft_fib: reverse path filter for policy-based routing on iif") Signed-off-by: Eric Garver <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-05-29netfilter: tproxy: bail out if IP has been disabled on the deviceFlorian Westphal1-0/+2
syzbot reports: general protection fault, probably for non-canonical address 0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f] [..] RIP: 0010:nf_tproxy_laddr4+0xb7/0x340 net/ipv4/netfilter/nf_tproxy_ipv4.c:62 Call Trace: nft_tproxy_eval_v4 net/netfilter/nft_tproxy.c:56 [inline] nft_tproxy_eval+0xa9a/0x1a00 net/netfilter/nft_tproxy.c:168 __in_dev_get_rcu() can return NULL, so check for this. Reported-and-tested-by: [email protected] Fixes: cc6eb4338569 ("tproxy: use the interface primary IP address as a default value for --on-ip") Signed-off-by: Florian Westphal <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-05-29netfilter: nft_payload: skbuff vlan metadata mangle supportPablo Neira Ayuso1-7/+65
Userspace assumes vlan header is present at a given offset, but vlan offload allows to store this in metadata fields of the skbuff. Hence mangling vlan results in a garbled packet. Handle this transparently by adding a parser to the kernel. If vlan metadata is present and payload offset is over 12 bytes (source and destination mac address fields), then subtract vlan header present in vlan metadata, otherwise mangle vlan metadata based on offset and length, extracting data from the source register. This is similar to: 8cfd23e67401 ("netfilter: nft_payload: work around vlan header stripping") to deal with vlan payload mangling. Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-05-28bcachefs: Fix uninitialized var warningKent Overstreet2-2/+2
Can't actually be used uninitialized, but gcc was being silly. Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: Split out sb-errors_format.hKent Overstreet3-292/+297
Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: Split out journal_seq_blacklist_format.hKent Overstreet2-10/+16
Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: Split out replicas_format.hKent Overstreet2-32/+36
Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: Split out disk_groups_format.hKent Overstreet2-18/+22
Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: split out sb-downgrade_format.hKent Overstreet2-13/+18
Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: split out sb-members_format.hKent Overstreet2-101/+111
Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28Merge remote-tracking branch 'drm/drm-fixes' into drm-misc-fixesMaarten Lankhorst12567-255046/+494451
v6.10-rc1 is released, forward from v6.9 Signed-off-by: Maarten Lankhorst <[email protected]>
2024-05-28Merge tag 'tpmdd-next-6.10-rc2' of ↵Linus Torvalds6-48/+31
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fixes from Jarkko Sakkinen: "This fixes two unaddressed review comments for the HMAC encryption patch set. They are cosmetic but we are better off, if such unnecessary glitches do not exist in the release. The important part is enabling the HMAC encryption by default only on x86-64 because that is the only sufficiently tested arch. Finally, there is a bug fix for SPI transfer buffer allocation, which did not take into account the SPI header size" * tag 'tpmdd-next-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: Enable TCG_TPM2_HMAC by default only for X86_64 tpm: Rename TPM2_OA_TMPL to TPM2_OA_NULL_KEY and make it local tpm: Open code tpm_buf_parameters() tpm_tis_spi: Account for SPI header when allocating TPM SPI xfer buffer
2024-05-28Merge tag 'probes-fixes-v6.10-rc1' of ↵Linus Torvalds2-5/+13
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - uprobes: prevent mutex_lock() under rcu_read_lock(). Recent changes moved uprobe_cpu_buffer preparation which involves mutex_lock(), under __uprobe_trace_func() which is called inside rcu_read_lock(). Fix it by moving uprobe_cpu_buffer preparation outside of __uprobe_trace_func() - kprobe-events: handle the error case of btf_find_struct_member() * tag 'probes-fixes-v6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/probes: fix error check in parse_btf_field() uprobes: prevent mutex_lock() under rcu_read_lock()
2024-05-28nvmet: fix a possible leak when destroy a ctrl during qp establishmentSagi Grimberg1-0/+9
In nvmet_sq_destroy we capture sq->ctrl early and if it is non-NULL we know that a ctrl was allocated (in the admin connect request handler) and we need to release pending AERs, clear ctrl->sqs and sq->ctrl (for nvme-loop primarily), and drop the final reference on the ctrl. However, a small window is possible where nvmet_sq_destroy starts (as a result of the client giving up and disconnecting) concurrently with the nvme admin connect cmd (which may be in an early stage). But *before* kill_and_confirm of sq->ref (i.e. the admin connect managed to get an sq live reference). In this case, sq->ctrl was allocated however after it was captured in a local variable in nvmet_sq_destroy. This prevented the final reference drop on the ctrl. Solve this by re-capturing the sq->ctrl after all inflight request has completed, where for sure sq->ctrl reference is final, and move forward based on that. This issue was observed in an environment with many hosts connecting multiple ctrls simoutanuosly, creating a delay in allocating a ctrl leading up to this race window. Reported-by: Alex Turin <[email protected]> Signed-off-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2024-05-28nvme: use srcu for iterating namespace listKeith Busch4-56/+83
The nvme pci driver synchronizes with all the namespace queues during a reset to ensure that there's no pending timeout work. Meanwhile the timeout work potentially iterates those same namespaces to freeze their queues. Each of those namespace iterations use the same read lock. If a write lock should somehow get between the synchronize and freeze steps, then forward progress is deadlocked. We had been relying on the nvme controller state machine to ensure the reset work wouldn't conflict with timeout work. That guarantee may be a bit fragile to rely on, so iterate the namespace lists without taking potentially circular locks, as reported by lockdep. Link: https://lore.kernel.org/all/20220930001943.zdbvolc3gkekfmcv@shindev/ Reported-by: Shinichiro Kawasaki <[email protected]> Tested-by: Shinichiro Kawasaki <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2024-05-28bcachefs: Better fsck error message for key versionKent Overstreet1-4/+5
Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: btree_gc can now handle unknown btreesKent Overstreet5-73/+55
Compatibility fix - we no longer have a separate table for which order gc walks btrees in, and special case the stripes btree directly. Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: add missing MODULE_DESCRIPTION()Jeff Johnson1-0/+1
Fix the 'make W=1' warning: WARNING: modpost: missing MODULE_DESCRIPTION() in fs/bcachefs/mean_and_variance_test.o Signed-off-by: Jeff Johnson <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: Fix setting of downgrade recovery passes/errorsKent Overstreet1-9/+3
bch2_check_version_downgrade() was setting c->sb.version, which bch2_sb_set_downgrade() expects to be at the previous version; and it shouldn't even have been set directly because c->sb.version is updated by write_super(). Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: Run check_key_has_snapshot in snapshot_delete_keys()Kent Overstreet3-24/+29
delete_dead_snapshots now runs before the main fsck.c passes which check for keys for invalid snapshots; thus, it needs those checks as well. Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: Refactor delete_dead_snapshots()Kent Overstreet1-40/+25
Consolidate per-key work into delete_dead_snapshots_process_key(), so we now walk all keys once, not twice. Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: Fix locking assertKent Overstreet1-5/+5
We now track whether a transaction is locked, and verify that we don't have nodes locked when the transaction isn't locked; reorder relocks to not pop the new assert. Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: Fix lookup_first_inode() when inode_generations are presentKent Overstreet1-14/+10
This function is used for finding the hash seed (which is the same in all versions of an inode in different snapshots): ff an inode has been deleted in a child snapshot we need to iterate until we find a live version. Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28bcachefs: Plumb bkey into __btree_err()Kent Overstreet1-40/+45
It can be useful to know the exact byte offset within a btree node where an error occured. Signed-off-by: Kent Overstreet <[email protected]>
2024-05-28net: ti: icssg-prueth: Fix start counter for ft1 filterMD Danish Anwar1-1/+1
The start counter for FT1 filter is wrongly set to 0 in the driver. FT1 is used for source address violation (SAV) check and source address starts at Byte 6 not Byte 0. Fix this by changing start counter to ETH_ALEN in icssg_ft1_set_mac_addr(). Fixes: e9b4ece7d74b ("net: ti: icssg-prueth: Add Firmware config and classification APIs.") Signed-off-by: MD Danish Anwar <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-05-28bcache: code cleanup in __bch_bucket_alloc_set()Coly Li1-6/+2
In __bch_bucket_alloc_set() the lines after lable 'err:' indeed do nothing useful after multiple cache devices are removed from bcache code. This cleanup patch drops the useless code to save a bit CPU cycles. Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-05-28bcache: call force_wake_up_gc() if necessary in check_should_bypass()Coly Li1-1/+15
If there are extreme heavy write I/O continuously hit on relative small cache device (512GB in my testing), it is possible to make counter c->gc_stats.in_use continue to increase and exceed CUTOFF_CACHE_ADD. If 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, all following write requests will bypass the cache device because check_should_bypass() returns 'true'. Because all writes bypass the cache device, counter c->sectors_to_gc has no chance to be negative value, and garbage collection thread won't be waken up even the whole cache becomes clean after writeback accomplished. The aftermath is that all write I/Os go directly into backing device even the cache device is clean. To avoid the above situation, this patch uses a quite conservative way to fix: if 'c->gc_stats.in_use > CUTOFF_CACHE_ADD' happens, only wakes up garbage collection thread when the whole cache device is clean. Before the fix, the writes-always-bypass situation happens after 10+ hours write I/O pressure on 512GB Intel optane memory which acts as cache device. After this fix, such situation doesn't happen after 36+ hours testing. Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-05-28bcache: allow allocator to invalidate bucket in gcDongsheng Yang3-9/+12
Currently, if the gc is running, when the allocator found free_inc is empty, allocator has to wait the gc finish. Before that, the IO is blocked. But actually, there would be some buckets is reclaimable before gc, and gc will never mark this kind of bucket to be unreclaimable. So we can put these buckets into free_inc in gc running to avoid IO being blocked. Signed-off-by: Dongsheng Yang <[email protected]> Signed-off-by: Mingzhe Zou <[email protected]> Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-05-28block: check for max_hw_sectors underflowHannes Reinecke1-2/+6
The logical block size need to be smaller than the max_hw_sector setting, otherwise we can't even transfer a single LBA. Signed-off-by: Hannes Reinecke <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: John Garry <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2024-05-28block: stack max_user_sectorsChristoph Hellwig1-0/+2
The max_user_sectors is one of the three factors determining the actual max_sectors limit for READ/WRITE requests. Because of that it needs to be stacked at least for the device mapper multi-path case where requests are directly inserted on the lower device. For SCSI disks this is important because the sd driver actually sets it's own advisory limit that is lower than max_hw_sectors based on the block limits VPD page. While this is a bit odd an unusual, the same effect can happen if a user or udev script tweaks the value manually. Fixes: 4f563a64732d ("block: add a max_user_discard_sectors queue limit") Reported-by: Mike Snitzer <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Mike Snitzer <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-05-28sd: also set max_user_sectors when setting max_sectorsChristoph Hellwig1-1/+3
sd can set a max_sectors value that is lower than the max_hw_sectors limit based on the block limits VPD page. While this is rather unusual, it used to work until the max_user_sectors field was split out to cleanly deal with conflicting hardware and user limits when the hardware limit changes. Also set max_user_sectors to ensure the limit can properly be stacked. Fixes: 4f563a64732d ("block: add a max_user_discard_sectors queue limit") Reported-by: Mike Snitzer <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Mike Snitzer <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-05-28null_blk: Print correct max open zones limit in null_init_zoned_dev()Damien Le Moal1-1/+1
When changing the maximum number of open zones, print that number instead of the total number of zones. Fixes: dc4d137ee3b7 ("null_blk: add support for max open/active zone limit for zoned devices") Cc: [email protected] Signed-off-by: Damien Le Moal <[email protected]> Reviewed-by: Niklas Cassel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-05-28regulator: rtq2208: Fix invalid memory access when ↵Alina Yu1-10/+12
devm_of_regulator_put_matches is called In this patch, a software bug has been fixed. rtq2208_ldo_match is no longer a local variable. It prevents invalid memory access when devm_of_regulator_put_matches is called. Signed-off-by: Alina Yu <[email protected]> Link: https://msgid.link/r/4ce8c4f16f1cf3aa4e5f36c0694dd3c5ccf3cd1c.1716870419.git.alina_yu@richtek.com Signed-off-by: Mark Brown <[email protected]>