aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-07-29Merge branch 'for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu Pull m68knommu fix from Greg Ungerer: "A single compile time fix" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu: m68k/coldfire: change pll var. to clk_pll
2021-07-29btrfs: calculate number of eb pages properly in csum_tree_blockDavid Sterba1-1/+1
Building with -Warray-bounds on systems with 64K pages there's a warning: fs/btrfs/disk-io.c: In function ‘csum_tree_block’: fs/btrfs/disk-io.c:226:34: warning: array subscript 1 is above array bounds of ‘struct page *[1]’ [-Warray-bounds] 226 | kaddr = page_address(buf->pages[i]); | ~~~~~~~~~~^~~ ./include/linux/mm.h:1630:48: note: in definition of macro ‘page_address’ 1630 | #define page_address(page) lowmem_page_address(page) | ^~~~ In file included from fs/btrfs/ctree.h:32, from fs/btrfs/disk-io.c:23: fs/btrfs/extent_io.h:98:15: note: while referencing ‘pages’ 98 | struct page *pages[1]; | ^~~~~ The compiler has no way to know that in that case the nodesize is exactly PAGE_SIZE, so the resulting number of pages will be correct (1). Let's use num_extent_pages that makes the case nodesize == PAGE_SIZE explicitly 1. Reported-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-07-29HID: ft260: fix device removal due to USB disconnectMichael Zaidman1-16/+7
This commit fixes a functional regression introduced by the commit 82f09a637dd3 ("HID: ft260: improve error handling of ft260_hid_feature_report_get()") when upon USB disconnect, the FTDI FT260 i2c device is still available within the /dev folder. In my company's product, where the host USB to FT260 USB connection is hard-wired in the PCB, the issue is not reproducible. To reproduce it, I used the VirtualBox Ubuntu 20.04 VM and the UMFT260EV1A development module for the FTDI FT260 chip: Plug the UMFT260EV1A module into a USB port and attach it to VM. The VM shows 2 i2c devices under the /dev: michael@michael-VirtualBox:~$ ls /dev/i2c-* /dev/i2c-0 /dev/i2c-1 The i2c-0 is not related to the FTDI FT260: michael@michael-VirtualBox:~$ cat /sys/bus/i2c/devices/i2c-0/name SMBus PIIX4 adapter at 4100 The i2c-1 is created by hid-ft260.ko: michael@michael-VirtualBox:~$ cat /sys/bus/i2c/devices/i2c-1/name FT260 usb-i2c bridge on hidraw1 Now, detach the FTDI FT260 USB device from VM. We expect the /dev/i2c-1 to disappear, but it's still here: michael@michael-VirtualBox:~$ ls /dev/i2c-* /dev/i2c-0 /dev/i2c-1 And the kernel log shows: [ +0.001202] usb 2-2: USB disconnect, device number 3 [ +0.000109] ft260 0003:0403:6030.0002: failed to retrieve system status [ +0.000316] ft260 0003:0403:6030.0003: failed to retrieve system status It happens because the commit 82f09a637dd3 changed the ft260_get_system_config() return logic. This caused the ft260_is_interface_enabled() to exit with error upon the FT260 device USB disconnect, which in turn, aborted the ft260_remove() before deleting the FT260 i2c device and cleaning its sysfs stuff. This commit restores the FT260 USB removal functionality and improves the ft260_is_interface_enabled() code to handle correctly all chip modes defined by the device interface configuration pins DCNF0 and DCNF1. Signed-off-by: Michael Zaidman <[email protected]> Acked-by: Aaron Jones (FTDI-UK) <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2021-07-29Merge tag 'amd-drm-fixes-5.14-2021-07-28' of ↵Dave Airlie8-14/+16
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.14-2021-07-28: amdgpu: - Fix resource leak in an error path - Avoid stack contents exposure in error path - pmops check fix for S0ix vs S3 - DCN 2.1 display fixes - DCN 2.0 display fix - Backlight control fix for laptops with HDR panels - Maintainers updates Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-07-28alpha: register early reserved memory in memblockMike Rapoport1-6/+7
The memory reserved by console/PALcode or non-volatile memory is not added to memblock.memory. Since commit fa3354e4ea39 (mm: free_area_init: use maximal zone PFNs rather than zone sizes) the initialization of the memory map relies on the accuracy of memblock.memory to properly calculate zone sizes. The holes in memblock.memory caused by absent regions reserved by the firmware cause incorrect initialization of struct pages which leads to BUG() during the initial page freeing: BUG: Bad page state in process swapper pfn:2ffc53 page:fffffc000ecf14c0 refcount:0 mapcount:1 mapping:0000000000000000 index:0x0 flags: 0x0() raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 raw: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 page dumped because: nonzero mapcount Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 5.7.0-03841-gfa3354e4ea39-dirty #26 fffffc0001b5bd68 fffffc0001b5be80 fffffc00011cd148 fffffc000ecf14c0 fffffc00019803df fffffc0001b5be80 fffffc00011ce340 fffffc000ecf14c0 0000000000000000 fffffc0001b5be80 fffffc0001b482c0 fffffc00027d6618 fffffc00027da7d0 00000000002ff97a 0000000000000000 fffffc0001b5be80 fffffc00011d1abc fffffc000ecf14c0 fffffc0002d00000 fffffc0001b5be80 fffffc0001b2350c 0000000000300000 fffffc0001b48298 fffffc0001b482c0 Trace: [<fffffc00011cd148>] bad_page+0x168/0x1b0 [<fffffc00011ce340>] free_pcp_prepare+0x1e0/0x290 [<fffffc00011d1abc>] free_unref_page+0x2c/0xa0 [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30 [<fffffc00014ee5f0>] cmp_ex_sort+0x0/0x30 [<fffffc000101001c>] _stext+0x1c/0x20 Fix this by registering the reserved ranges in memblock.memory. Link: https://lore.kernel.org/lkml/20210726192311.uffqnanxw3ac5wwi@ivybridge Fixes: fa3354e4ea39 ("mm: free_area_init: use maximal zone PFNs rather than zone sizes") Reported-by: Matt Turner <[email protected]> Cc: <[email protected]> Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Matt Turner <[email protected]>
2021-07-29Merge tag 'drm-intel-fixes-2021-07-28' of ↵Dave Airlie3-6/+14
git://anongit.freedesktop.org/drm/drm-intel into drm-fixes Display related fixes: - Fix vbt port mask - Fix around reading the right DSC disable fuse in display_ver 10 - Split display version 9 and 10 in intel_setup_outputs Signed-off-by: Dave Airlie <[email protected]> From: Rodrigo Vivi <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-07-29Merge tag 'drm-misc-fixes-2021-07-28' of ↵Dave Airlie3-17/+13
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * panel: Fix bpc for ytc700tlag_05_201c * ttm: debugfs init fixes Signed-off-by: Dave Airlie <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-07-29Merge tag 'drm-msm-fixes-2021-07-27' of ↵Dave Airlie5-3/+18
https://gitlab.freedesktop.org/drm/msm into drm-fixes A few fixes for v5.14, including a fix for a crash if display triggers an iommu fault (which tends to happen at probe time on devices with bootloader fw that leaves display enabled as kernel starts) Signed-off-by: Dave Airlie <[email protected]> From: Rob Clark <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/CAF6AEGubeV_uzWhsqp_+EmQmPcPatnqWOQnARoing2YvQOHbyg@mail.gmail.com
2021-07-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller20-138/+446
Daniel Borkmann says: ==================== pull-request: bpf 2021-07-29 The following pull-request contains BPF updates for your *net* tree. We've added 9 non-merge commits during the last 14 day(s) which contain a total of 20 files changed, 446 insertions(+), 138 deletions(-). The main changes are: 1) Fix UBSAN out-of-bounds splat for showing XDP link fdinfo, from Lorenz Bauer. 2) Fix insufficient Spectre v4 mitigation in BPF runtime, from Daniel Borkmann, Piotr Krysiuk and Benedict Schlueter. 3) Batch of fixes for BPF sockmap found under stress testing, from John Fastabend. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-07-29bpf: Fix leakage due to insufficient speculative store bypass mitigationDaniel Borkmann2-56/+33
Spectre v4 gadgets make use of memory disambiguation, which is a set of techniques that execute memory access instructions, that is, loads and stores, out of program order; Intel's optimization manual, section 2.4.4.5: A load instruction micro-op may depend on a preceding store. Many microarchitectures block loads until all preceding store addresses are known. The memory disambiguator predicts which loads will not depend on any previous stores. When the disambiguator predicts that a load does not have such a dependency, the load takes its data from the L1 data cache. Eventually, the prediction is verified. If an actual conflict is detected, the load and all succeeding instructions are re-executed. af86ca4e3088 ("bpf: Prevent memory disambiguation attack") tried to mitigate this attack by sanitizing the memory locations through preemptive "fast" (low latency) stores of zero prior to the actual "slow" (high latency) store of a pointer value such that upon dependency misprediction the CPU then speculatively executes the load of the pointer value and retrieves the zero value instead of the attacker controlled scalar value previously stored at that location, meaning, subsequent access in the speculative domain is then redirected to the "zero page". The sanitized preemptive store of zero prior to the actual "slow" store is done through a simple ST instruction based on r10 (frame pointer) with relative offset to the stack location that the verifier has been tracking on the original used register for STX, which does not have to be r10. Thus, there are no memory dependencies for this store, since it's only using r10 and immediate constant of zero; hence af86ca4e3088 /assumed/ a low latency operation. However, a recent attack demonstrated that this mitigation is not sufficient since the preemptive store of zero could also be turned into a "slow" store and is thus bypassed as well: [...] // r2 = oob address (e.g. scalar) // r7 = pointer to map value 31: (7b) *(u64 *)(r10 -16) = r2 // r9 will remain "fast" register, r10 will become "slow" register below 32: (bf) r9 = r10 // JIT maps BPF reg to x86 reg: // r9 -> r15 (callee saved) // r10 -> rbp // train store forward prediction to break dependency link between both r9 // and r10 by evicting them from the predictor's LRU table. 33: (61) r0 = *(u32 *)(r7 +24576) 34: (63) *(u32 *)(r7 +29696) = r0 35: (61) r0 = *(u32 *)(r7 +24580) 36: (63) *(u32 *)(r7 +29700) = r0 37: (61) r0 = *(u32 *)(r7 +24584) 38: (63) *(u32 *)(r7 +29704) = r0 39: (61) r0 = *(u32 *)(r7 +24588) 40: (63) *(u32 *)(r7 +29708) = r0 [...] 543: (61) r0 = *(u32 *)(r7 +25596) 544: (63) *(u32 *)(r7 +30716) = r0 // prepare call to bpf_ringbuf_output() helper. the latter will cause rbp // to spill to stack memory while r13/r14/r15 (all callee saved regs) remain // in hardware registers. rbp becomes slow due to push/pop latency. below is // disasm of bpf_ringbuf_output() helper for better visual context: // // ffffffff8117ee20: 41 54 push r12 // ffffffff8117ee22: 55 push rbp // ffffffff8117ee23: 53 push rbx // ffffffff8117ee24: 48 f7 c1 fc ff ff ff test rcx,0xfffffffffffffffc // ffffffff8117ee2b: 0f 85 af 00 00 00 jne ffffffff8117eee0 <-- jump taken // [...] // ffffffff8117eee0: 49 c7 c4 ea ff ff ff mov r12,0xffffffffffffffea // ffffffff8117eee7: 5b pop rbx // ffffffff8117eee8: 5d pop rbp // ffffffff8117eee9: 4c 89 e0 mov rax,r12 // ffffffff8117eeec: 41 5c pop r12 // ffffffff8117eeee: c3 ret 545: (18) r1 = map[id:4] 547: (bf) r2 = r7 548: (b7) r3 = 0 549: (b7) r4 = 4 550: (85) call bpf_ringbuf_output#194288 // instruction 551 inserted by verifier \ 551: (7a) *(u64 *)(r10 -16) = 0 | /both/ are now slow stores here // storing map value pointer r7 at fp-16 | since value of r10 is "slow". 552: (7b) *(u64 *)(r10 -16) = r7 / // following "fast" read to the same memory location, but due to dependency // misprediction it will speculatively execute before insn 551/552 completes. 553: (79) r2 = *(u64 *)(r9 -16) // in speculative domain contains attacker controlled r2. in non-speculative // domain this contains r7, and thus accesses r7 +0 below. 554: (71) r3 = *(u8 *)(r2 +0) // leak r3 As can be seen, the current speculative store bypass mitigation which the verifier inserts at line 551 is insufficient since /both/, the write of the zero sanitation as well as the map value pointer are a high latency instruction due to prior memory access via push/pop of r10 (rbp) in contrast to the low latency read in line 553 as r9 (r15) which stays in hardware registers. Thus, architecturally, fp-16 is r7, however, microarchitecturally, fp-16 can still be r2. Initial thoughts to address this issue was to track spilled pointer loads from stack and enforce their load via LDX through r10 as well so that /both/ the preemptive store of zero /as well as/ the load use the /same/ register such that a dependency is created between the store and load. However, this option is not sufficient either since it can be bypassed as well under speculation. An updated attack with pointer spill/fills now _all_ based on r10 would look as follows: [...] // r2 = oob address (e.g. scalar) // r7 = pointer to map value [...] // longer store forward prediction training sequence than before. 2062: (61) r0 = *(u32 *)(r7 +25588) 2063: (63) *(u32 *)(r7 +30708) = r0 2064: (61) r0 = *(u32 *)(r7 +25592) 2065: (63) *(u32 *)(r7 +30712) = r0 2066: (61) r0 = *(u32 *)(r7 +25596) 2067: (63) *(u32 *)(r7 +30716) = r0 // store the speculative load address (scalar) this time after the store // forward prediction training. 2068: (7b) *(u64 *)(r10 -16) = r2 // preoccupy the CPU store port by running sequence of dummy stores. 2069: (63) *(u32 *)(r7 +29696) = r0 2070: (63) *(u32 *)(r7 +29700) = r0 2071: (63) *(u32 *)(r7 +29704) = r0 2072: (63) *(u32 *)(r7 +29708) = r0 2073: (63) *(u32 *)(r7 +29712) = r0 2074: (63) *(u32 *)(r7 +29716) = r0 2075: (63) *(u32 *)(r7 +29720) = r0 2076: (63) *(u32 *)(r7 +29724) = r0 2077: (63) *(u32 *)(r7 +29728) = r0 2078: (63) *(u32 *)(r7 +29732) = r0 2079: (63) *(u32 *)(r7 +29736) = r0 2080: (63) *(u32 *)(r7 +29740) = r0 2081: (63) *(u32 *)(r7 +29744) = r0 2082: (63) *(u32 *)(r7 +29748) = r0 2083: (63) *(u32 *)(r7 +29752) = r0 2084: (63) *(u32 *)(r7 +29756) = r0 2085: (63) *(u32 *)(r7 +29760) = r0 2086: (63) *(u32 *)(r7 +29764) = r0 2087: (63) *(u32 *)(r7 +29768) = r0 2088: (63) *(u32 *)(r7 +29772) = r0 2089: (63) *(u32 *)(r7 +29776) = r0 2090: (63) *(u32 *)(r7 +29780) = r0 2091: (63) *(u32 *)(r7 +29784) = r0 2092: (63) *(u32 *)(r7 +29788) = r0 2093: (63) *(u32 *)(r7 +29792) = r0 2094: (63) *(u32 *)(r7 +29796) = r0 2095: (63) *(u32 *)(r7 +29800) = r0 2096: (63) *(u32 *)(r7 +29804) = r0 2097: (63) *(u32 *)(r7 +29808) = r0 2098: (63) *(u32 *)(r7 +29812) = r0 // overwrite scalar with dummy pointer; same as before, also including the // sanitation store with 0 from the current mitigation by the verifier. 2099: (7a) *(u64 *)(r10 -16) = 0 | /both/ are now slow stores here 2100: (7b) *(u64 *)(r10 -16) = r7 | since store unit is still busy. // load from stack intended to bypass stores. 2101: (79) r2 = *(u64 *)(r10 -16) 2102: (71) r3 = *(u8 *)(r2 +0) // leak r3 [...] Looking at the CPU microarchitecture, the scheduler might issue loads (such as seen in line 2101) before stores (line 2099,2100) because the load execution units become available while the store execution unit is still busy with the sequence of dummy stores (line 2069-2098). And so the load may use the prior stored scalar from r2 at address r10 -16 for speculation. The updated attack may work less reliable on CPU microarchitectures where loads and stores share execution resources. This concludes that the sanitizing with zero stores from af86ca4e3088 ("bpf: Prevent memory disambiguation attack") is insufficient. Moreover, the detection of stack reuse from af86ca4e3088 where previously data (STACK_MISC) has been written to a given stack slot where a pointer value is now to be stored does not have sufficient coverage as precondition for the mitigation either; for several reasons outlined as follows: 1) Stack content from prior program runs could still be preserved and is therefore not "random", best example is to split a speculative store bypass attack between tail calls, program A would prepare and store the oob address at a given stack slot and then tail call into program B which does the "slow" store of a pointer to the stack with subsequent "fast" read. From program B PoV such stack slot type is STACK_INVALID, and therefore also must be subject to mitigation. 2) The STACK_SPILL must not be coupled to register_is_const(&stack->spilled_ptr) condition, for example, the previous content of that memory location could also be a pointer to map or map value. Without the fix, a speculative store bypass is not mitigated in such precondition and can then lead to a type confusion in the speculative domain leaking kernel memory near these pointer types. While brainstorming on various alternative mitigation possibilities, we also stumbled upon a retrospective from Chrome developers [0]: [...] For variant 4, we implemented a mitigation to zero the unused memory of the heap prior to allocation, which cost about 1% when done concurrently and 4% for scavenging. Variant 4 defeats everything we could think of. We explored more mitigations for variant 4 but the threat proved to be more pervasive and dangerous than we anticipated. For example, stack slots used by the register allocator in the optimizing compiler could be subject to type confusion, leading to pointer crafting. Mitigating type confusion for stack slots alone would have required a complete redesign of the backend of the optimizing compiler, perhaps man years of work, without a guarantee of completeness. [...] From BPF side, the problem space is reduced, however, options are rather limited. One idea that has been explored was to xor-obfuscate pointer spills to the BPF stack: [...] // preoccupy the CPU store port by running sequence of dummy stores. [...] 2106: (63) *(u32 *)(r7 +29796) = r0 2107: (63) *(u32 *)(r7 +29800) = r0 2108: (63) *(u32 *)(r7 +29804) = r0 2109: (63) *(u32 *)(r7 +29808) = r0 2110: (63) *(u32 *)(r7 +29812) = r0 // overwrite scalar with dummy pointer; xored with random 'secret' value // of 943576462 before store ... 2111: (b4) w11 = 943576462 2112: (af) r11 ^= r7 2113: (7b) *(u64 *)(r10 -16) = r11 2114: (79) r11 = *(u64 *)(r10 -16) 2115: (b4) w2 = 943576462 2116: (af) r2 ^= r11 // ... and restored with the same 'secret' value with the help of AX reg. 2117: (71) r3 = *(u8 *)(r2 +0) [...] While the above would not prevent speculation, it would make data leakage infeasible by directing it to random locations. In order to be effective and prevent type confusion under speculation, such random secret would have to be regenerated for each store. The additional complexity involved for a tracking mechanism that prevents jumps such that restoring spilled pointers would not get corrupted is not worth the gain for unprivileged. Hence, the fix in here eventually opted for emitting a non-public BPF_ST | BPF_NOSPEC instruction which the x86 JIT translates into a lfence opcode. Inserting the latter in between the store and load instruction is one of the mitigations options [1]. The x86 instruction manual notes: [...] An LFENCE that follows an instruction that stores to memory might complete before the data being stored have become globally visible. [...] The latter meaning that the preceding store instruction finished execution and the store is at minimum guaranteed to be in the CPU's store queue, but it's not guaranteed to be in that CPU's L1 cache at that point (globally visible). The latter would only be guaranteed via sfence. So the load which is guaranteed to execute after the lfence for that local CPU would have to rely on store-to-load forwarding. [2], in section 2.3 on store buffers says: [...] For every store operation that is added to the ROB, an entry is allocated in the store buffer. This entry requires both the virtual and physical address of the target. Only if there is no free entry in the store buffer, the frontend stalls until there is an empty slot available in the store buffer again. Otherwise, the CPU can immediately continue adding subsequent instructions to the ROB and execute them out of order. On Intel CPUs, the store buffer has up to 56 entries. [...] One small upside on the fix is that it lifts constraints from af86ca4e3088 where the sanitize_stack_off relative to r10 must be the same when coming from different paths. The BPF_ST | BPF_NOSPEC gets emitted after a BPF_STX or BPF_ST instruction. This happens either when we store a pointer or data value to the BPF stack for the first time, or upon later pointer spills. The former needs to be enforced since otherwise stale stack data could be leaked under speculation as outlined earlier. For non-x86 JITs the BPF_ST | BPF_NOSPEC mapping is currently optimized away, but others could emit a speculation barrier as well if necessary. For real-world unprivileged programs e.g. generated by LLVM, pointer spill/fill is only generated upon register pressure and LLVM only tries to do that for pointers which are not used often. The program main impact will be the initial BPF_ST | BPF_NOSPEC sanitation for the STACK_INVALID case when the first write to a stack slot occurs e.g. upon map lookup. In future we might refine ways to mitigate the latter cost. [0] https://arxiv.org/pdf/1902.05178.pdf [1] https://msrc-blog.microsoft.com/2018/05/21/analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639/ [2] https://arxiv.org/pdf/1905.05725.pdf Fixes: af86ca4e3088 ("bpf: Prevent memory disambiguation attack") Fixes: f7cf25b2026d ("bpf: track spill/fill of constants") Co-developed-by: Piotr Krysiuk <[email protected]> Co-developed-by: Benedict Schlueter <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Piotr Krysiuk <[email protected]> Signed-off-by: Benedict Schlueter <[email protected]> Acked-by: Alexei Starovoitov <[email protected]>
2021-07-29bpf: Introduce BPF nospec instruction for mitigating Spectre v4Daniel Borkmann14-8/+102
In case of JITs, each of the JIT backends compiles the BPF nospec instruction /either/ to a machine instruction which emits a speculation barrier /or/ to /no/ machine instruction in case the underlying architecture is not affected by Speculative Store Bypass or has different mitigations in place already. This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence' instruction for mitigation. In case of arm64, we rely on the firmware mitigation as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled, it works for all of the kernel code with no need to provide any additional instructions here (hence only comment in arm64 JIT). Other archs can follow as needed. The BPF nospec instruction is specifically targeting Spectre v4 since i) we don't use a serialization barrier for the Spectre v1 case, and ii) mitigation instructions for v1 and v4 might be different on some archs. The BPF nospec is required for a future commit, where the BPF verifier does annotate intermediate BPF programs with speculation barriers. Co-developed-by: Piotr Krysiuk <[email protected]> Co-developed-by: Benedict Schlueter <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: Piotr Krysiuk <[email protected]> Signed-off-by: Benedict Schlueter <[email protected]> Acked-by: Alexei Starovoitov <[email protected]>
2021-07-28Merge tag 'fixes_for_v5.14-rc4' of ↵Linus Torvalds5-14/+44
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2 and reiserfs fixes from Jan Kara: "A fix for the ext2 conversion to kmap_local() and two reiserfs hardening fixes" * tag 'fixes_for_v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: reiserfs: check directory items on read from disk fs/ext2: Avoid page_address on pages returned by ext2_get_page reiserfs: add check for root_inode in reiserfs_fill_super
2021-07-28Merge tag 'platform-drivers-x86-v5.14-2' of ↵Linus Torvalds6-34/+265
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86 Pull x86 platform driver fixes from Hans de Goede: "A set of bug-fixes and new hardware ids. Highlights: - amd-pmc fixes - think-lmi fixes - various new hardware-ids" * tag 'platform-drivers-x86-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: platform/x86: gigabyte-wmi: add support for B550 Aorus Elite V2 platform/x86: intel-hid: add Alder Lake ACPI device ID platform/x86: think-lmi: Fix possible mem-leaks on tlmi_analyze() error-exit platform/x86: think-lmi: Split kobject_init() and kobject_add() calls platform/x86: think-lmi: Move pending_reboot_attr to the attributes sysfs dir platform/x86: amd-pmc: Fix undefined reference to __udivdi3 platform/x86: amd-pmc: Fix missing unlock on error in amd_pmc_send_cmd() platform/x86: wireless-hotkey: remove hardcoded "hp" from the error message platform/x86: amd-pmc: Use return code on suspend platform/x86: amd-pmc: Add new acpi id for future PMC controllers platform/x86: amd-pmc: Add support for ACPI ID AMDI0006 platform/x86: amd-pmc: Add support for logging s0ix counters platform/x86: amd-pmc: Add support for logging SMU metrics platform/x86: amd-pmc: call dump registers only once platform/x86: amd-pmc: Fix SMU firmware reporting mechanism platform/x86: amd-pmc: Fix command completion code platform/x86: think-lmi: Add pending_reboot support
2021-07-28dmaengine: idxd: Change license on idxd.h to LGPLTony Luck1-1/+1
This file was given GPL-2.0 license. But LGPL-2.1 makes more sense as it needs to be used by libraries outside of the kernel source tree. Signed-off-by: Tony Luck <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-28af_unix: fix garbage collect vs MSG_PEEKMiklos Szeredi1-2/+49
unix_gc() assumes that candidate sockets can never gain an external reference (i.e. be installed into an fd) while the unix_gc_lock is held. Except for MSG_PEEK this is guaranteed by modifying inflight count under the unix_gc_lock. MSG_PEEK does not touch any variable protected by unix_gc_lock (file count is not), yet it needs to be serialized with garbage collection. Do this by locking/unlocking unix_gc_lock: 1) increment file count 2) lock/unlock barrier to make sure incremented file count is visible to garbage collection 3) install file into fd This is a lock barrier (unlike smp_mb()) that ensures that garbage collection is run completely before or completely after the barrier. Cc: <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2021-07-28btrfs: fix rw device counting in __btrfs_free_extra_devidsDesmond Cheong Zhi Xi1-0/+1
When removing a writeable device in __btrfs_free_extra_devids, the rw device count should be decremented. This error was caught by Syzbot which reported a warning in close_fs_devices: WARNING: CPU: 1 PID: 9355 at fs/btrfs/volumes.c:1168 close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168 Modules linked in: CPU: 0 PID: 9355 Comm: syz-executor552 Not tainted 5.13.0-rc1-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168 RSP: 0018:ffffc9000333f2f0 EFLAGS: 00010293 RAX: ffffffff8365f5c3 RBX: 0000000000000001 RCX: ffff888029afd4c0 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000 RBP: ffff88802846f508 R08: ffffffff8365f525 R09: ffffed100337d128 R10: ffffed100337d128 R11: 0000000000000000 R12: dffffc0000000000 R13: ffff888019be8868 R14: 1ffff1100337d10d R15: 1ffff1100337d10a FS: 00007f6f53828700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000000047c410 CR3: 00000000302a6000 CR4: 00000000001506f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: btrfs_close_devices+0xc9/0x450 fs/btrfs/volumes.c:1180 open_ctree+0x8e1/0x3968 fs/btrfs/disk-io.c:3693 btrfs_fill_super fs/btrfs/super.c:1382 [inline] btrfs_mount_root+0xac5/0xc60 fs/btrfs/super.c:1749 legacy_get_tree+0xea/0x180 fs/fs_context.c:592 vfs_get_tree+0x86/0x270 fs/super.c:1498 fc_mount fs/namespace.c:993 [inline] vfs_kern_mount+0xc9/0x160 fs/namespace.c:1023 btrfs_mount+0x3d3/0xb50 fs/btrfs/super.c:1809 legacy_get_tree+0xea/0x180 fs/fs_context.c:592 vfs_get_tree+0x86/0x270 fs/super.c:1498 do_new_mount fs/namespace.c:2905 [inline] path_mount+0x196f/0x2be0 fs/namespace.c:3235 do_mount fs/namespace.c:3248 [inline] __do_sys_mount fs/namespace.c:3456 [inline] __se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433 do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47 entry_SYSCALL_64_after_hwframe+0x44/0xae Because fs_devices->rw_devices was not 0 after closing all devices. Here is the call trace that was observed: btrfs_mount_root(): btrfs_scan_one_device(): device_list_add(); <---------------- device added btrfs_open_devices(): open_fs_devices(): btrfs_open_one_device(); <-------- writable device opened, rw device count ++ btrfs_fill_super(): open_ctree(): btrfs_free_extra_devids(): __btrfs_free_extra_devids(); <--- writable device removed, rw device count not decremented fail_tree_roots: btrfs_close_devices(): close_fs_devices(); <------- rw device count off by 1 As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device"), rw_devices was decremented on removing a writable device in __btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit was not set for the device. However, this check does not need to be reinstated as it is now redundant and incorrect. In __btrfs_free_extra_devids, we skip removing the device if it is the target for replacement. This is done by checking whether device->devid == BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check, and so it's redundant to test for that bit. Additionally, following commit 82372bc816d7 ("Btrfs: make the logic of source device removing more clear"), rw_devices is incremented whenever a writeable device is added to the alloc list (including the target device in btrfs_dev_replace_finishing), so all removals of writable devices from the alloc list should also be accompanied by a decrement to rw_devices. Reported-by: [email protected] Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device") CC: [email protected] # 5.10+ Tested-by: [email protected] Reviewed-by: Anand Jain <[email protected]> Signed-off-by: Desmond Cheong Zhi Xi <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-07-28btrfs: fix lost inode on log replay after mix of fsync, rename and inode ↵Filipe Manana1-2/+2
eviction When checking if we need to log the new name of a renamed inode, we are checking if the inode and its parent inode have been logged before, and if not we don't log the new name. The check however is buggy, as it directly compares the logged_trans field of the inodes versus the ID of the current transaction. The problem is that logged_trans is a transient field, only stored in memory and never persisted in the inode item, so if an inode was logged before, evicted and reloaded, its logged_trans field is set to a value of 0, meaning the check will return false and the new name of the renamed inode is not logged. If the old parent directory was previously fsynced and we deleted the logged directory entries corresponding to the old name, we end up with a log that when replayed will delete the renamed inode. The following example triggers the problem: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt $ mkdir /mnt/A $ mkdir /mnt/B $ echo -n "hello world" > /mnt/A/foo $ sync # Add some new file to A and fsync directory A. $ touch /mnt/A/bar $ xfs_io -c "fsync" /mnt/A # Now trigger inode eviction. We are only interested in triggering # eviction for the inode of directory A. $ echo 2 > /proc/sys/vm/drop_caches # Move foo from directory A to directory B. # This deletes the directory entries for foo in A from the log, and # does not add the new name for foo in directory B to the log, because # logged_trans of A is 0, which is less than the current transaction ID. $ mv /mnt/A/foo /mnt/B/foo # Now make an fsync to anything except A, B or any file inside them, # like for example create a file at the root directory and fsync this # new file. This syncs the log that contains all the changes done by # previous rename operation. $ touch /mnt/baz $ xfs_io -c "fsync" /mnt/baz <power fail> # Mount the filesystem and replay the log. $ mount /dev/sdc /mnt # Check the filesystem content. $ ls -1R /mnt /mnt/: A B baz /mnt/A: bar /mnt/B: $ # File foo is gone, it's neither in A/ nor in B/. Fix this by using the inode_logged() helper at btrfs_log_new_name(), which safely checks if an inode was logged before in the current transaction. A test case for fstests will follow soon. CC: [email protected] # 4.14+ Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-07-28btrfs: mark compressed range uptodate only if all bio succeedGoldwyn Rodrigues1-1/+1
In compression write endio sequence, the range which the compressed_bio writes is marked as uptodate if the last bio of the compressed (sub)bios is completed successfully. There could be previous bio which may have failed which is recorded in cb->errors. Set the writeback range as uptodate only if cb->errors is zero, as opposed to checking only the last bio's status. Backporting notes: in all versions up to 4.4 the last argument is always replaced by "!cb->errors". CC: [email protected] # 4.4+ Signed-off-by: Goldwyn Rodrigues <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2021-07-28ACPI: DPTF: Fix reading of attributesSrinivas Pandruvada1-8/+43
The current assumption that methods to read PCH FIVR attributes will return integer, is not correct. There is no good way to return integer as negative numbers are also valid. These read methods return a package of integers. The first integer returns status, which is 0 on success and any other value for failure. When the returned status is zero, then the second integer returns the actual value. This change fixes this issue by replacing acpi_evaluate_integer() with acpi_evaluate_object() and use acpi_extract_package() to extract results. Fixes: 2ce6324eadb01 ("ACPI: DPTF: Add PCH FIVR participant driver") Signed-off-by: Srinivas Pandruvada <[email protected]> Cc: 5.10+ <[email protected]> # 5.10+ Signed-off-by: Rafael J. Wysocki <[email protected]>
2021-07-28Revert "ACPI: resources: Add checks for ACPI IRQ override"Hui Wang1-8/+1
The commit 0ec4e55e9f57 ("ACPI: resources: Add checks for ACPI IRQ override") introduces regression on some platforms, at least it makes the UART can't get correct irq setting on two different platforms, and it makes the kernel can't bootup on these two platforms. This reverts commit 0ec4e55e9f571f08970ed115ec0addc691eda613. Regression-discuss: https://bugzilla.kernel.org/show_bug.cgi?id=213031 Reported-by: PGNd <[email protected]> Cc: 5.4+ <[email protected]> # 5.4+ Signed-off-by: Hui Wang <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2021-07-28io_uring: fix poll requests leaking second poll entriesHao Xu1-3/+2
For pure poll requests, it doesn't remove the second poll wait entry when it's done, neither after vfs_poll() or in the poll completion handler. We should remove the second poll wait entry. And we use io_poll_remove_double() rather than io_poll_remove_waitqs() since the latter has some redundant logic. Fixes: 88e41cf928a6 ("io_uring: add multishot mode for IORING_OP_POLL_ADD") Cc: [email protected] # 5.13+ Signed-off-by: Hao Xu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-07-28io_uring: don't block level reissue off completion pathJens Axboe1-0/+6
Some setups, like SCSI, can throw spurious -EAGAIN off the softirq completion path. Normally we expect this to happen inline as part of submission, but apparently SCSI has a weird corner case where it can happen as part of normal completions. This should be solved by having the -EAGAIN bubble back up the stack as part of submission, but previous attempts at this failed and we're not just quite there yet. Instead we currently use REQ_F_REISSUE to handle this case. For now, catch it in io_rw_should_reissue() and prevent a reissue from a bogus path. Cc: [email protected] Reported-by: Fabian Ebner <[email protected]> Tested-by: Fabian Ebner <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-07-28sis900: Fix missing pci_disable_device() in probe and removeWang Hai1-5/+2
Replace pci_enable_device() with pcim_enable_device(), pci_disable_device() and pci_release_regions() will be called in release automatically. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Wang Hai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-28net: let flow have same hash in two directionszhang kai1-9/+9
using same source and destination ip/port for flow hash calculation within the two directions. Signed-off-by: zhang kai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-28platform/x86: gigabyte-wmi: add support for B550 Aorus Elite V2Thomas Weißschuh1-0/+1
Reported as working here: https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/1#issuecomment-879398883 Signed-off-by: Thomas Weißschuh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2021-07-28platform/x86: intel-hid: add Alder Lake ACPI device IDPing Bao1-0/+1
Alder Lake has a new ACPI ID for Intel HID event filter device. Signed-off-by: Ping Bao <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Hans de Goede <[email protected]>
2021-07-28HID: wacom: Skip processing of touches with negative slot valuesJason Gerecke1-0/+3
The `input_mt_get_slot_by_key` function may return a negative value if an error occurs (e.g. running out of slots). If this occurs we should really avoid reporting any data for the slot. Signed-off-by: Ping Cheng <[email protected]> Signed-off-by: Jason Gerecke <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2021-07-28HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDTJason Gerecke1-1/+1
Commit 670e90924bfe ("HID: wacom: support named keys on older devices") added support for sending named events from the soft buttons on the 24HDT and 27QHDT. In the process, however, it inadvertantly disabled the touchscreen of the 24HDT and 27QHDT by default. The `wacom_set_shared_values` function would normally enable touch by default but because it checks the state of the non-shared `has_mute_touch_switch` flag and `wacom_setup_touch_input_capabilities` sets the state of the /shared/ version, touch ends up being disabled by default. This patch sets the non-shared flag, letting `wacom_set_shared_values` take care of copying the value over to the shared version and setting the default touch state to "on". Fixes: 670e90924bfe ("HID: wacom: support named keys on older devices") CC: [email protected] # 5.4+ Signed-off-by: Jason Gerecke <[email protected]> Reviewed-by: Ping Cheng <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2021-07-28HID: Kconfig: Fix spelling mistake "Uninterruptable" -> "Uninterruptible"Colin Ian King1-1/+1
There is a spelling mistake in the Kconfig text. Fix it. Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2021-07-28HID: apple: Add support for Keychron K1 wireless keyboardHaochen Tong1-0/+2
The Keychron K1 wireless keyboard has a set of Apple-like function keys and an Fn key that works like on an Apple bluetooth keyboard. It identifies as an Apple Alu RevB ANSI keyboard (05ac:024f) over USB and BT. Use hid-apple for it so the Fn key and function keys work correctly. Signed-off-by: Haochen Tong <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2021-07-28HID: fix typo in KconfigChristophe JAILLET1-1/+1
There is a missing space in "relyingon". Add it. Signed-off-by: Christophe JAILLET <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2021-07-28nfc: nfcsim: fix use after free during module unloadKrzysztof Kozlowski1-2/+1
There is a use after free memory corruption during module exit: - nfcsim_exit() - nfcsim_device_free(dev0) - nfc_digital_unregister_device() This iterates over command queue and frees all commands, - dev->up = false - nfcsim_link_shutdown() - nfcsim_link_recv_wake() This wakes the sleeping thread nfcsim_link_recv_skb(). - nfcsim_link_recv_skb() Wake from wait_event_interruptible_timeout(), call directly the deb->cb callback even though (dev->up == false), - digital_send_cmd_complete() Dereference of "struct digital_cmd" cmd which was freed earlier by nfc_digital_unregister_device(). This causes memory corruption shortly after (with unrelated stack trace): nfc nfc0: NFC: nfcsim_recv_wq: Device is down llcp: nfc_llcp_recv: err -19 nfc nfc1: NFC: nfcsim_recv_wq: Device is down BUG: unable to handle page fault for address: ffffffffffffffed Call Trace: fsnotify+0x54b/0x5c0 __fsnotify_parent+0x1fe/0x300 ? vfs_write+0x27c/0x390 vfs_write+0x27c/0x390 ksys_write+0x63/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae KASAN report: BUG: KASAN: use-after-free in digital_send_cmd_complete+0x16/0x50 Write of size 8 at addr ffff88800a05f720 by task kworker/0:2/71 Workqueue: events nfcsim_recv_wq [nfcsim] Call Trace:  dump_stack_lvl+0x45/0x59  print_address_description.constprop.0+0x21/0x140  ? digital_send_cmd_complete+0x16/0x50  ? digital_send_cmd_complete+0x16/0x50  kasan_report.cold+0x7f/0x11b  ? digital_send_cmd_complete+0x16/0x50  ? digital_dep_link_down+0x60/0x60  digital_send_cmd_complete+0x16/0x50  nfcsim_recv_wq+0x38f/0x3d5 [nfcsim]  ? nfcsim_in_send_cmd+0x4a/0x4a [nfcsim]  ? lock_is_held_type+0x98/0x110  ? finish_wait+0x110/0x110  ? rcu_read_lock_sched_held+0x9c/0xd0  ? rcu_read_lock_bh_held+0xb0/0xb0  ? lockdep_hardirqs_on_prepare+0x12e/0x1f0 This flow of calling digital_send_cmd_complete() callback on driver exit is specific to nfcsim which implements reading and sending work queues. Since the NFC digital device was unregistered, the callback should not be called. Fixes: 204bddcb508f ("NFC: nfcsim: Make use of the Digital layer") Cc: <[email protected]> Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-28tulip: windbond-840: Fix missing pci_disable_device() in probe and removeWang Hai1-5/+2
Replace pci_enable_device() with pcim_enable_device(), pci_disable_device() and pci_release_regions() will be called in release automatically. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Hulk Robot <[email protected]> Signed-off-by: Wang Hai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-28sctp: fix return value check in __sctp_rcv_asconf_lookupMarcelo Ricardo Leitner1-1/+1
As Ben Hutchings noticed, this check should have been inverted: the call returns true in case of success. Reported-by: Ben Hutchings <[email protected]> Fixes: 0c5dc070ff3d ("sctp: validate from_addr_param return") Signed-off-by: Marcelo Ricardo Leitner <[email protected]> Reviewed-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-28nfc: s3fwrn5: fix undefined parameter values in dev_err()Tang Bin1-1/+1
In the function s3fwrn5_fw_download(), the 'ret' is not assigned, so the correct value should be given in dev_err function. Fixes: a0302ff5906a ("nfc: s3fwrn5: remove unnecessary label") Signed-off-by: Zhang Shengju <[email protected]> Signed-off-by: Tang Bin <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-28Merge tag 'mlx5-fixes-2021-07-27' of ↵David S. Miller10-32/+98
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-07-27 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-07-27block: delay freeing the gendiskChristoph Hellwig2-2/+3
blkdev_get_no_open acquires a reference to the block_device through the block device inode and then tries to acquire a device model reference to the gendisk. But at this point the disk migh already be freed (although the race is free). Fix this by only freeing the gendisk from the whole device bdevs ->free_inode callback as well. Fixes: 22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get") Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-07-27blk-iocost: fix operation ordering in iocg_wake_fn()Tejun Heo1-5/+6
iocg_wake_fn() open-codes wait_queue_entry removal and wakeup because it wants the wq_entry to be always removed whether it ended up waking the task or not. finish_wait() tests whether wq_entry needs removal without grabbing the wait_queue lock and expects the waker to use list_del_init_careful() after all waking operations are complete, which iocg_wake_fn() didn't do. The operation order was wrong and the regular list_del_init() was used. The result is that if a waiter wakes up racing the waker, it can free pop the wq_entry off stack before the waker is still looking at it, which can lead to a backtrace like the following. [7312084.588951] general protection fault, probably for non-canonical address 0x586bf4005b2b88: 0000 [#1] SMP ... [7312084.647079] RIP: 0010:queued_spin_lock_slowpath+0x171/0x1b0 ... [7312084.858314] Call Trace: [7312084.863548] _raw_spin_lock_irqsave+0x22/0x30 [7312084.872605] try_to_wake_up+0x4c/0x4f0 [7312084.880444] iocg_wake_fn+0x71/0x80 [7312084.887763] __wake_up_common+0x71/0x140 [7312084.895951] iocg_kick_waitq+0xe8/0x2b0 [7312084.903964] ioc_rqos_throttle+0x275/0x650 [7312084.922423] __rq_qos_throttle+0x20/0x30 [7312084.930608] blk_mq_make_request+0x120/0x650 [7312084.939490] generic_make_request+0xca/0x310 [7312084.957600] submit_bio+0x173/0x200 [7312084.981806] swap_readpage+0x15c/0x240 [7312084.989646] read_swap_cache_async+0x58/0x60 [7312084.998527] swap_cluster_readahead+0x201/0x320 [7312085.023432] swapin_readahead+0x2df/0x450 [7312085.040672] do_swap_page+0x52f/0x820 [7312085.058259] handle_mm_fault+0xa16/0x1420 [7312085.066620] do_page_fault+0x2c6/0x5c0 [7312085.074459] page_fault+0x2f/0x40 Fix it by switching to list_del_init_careful() and putting it at the end. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Rik van Riel <[email protected]> Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost") Cc: [email protected] # v5.4+ Signed-off-by: Jens Axboe <[email protected]>
2021-07-27net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32Chris Mi1-1/+1
The offending refactor commit uses u16 chain wrongly. Actually, it should be u32. Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure") CC: Ariel Levkovich <[email protected]> Signed-off-by: Chris Mi <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()Dima Chumak1-2/+31
The result of __dev_get_by_index() is not checked for NULL and then gets dereferenced immediately. Also, __dev_get_by_index() must be called while holding either RTNL lock or @dev_base_lock, which isn't satisfied by mlx5e_hairpin_get_mdev() or its callers. This makes the underlying hlist_for_each_entry() loop not safe, and can have adverse effects in itself. Fix by using dev_get_by_index() and handling nullptr return value when ifindex device is not found. Update mlx5e_hairpin_get_mdev() callers to check for possible PTR_ERR() result. Fixes: 77ab67b7f0f9 ("net/mlx5e: Basic setup of hairpin object") Addresses-Coverity: ("Dereference null return value") Signed-off-by: Dima Chumak <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5: Unload device upon firmware fatal errorAya Levin1-2/+10
When fw_fatal reporter reports an error, the firmware in not responding. Unload the device to ensure that the driver closes all its resources, even if recovery is not due (user disabled auto-recovery or reporter is in grace period). On successful recovery the device is loaded back up. Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5e: Fix page allocation failure for ptp-RQ over SFAya Levin1-1/+1
Set the correct pci-device pointer to the ptp-RQ. This allows access to dma_mask and avoids allocation request with wrong pci-device. Fixes: a099da8ffcf6 ("net/mlx5e: Add RQ to PTP channel") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5e: Fix page allocation failure for trap-RQ over SFAya Levin1-1/+1
Set the correct device pointer to the trap-RQ, to allow access to dma_mask and avoid allocation request with the wrong pci-dev. WARNING: CPU: 1 PID: 12005 at kernel/dma/mapping.c:151 dma_map_page_attrs+0x139/0x1c0 ... all Trace: <IRQ> ? __page_pool_alloc_pages_slow+0x5a/0x210 mlx5e_post_rx_wqes+0x258/0x400 [mlx5_core] mlx5e_trap_napi_poll+0x44/0xc0 [mlx5_core] __napi_poll+0x24/0x150 net_rx_action+0x22b/0x280 __do_softirq+0xc7/0x27e do_softirq+0x61/0x80 </IRQ> __local_bh_enable_ip+0x4b/0x50 mlx5e_handle_action_trap+0x2dd/0x4d0 [mlx5_core] blocking_notifier_call_chain+0x5a/0x80 mlx5_devlink_trap_action_set+0x8b/0x100 [mlx5_core] Fixes: 5543e989fe5e ("net/mlx5e: Add trap entity to ETH driver") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5e: Consider PTP-RQ when setting RX VLAN strippingAya Levin2-2/+7
Add PTP-RQ to the loop when setting rx-vlan-offload feature via ethtool. On PTP-RQ's creation, set rx-vlan-offload into its parameters. Fixes: a099da8ffcf6 ("net/mlx5e: Add RQ to PTP channel") Signed-off-by: Aya Levin <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5e: Add NETIF_F_HW_TC to hw_features when HTB offload is availableMaxim Mikityanskiy1-2/+3
If a feature flag is only present in features, but not in hw_features, the user can't reset it. Although hw_features may contain NETIF_F_HW_TC by the point where the driver checks whether HTB offload is supported, this flag is controlled by another condition that may not hold. Set it explicitly to make sure the user can disable it. Fixes: 214baf22870c ("net/mlx5e: Support HTB offload") Signed-off-by: Maxim Mikityanskiy <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5e: RX, Avoid possible data corruption when relaxed ordering and LRO ↵Tariq Toukan1-1/+10
combined When HW aggregates packets for an LRO session, it writes the payload of two consecutive packets of a flow contiguously, so that they usually share a cacheline. The first byte of a packet's payload is written immediately after the last byte of the preceding packet. In this flow, there are two consecutive write requests to the shared cacheline: 1. Regular write for the earlier packet. 2. Read-modify-write for the following packet. In case of relaxed-ordering on, these two writes might be re-ordered. Using the end padding optimization (to avoid partial write for the last cacheline of a packet) becomes problematic if the two writes occur out-of-order, as the padding would overwrite payload that belongs to the following packet, causing data corruption. Avoid this by disabling the end padding optimization when both LRO and relaxed-ordering are enabled. Fixes: 17347d5430c4 ("net/mlx5e: Add support for PCI relaxed ordering") Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Moshe Shemesh <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5: E-Switch, handle devcom events only for ports on the same deviceRoi Dayan2-4/+4
This is the same check as LAG mode checks if to enable lag. This will fix adding peer miss rules if lag is not supported and even an incorrect rules in socket direct mode. Also fix the incorrect comment on mlx5_get_next_phys_dev() as flow #1 doesn't exists. Fixes: ac004b832128 ("net/mlx5e: E-Switch, Add peer miss rules") Signed-off-by: Roi Dayan <[email protected]> Reviewed-by: Maor Dickman <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5: E-Switch, Set destination vport vhca id only when merged eswitch ↵Maor Dickman1-3/+4
is supported Destination vport vhca id is valid flag is set only merged eswitch isn't supported. Change destination vport vhca id value to be set also only when merged eswitch is supported. Fixes: e4ad91f23f10 ("net/mlx5e: Split offloaded eswitch TC rules for port mirroring") Signed-off-by: Maor Dickman <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5e: Disable Rx ntuple offload for uplink representorMaor Dickman1-9/+20
Rx ntuple offload is not supported in switchdev mode. Tryng to enable it cause kernel panic. BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 80000001065a5067 P4D 80000001065a5067 PUD 106594067 PMD 0 Oops: 0000 [#1] SMP PTI CPU: 7 PID: 1089 Comm: ethtool Not tainted 5.13.0-rc7_for_upstream_min_debug_2021_06_23_16_44 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:mlx5e_arfs_enable+0x70/0xd0 [mlx5_core] Code: 44 24 10 00 00 00 00 48 c7 44 24 18 00 00 00 00 49 63 c4 48 89 e2 44 89 e6 48 69 c0 20 08 00 00 48 89 ef 48 03 85 68 ac 00 00 <48> 8b 40 08 48 89 44 24 08 e8 d2 aa fd ff 48 83 05 82 96 18 00 01 RSP: 0018:ffff8881047679e0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000004000000000 RCX: 0000004000000000 RDX: ffff8881047679e0 RSI: 0000000000000000 RDI: ffff888115100880 RBP: ffff888115100880 R08: ffffffffa00f6cb0 R09: ffff888104767a18 R10: ffff8881151000a0 R11: ffff888109479540 R12: 0000000000000000 R13: ffff888104767bb8 R14: ffff888115100000 R15: ffff8881151000a0 FS: 00007f41a64ab740(0000) GS:ffff8882f5dc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 0000000104cbc005 CR4: 0000000000370ea0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: set_feature_arfs+0x1e/0x40 [mlx5_core] mlx5e_handle_feature+0x43/0xa0 [mlx5_core] mlx5e_set_features+0x139/0x1b0 [mlx5_core] __netdev_update_features+0x2b3/0xaf0 ethnl_set_features+0x176/0x3a0 ? __nla_parse+0x22/0x30 genl_family_rcv_msg_doit+0xe2/0x140 genl_rcv_msg+0xde/0x1d0 ? features_reply_size+0xe0/0xe0 ? genl_get_cmd+0xd0/0xd0 netlink_rcv_skb+0x4e/0xf0 genl_rcv+0x24/0x40 netlink_unicast+0x1f6/0x2b0 netlink_sendmsg+0x225/0x450 sock_sendmsg+0x33/0x40 __sys_sendto+0xd4/0x120 ? __sys_recvmsg+0x4e/0x90 ? exc_page_fault+0x219/0x740 __x64_sys_sendto+0x25/0x30 do_syscall_64+0x3f/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f41a65b0cba Code: d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 15 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 76 c3 0f 1f 44 00 00 55 48 83 ec 30 44 89 4c RSP: 002b:00007ffd8d688358 EFLAGS: 00000246 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 00000000010f42a0 RCX: 00007f41a65b0cba RDX: 0000000000000058 RSI: 00000000010f43b0 RDI: 0000000000000003 RBP: 000000000047ae60 R08: 00007f41a667c000 R09: 000000000000000c R10: 0000000000000000 R11: 0000000000000246 R12: 00000000010f4340 R13: 00000000010f4350 R14: 00007ffd8d688400 R15: 00000000010f42a0 Modules linked in: mlx5_vdpa vhost_iotlb vdpa xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad ib_ipoib rdma_cm iw_cm ib_cm mlx5_ib ib_uverbs ib_core overlay mlx5_core ptp pps_core fuse CR2: 0000000000000008 ---[ end trace c66523f2aba94b43 ]--- Fixes: 7a9fb35e8c3a ("net/mlx5e: Do not reload ethernet ports when changing eswitch mode") Signed-off-by: Maor Dickman <[email protected]> Reviewed-by: Roi Dayan <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-07-27net/mlx5: Fix flow table chainingMaor Gottlieb1-4/+6
Fix a bug when flow table is created in priority that already has other flow tables as shown in the below diagram. If the new flow table (FT-B) has the lowest level in the priority, we need to connect the flow tables from the previous priority (p0) to this new table. In addition when this flow table is destroyed (FT-B), we need to connect the flow tables from the previous priority (p0) to the next level flow table (FT-C) in the same priority of the destroyed table (if exists). --------- |root_ns| --------- | -------------------------------- | | | ---------- ---------- --------- |p(prio)-x| | p-y | | p-n | ---------- ---------- --------- | | ---------------- ------------------ |ns(e.g bypass)| |ns(e.g. kernel) | ---------------- ------------------ | | | ------- ------ ---- | p0 | | p1 | |p2| ------- ------ ---- | | \ -------- ------- ------ | FT-A | |FT-B | |FT-C| -------- ------- ------ Fixes: f90edfd279f3 ("net/mlx5_core: Connect flow tables") Signed-off-by: Maor Gottlieb <[email protected]> Reviewed-by: Mark Bloch <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>