Age | Commit message (Collapse) | Author | Files | Lines |
|
Daniel Borkmann says:
====================
pull-request: bpf 2021-07-29
The following pull-request contains BPF updates for your *net* tree.
We've added 9 non-merge commits during the last 14 day(s) which contain
a total of 20 files changed, 446 insertions(+), 138 deletions(-).
The main changes are:
1) Fix UBSAN out-of-bounds splat for showing XDP link fdinfo, from Lorenz Bauer.
2) Fix insufficient Spectre v4 mitigation in BPF runtime, from Daniel Borkmann,
Piotr Krysiuk and Benedict Schlueter.
3) Batch of fixes for BPF sockmap found under stress testing, from John Fastabend.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
Spectre v4 gadgets make use of memory disambiguation, which is a set of
techniques that execute memory access instructions, that is, loads and
stores, out of program order; Intel's optimization manual, section 2.4.4.5:
A load instruction micro-op may depend on a preceding store. Many
microarchitectures block loads until all preceding store addresses are
known. The memory disambiguator predicts which loads will not depend on
any previous stores. When the disambiguator predicts that a load does
not have such a dependency, the load takes its data from the L1 data
cache. Eventually, the prediction is verified. If an actual conflict is
detected, the load and all succeeding instructions are re-executed.
af86ca4e3088 ("bpf: Prevent memory disambiguation attack") tried to mitigate
this attack by sanitizing the memory locations through preemptive "fast"
(low latency) stores of zero prior to the actual "slow" (high latency) store
of a pointer value such that upon dependency misprediction the CPU then
speculatively executes the load of the pointer value and retrieves the zero
value instead of the attacker controlled scalar value previously stored at
that location, meaning, subsequent access in the speculative domain is then
redirected to the "zero page".
The sanitized preemptive store of zero prior to the actual "slow" store is
done through a simple ST instruction based on r10 (frame pointer) with
relative offset to the stack location that the verifier has been tracking
on the original used register for STX, which does not have to be r10. Thus,
there are no memory dependencies for this store, since it's only using r10
and immediate constant of zero; hence af86ca4e3088 /assumed/ a low latency
operation.
However, a recent attack demonstrated that this mitigation is not sufficient
since the preemptive store of zero could also be turned into a "slow" store
and is thus bypassed as well:
[...]
// r2 = oob address (e.g. scalar)
// r7 = pointer to map value
31: (7b) *(u64 *)(r10 -16) = r2
// r9 will remain "fast" register, r10 will become "slow" register below
32: (bf) r9 = r10
// JIT maps BPF reg to x86 reg:
// r9 -> r15 (callee saved)
// r10 -> rbp
// train store forward prediction to break dependency link between both r9
// and r10 by evicting them from the predictor's LRU table.
33: (61) r0 = *(u32 *)(r7 +24576)
34: (63) *(u32 *)(r7 +29696) = r0
35: (61) r0 = *(u32 *)(r7 +24580)
36: (63) *(u32 *)(r7 +29700) = r0
37: (61) r0 = *(u32 *)(r7 +24584)
38: (63) *(u32 *)(r7 +29704) = r0
39: (61) r0 = *(u32 *)(r7 +24588)
40: (63) *(u32 *)(r7 +29708) = r0
[...]
543: (61) r0 = *(u32 *)(r7 +25596)
544: (63) *(u32 *)(r7 +30716) = r0
// prepare call to bpf_ringbuf_output() helper. the latter will cause rbp
// to spill to stack memory while r13/r14/r15 (all callee saved regs) remain
// in hardware registers. rbp becomes slow due to push/pop latency. below is
// disasm of bpf_ringbuf_output() helper for better visual context:
//
// ffffffff8117ee20: 41 54 push r12
// ffffffff8117ee22: 55 push rbp
// ffffffff8117ee23: 53 push rbx
// ffffffff8117ee24: 48 f7 c1 fc ff ff ff test rcx,0xfffffffffffffffc
// ffffffff8117ee2b: 0f 85 af 00 00 00 jne ffffffff8117eee0 <-- jump taken
// [...]
// ffffffff8117eee0: 49 c7 c4 ea ff ff ff mov r12,0xffffffffffffffea
// ffffffff8117eee7: 5b pop rbx
// ffffffff8117eee8: 5d pop rbp
// ffffffff8117eee9: 4c 89 e0 mov rax,r12
// ffffffff8117eeec: 41 5c pop r12
// ffffffff8117eeee: c3 ret
545: (18) r1 = map[id:4]
547: (bf) r2 = r7
548: (b7) r3 = 0
549: (b7) r4 = 4
550: (85) call bpf_ringbuf_output#194288
// instruction 551 inserted by verifier \
551: (7a) *(u64 *)(r10 -16) = 0 | /both/ are now slow stores here
// storing map value pointer r7 at fp-16 | since value of r10 is "slow".
552: (7b) *(u64 *)(r10 -16) = r7 /
// following "fast" read to the same memory location, but due to dependency
// misprediction it will speculatively execute before insn 551/552 completes.
553: (79) r2 = *(u64 *)(r9 -16)
// in speculative domain contains attacker controlled r2. in non-speculative
// domain this contains r7, and thus accesses r7 +0 below.
554: (71) r3 = *(u8 *)(r2 +0)
// leak r3
As can be seen, the current speculative store bypass mitigation which the
verifier inserts at line 551 is insufficient since /both/, the write of
the zero sanitation as well as the map value pointer are a high latency
instruction due to prior memory access via push/pop of r10 (rbp) in contrast
to the low latency read in line 553 as r9 (r15) which stays in hardware
registers. Thus, architecturally, fp-16 is r7, however, microarchitecturally,
fp-16 can still be r2.
Initial thoughts to address this issue was to track spilled pointer loads
from stack and enforce their load via LDX through r10 as well so that /both/
the preemptive store of zero /as well as/ the load use the /same/ register
such that a dependency is created between the store and load. However, this
option is not sufficient either since it can be bypassed as well under
speculation. An updated attack with pointer spill/fills now _all_ based on
r10 would look as follows:
[...]
// r2 = oob address (e.g. scalar)
// r7 = pointer to map value
[...]
// longer store forward prediction training sequence than before.
2062: (61) r0 = *(u32 *)(r7 +25588)
2063: (63) *(u32 *)(r7 +30708) = r0
2064: (61) r0 = *(u32 *)(r7 +25592)
2065: (63) *(u32 *)(r7 +30712) = r0
2066: (61) r0 = *(u32 *)(r7 +25596)
2067: (63) *(u32 *)(r7 +30716) = r0
// store the speculative load address (scalar) this time after the store
// forward prediction training.
2068: (7b) *(u64 *)(r10 -16) = r2
// preoccupy the CPU store port by running sequence of dummy stores.
2069: (63) *(u32 *)(r7 +29696) = r0
2070: (63) *(u32 *)(r7 +29700) = r0
2071: (63) *(u32 *)(r7 +29704) = r0
2072: (63) *(u32 *)(r7 +29708) = r0
2073: (63) *(u32 *)(r7 +29712) = r0
2074: (63) *(u32 *)(r7 +29716) = r0
2075: (63) *(u32 *)(r7 +29720) = r0
2076: (63) *(u32 *)(r7 +29724) = r0
2077: (63) *(u32 *)(r7 +29728) = r0
2078: (63) *(u32 *)(r7 +29732) = r0
2079: (63) *(u32 *)(r7 +29736) = r0
2080: (63) *(u32 *)(r7 +29740) = r0
2081: (63) *(u32 *)(r7 +29744) = r0
2082: (63) *(u32 *)(r7 +29748) = r0
2083: (63) *(u32 *)(r7 +29752) = r0
2084: (63) *(u32 *)(r7 +29756) = r0
2085: (63) *(u32 *)(r7 +29760) = r0
2086: (63) *(u32 *)(r7 +29764) = r0
2087: (63) *(u32 *)(r7 +29768) = r0
2088: (63) *(u32 *)(r7 +29772) = r0
2089: (63) *(u32 *)(r7 +29776) = r0
2090: (63) *(u32 *)(r7 +29780) = r0
2091: (63) *(u32 *)(r7 +29784) = r0
2092: (63) *(u32 *)(r7 +29788) = r0
2093: (63) *(u32 *)(r7 +29792) = r0
2094: (63) *(u32 *)(r7 +29796) = r0
2095: (63) *(u32 *)(r7 +29800) = r0
2096: (63) *(u32 *)(r7 +29804) = r0
2097: (63) *(u32 *)(r7 +29808) = r0
2098: (63) *(u32 *)(r7 +29812) = r0
// overwrite scalar with dummy pointer; same as before, also including the
// sanitation store with 0 from the current mitigation by the verifier.
2099: (7a) *(u64 *)(r10 -16) = 0 | /both/ are now slow stores here
2100: (7b) *(u64 *)(r10 -16) = r7 | since store unit is still busy.
// load from stack intended to bypass stores.
2101: (79) r2 = *(u64 *)(r10 -16)
2102: (71) r3 = *(u8 *)(r2 +0)
// leak r3
[...]
Looking at the CPU microarchitecture, the scheduler might issue loads (such
as seen in line 2101) before stores (line 2099,2100) because the load execution
units become available while the store execution unit is still busy with the
sequence of dummy stores (line 2069-2098). And so the load may use the prior
stored scalar from r2 at address r10 -16 for speculation. The updated attack
may work less reliable on CPU microarchitectures where loads and stores share
execution resources.
This concludes that the sanitizing with zero stores from af86ca4e3088 ("bpf:
Prevent memory disambiguation attack") is insufficient. Moreover, the detection
of stack reuse from af86ca4e3088 where previously data (STACK_MISC) has been
written to a given stack slot where a pointer value is now to be stored does
not have sufficient coverage as precondition for the mitigation either; for
several reasons outlined as follows:
1) Stack content from prior program runs could still be preserved and is
therefore not "random", best example is to split a speculative store
bypass attack between tail calls, program A would prepare and store the
oob address at a given stack slot and then tail call into program B which
does the "slow" store of a pointer to the stack with subsequent "fast"
read. From program B PoV such stack slot type is STACK_INVALID, and
therefore also must be subject to mitigation.
2) The STACK_SPILL must not be coupled to register_is_const(&stack->spilled_ptr)
condition, for example, the previous content of that memory location could
also be a pointer to map or map value. Without the fix, a speculative
store bypass is not mitigated in such precondition and can then lead to
a type confusion in the speculative domain leaking kernel memory near
these pointer types.
While brainstorming on various alternative mitigation possibilities, we also
stumbled upon a retrospective from Chrome developers [0]:
[...] For variant 4, we implemented a mitigation to zero the unused memory
of the heap prior to allocation, which cost about 1% when done concurrently
and 4% for scavenging. Variant 4 defeats everything we could think of. We
explored more mitigations for variant 4 but the threat proved to be more
pervasive and dangerous than we anticipated. For example, stack slots used
by the register allocator in the optimizing compiler could be subject to
type confusion, leading to pointer crafting. Mitigating type confusion for
stack slots alone would have required a complete redesign of the backend of
the optimizing compiler, perhaps man years of work, without a guarantee of
completeness. [...]
From BPF side, the problem space is reduced, however, options are rather
limited. One idea that has been explored was to xor-obfuscate pointer spills
to the BPF stack:
[...]
// preoccupy the CPU store port by running sequence of dummy stores.
[...]
2106: (63) *(u32 *)(r7 +29796) = r0
2107: (63) *(u32 *)(r7 +29800) = r0
2108: (63) *(u32 *)(r7 +29804) = r0
2109: (63) *(u32 *)(r7 +29808) = r0
2110: (63) *(u32 *)(r7 +29812) = r0
// overwrite scalar with dummy pointer; xored with random 'secret' value
// of 943576462 before store ...
2111: (b4) w11 = 943576462
2112: (af) r11 ^= r7
2113: (7b) *(u64 *)(r10 -16) = r11
2114: (79) r11 = *(u64 *)(r10 -16)
2115: (b4) w2 = 943576462
2116: (af) r2 ^= r11
// ... and restored with the same 'secret' value with the help of AX reg.
2117: (71) r3 = *(u8 *)(r2 +0)
[...]
While the above would not prevent speculation, it would make data leakage
infeasible by directing it to random locations. In order to be effective
and prevent type confusion under speculation, such random secret would have
to be regenerated for each store. The additional complexity involved for a
tracking mechanism that prevents jumps such that restoring spilled pointers
would not get corrupted is not worth the gain for unprivileged. Hence, the
fix in here eventually opted for emitting a non-public BPF_ST | BPF_NOSPEC
instruction which the x86 JIT translates into a lfence opcode. Inserting the
latter in between the store and load instruction is one of the mitigations
options [1]. The x86 instruction manual notes:
[...] An LFENCE that follows an instruction that stores to memory might
complete before the data being stored have become globally visible. [...]
The latter meaning that the preceding store instruction finished execution
and the store is at minimum guaranteed to be in the CPU's store queue, but
it's not guaranteed to be in that CPU's L1 cache at that point (globally
visible). The latter would only be guaranteed via sfence. So the load which
is guaranteed to execute after the lfence for that local CPU would have to
rely on store-to-load forwarding. [2], in section 2.3 on store buffers says:
[...] For every store operation that is added to the ROB, an entry is
allocated in the store buffer. This entry requires both the virtual and
physical address of the target. Only if there is no free entry in the store
buffer, the frontend stalls until there is an empty slot available in the
store buffer again. Otherwise, the CPU can immediately continue adding
subsequent instructions to the ROB and execute them out of order. On Intel
CPUs, the store buffer has up to 56 entries. [...]
One small upside on the fix is that it lifts constraints from af86ca4e3088
where the sanitize_stack_off relative to r10 must be the same when coming
from different paths. The BPF_ST | BPF_NOSPEC gets emitted after a BPF_STX
or BPF_ST instruction. This happens either when we store a pointer or data
value to the BPF stack for the first time, or upon later pointer spills.
The former needs to be enforced since otherwise stale stack data could be
leaked under speculation as outlined earlier. For non-x86 JITs the BPF_ST |
BPF_NOSPEC mapping is currently optimized away, but others could emit a
speculation barrier as well if necessary. For real-world unprivileged
programs e.g. generated by LLVM, pointer spill/fill is only generated upon
register pressure and LLVM only tries to do that for pointers which are not
used often. The program main impact will be the initial BPF_ST | BPF_NOSPEC
sanitation for the STACK_INVALID case when the first write to a stack slot
occurs e.g. upon map lookup. In future we might refine ways to mitigate
the latter cost.
[0] https://arxiv.org/pdf/1902.05178.pdf
[1] https://msrc-blog.microsoft.com/2018/05/21/analysis-and-mitigation-of-speculative-store-bypass-cve-2018-3639/
[2] https://arxiv.org/pdf/1905.05725.pdf
Fixes: af86ca4e3088 ("bpf: Prevent memory disambiguation attack")
Fixes: f7cf25b2026d ("bpf: track spill/fill of constants")
Co-developed-by: Piotr Krysiuk <[email protected]>
Co-developed-by: Benedict Schlueter <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Piotr Krysiuk <[email protected]>
Signed-off-by: Benedict Schlueter <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
|
|
In case of JITs, each of the JIT backends compiles the BPF nospec instruction
/either/ to a machine instruction which emits a speculation barrier /or/ to
/no/ machine instruction in case the underlying architecture is not affected
by Speculative Store Bypass or has different mitigations in place already.
This covers both x86 and (implicitly) arm64: In case of x86, we use 'lfence'
instruction for mitigation. In case of arm64, we rely on the firmware mitigation
as controlled via the ssbd kernel parameter. Whenever the mitigation is enabled,
it works for all of the kernel code with no need to provide any additional
instructions here (hence only comment in arm64 JIT). Other archs can follow
as needed. The BPF nospec instruction is specifically targeting Spectre v4
since i) we don't use a serialization barrier for the Spectre v1 case, and
ii) mitigation instructions for v1 and v4 might be different on some archs.
The BPF nospec is required for a future commit, where the BPF verifier does
annotate intermediate BPF programs with speculation barriers.
Co-developed-by: Piotr Krysiuk <[email protected]>
Co-developed-by: Benedict Schlueter <[email protected]>
Signed-off-by: Daniel Borkmann <[email protected]>
Signed-off-by: Piotr Krysiuk <[email protected]>
Signed-off-by: Benedict Schlueter <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
|
|
We lost parsing of backupuid in the switch to new mount API.
Add it back.
Signed-off-by: Ronnie Sahlberg <[email protected]>
Reviewed-by: Shyam Prasad N <[email protected]>
Cc: <[email protected]> # v5.11+
Reported-by: Xiaoli Feng <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull ext2 and reiserfs fixes from Jan Kara:
"A fix for the ext2 conversion to kmap_local() and two reiserfs
hardening fixes"
* tag 'fixes_for_v5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: check directory items on read from disk
fs/ext2: Avoid page_address on pages returned by ext2_get_page
reiserfs: add check for root_inode in reiserfs_fill_super
|
|
The race happens because put_ucounts() doesn't use spinlock and
get_ucounts is not under spinlock:
CPU0 CPU1
---- ----
alloc_ucounts() put_ucounts()
spin_lock_irq(&ucounts_lock);
ucounts = find_ucounts(ns, uid, hashent);
atomic_dec_and_test(&ucounts->count))
spin_unlock_irq(&ucounts_lock);
spin_lock_irqsave(&ucounts_lock, flags);
hlist_del_init(&ucounts->node);
spin_unlock_irqrestore(&ucounts_lock, flags);
kfree(ucounts);
ucounts = get_ucounts(ucounts);
==================================================================
BUG: KASAN: use-after-free in instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
BUG: KASAN: use-after-free in atomic_add_negative include/asm-generic/atomic-instrumented.h:556 [inline]
BUG: KASAN: use-after-free in get_ucounts kernel/ucount.c:152 [inline]
BUG: KASAN: use-after-free in get_ucounts kernel/ucount.c:150 [inline]
BUG: KASAN: use-after-free in alloc_ucounts+0x19b/0x5b0 kernel/ucount.c:188
Write of size 4 at addr ffff88802821e41c by task syz-executor.4/16785
CPU: 1 PID: 16785 Comm: syz-executor.4 Not tainted 5.14.0-rc1-next-20210712-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:105
print_address_description.constprop.0.cold+0x6c/0x309 mm/kasan/report.c:233
__kasan_report mm/kasan/report.c:419 [inline]
kasan_report.cold+0x83/0xdf mm/kasan/report.c:436
check_region_inline mm/kasan/generic.c:183 [inline]
kasan_check_range+0x13d/0x180 mm/kasan/generic.c:189
instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
atomic_add_negative include/asm-generic/atomic-instrumented.h:556 [inline]
get_ucounts kernel/ucount.c:152 [inline]
get_ucounts kernel/ucount.c:150 [inline]
alloc_ucounts+0x19b/0x5b0 kernel/ucount.c:188
set_cred_ucounts+0x171/0x3a0 kernel/cred.c:684
__sys_setuid+0x285/0x400 kernel/sys.c:623
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x4665d9
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007fde54097188 EFLAGS: 00000246 ORIG_RAX: 0000000000000069
RAX: ffffffffffffffda RBX: 000000000056bf80 RCX: 00000000004665d9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000000000000ff
RBP: 00000000004bfcb9 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000056bf80
R13: 00007ffc8655740f R14: 00007fde54097300 R15: 0000000000022000
Allocated by task 16784:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track mm/kasan/common.c:46 [inline]
set_alloc_info mm/kasan/common.c:434 [inline]
____kasan_kmalloc mm/kasan/common.c:513 [inline]
____kasan_kmalloc mm/kasan/common.c:472 [inline]
__kasan_kmalloc+0x9b/0xd0 mm/kasan/common.c:522
kmalloc include/linux/slab.h:591 [inline]
kzalloc include/linux/slab.h:721 [inline]
alloc_ucounts+0x23d/0x5b0 kernel/ucount.c:169
set_cred_ucounts+0x171/0x3a0 kernel/cred.c:684
__sys_setuid+0x285/0x400 kernel/sys.c:623
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Freed by task 16785:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_set_track+0x1c/0x30 mm/kasan/common.c:46
kasan_set_free_info+0x20/0x30 mm/kasan/generic.c:360
____kasan_slab_free mm/kasan/common.c:366 [inline]
____kasan_slab_free mm/kasan/common.c:328 [inline]
__kasan_slab_free+0xfb/0x130 mm/kasan/common.c:374
kasan_slab_free include/linux/kasan.h:229 [inline]
slab_free_hook mm/slub.c:1650 [inline]
slab_free_freelist_hook+0xdf/0x240 mm/slub.c:1675
slab_free mm/slub.c:3235 [inline]
kfree+0xeb/0x650 mm/slub.c:4295
put_ucounts kernel/ucount.c:200 [inline]
put_ucounts+0x117/0x150 kernel/ucount.c:192
put_cred_rcu+0x27a/0x520 kernel/cred.c:124
rcu_do_batch kernel/rcu/tree.c:2550 [inline]
rcu_core+0x7ab/0x1380 kernel/rcu/tree.c:2785
__do_softirq+0x29b/0x9c2 kernel/softirq.c:558
Last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:348
insert_work+0x48/0x370 kernel/workqueue.c:1332
__queue_work+0x5c1/0xed0 kernel/workqueue.c:1498
queue_work_on+0xee/0x110 kernel/workqueue.c:1525
queue_work include/linux/workqueue.h:507 [inline]
call_usermodehelper_exec+0x1f0/0x4c0 kernel/umh.c:435
kobject_uevent_env+0xf8f/0x1650 lib/kobject_uevent.c:618
netdev_queue_add_kobject net/core/net-sysfs.c:1621 [inline]
netdev_queue_update_kobjects+0x374/0x450 net/core/net-sysfs.c:1655
register_queue_kobjects net/core/net-sysfs.c:1716 [inline]
netdev_register_kobject+0x35a/0x430 net/core/net-sysfs.c:1959
register_netdevice+0xd33/0x1500 net/core/dev.c:10331
nsim_init_netdevsim drivers/net/netdevsim/netdev.c:317 [inline]
nsim_create+0x381/0x4d0 drivers/net/netdevsim/netdev.c:364
__nsim_dev_port_add+0x32e/0x830 drivers/net/netdevsim/dev.c:1295
nsim_dev_port_add_all+0x53/0x150 drivers/net/netdevsim/dev.c:1355
nsim_dev_probe+0xcb5/0x1190 drivers/net/netdevsim/dev.c:1496
call_driver_probe drivers/base/dd.c:517 [inline]
really_probe+0x23c/0xcd0 drivers/base/dd.c:595
__driver_probe_device+0x338/0x4d0 drivers/base/dd.c:747
driver_probe_device+0x4c/0x1a0 drivers/base/dd.c:777
__device_attach_driver+0x20b/0x2f0 drivers/base/dd.c:894
bus_for_each_drv+0x15f/0x1e0 drivers/base/bus.c:427
__device_attach+0x228/0x4a0 drivers/base/dd.c:965
bus_probe_device+0x1e4/0x290 drivers/base/bus.c:487
device_add+0xc2f/0x2180 drivers/base/core.c:3356
nsim_bus_dev_new drivers/net/netdevsim/bus.c:431 [inline]
new_device_store+0x436/0x710 drivers/net/netdevsim/bus.c:298
bus_attr_store+0x72/0xa0 drivers/base/bus.c:122
sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139
kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
call_write_iter include/linux/fs.h:2152 [inline]
new_sync_write+0x426/0x650 fs/read_write.c:518
vfs_write+0x75a/0xa40 fs/read_write.c:605
ksys_write+0x12d/0x250 fs/read_write.c:658
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
Second to last potentially related work creation:
kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
kasan_record_aux_stack+0xe5/0x110 mm/kasan/generic.c:348
insert_work+0x48/0x370 kernel/workqueue.c:1332
__queue_work+0x5c1/0xed0 kernel/workqueue.c:1498
queue_work_on+0xee/0x110 kernel/workqueue.c:1525
queue_work include/linux/workqueue.h:507 [inline]
call_usermodehelper_exec+0x1f0/0x4c0 kernel/umh.c:435
kobject_uevent_env+0xf8f/0x1650 lib/kobject_uevent.c:618
kobject_synth_uevent+0x701/0x850 lib/kobject_uevent.c:208
uevent_store+0x20/0x50 drivers/base/core.c:2371
dev_attr_store+0x50/0x80 drivers/base/core.c:2072
sysfs_kf_write+0x110/0x160 fs/sysfs/file.c:139
kernfs_fop_write_iter+0x342/0x500 fs/kernfs/file.c:296
call_write_iter include/linux/fs.h:2152 [inline]
new_sync_write+0x426/0x650 fs/read_write.c:518
vfs_write+0x75a/0xa40 fs/read_write.c:605
ksys_write+0x12d/0x250 fs/read_write.c:658
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
The buggy address belongs to the object at ffff88802821e400
which belongs to the cache kmalloc-192 of size 192
The buggy address is located 28 bytes inside of
192-byte region [ffff88802821e400, ffff88802821e4c0)
The buggy address belongs to the page:
page:ffffea0000a08780 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2821e
flags: 0xfff00000000200(slab|node=0|zone=1|lastcpupid=0x7ff)
raw: 00fff00000000200 dead000000000100 dead000000000122 ffff888010841a00
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
page_owner tracks the page as allocated
page last allocated via order 0, migratetype Unmovable, gfp_mask 0x12cc0(GFP_KERNEL|__GFP_NOWARN|__GFP_NORETRY), pid 1, ts 12874702440, free_ts 12637793385
prep_new_page mm/page_alloc.c:2433 [inline]
get_page_from_freelist+0xa72/0x2f80 mm/page_alloc.c:4166
__alloc_pages+0x1b2/0x500 mm/page_alloc.c:5374
alloc_page_interleave+0x1e/0x200 mm/mempolicy.c:2119
alloc_pages+0x238/0x2a0 mm/mempolicy.c:2242
alloc_slab_page mm/slub.c:1713 [inline]
allocate_slab+0x32b/0x4c0 mm/slub.c:1853
new_slab mm/slub.c:1916 [inline]
new_slab_objects mm/slub.c:2662 [inline]
___slab_alloc+0x4ba/0x820 mm/slub.c:2825
__slab_alloc.constprop.0+0xa7/0xf0 mm/slub.c:2865
slab_alloc_node mm/slub.c:2947 [inline]
slab_alloc mm/slub.c:2989 [inline]
__kmalloc+0x312/0x330 mm/slub.c:4133
kmalloc include/linux/slab.h:596 [inline]
kzalloc include/linux/slab.h:721 [inline]
__register_sysctl_table+0x112/0x1090 fs/proc/proc_sysctl.c:1318
rds_tcp_init_net+0x1db/0x4f0 net/rds/tcp.c:551
ops_init+0xaf/0x470 net/core/net_namespace.c:140
__register_pernet_operations net/core/net_namespace.c:1137 [inline]
register_pernet_operations+0x35a/0x850 net/core/net_namespace.c:1214
register_pernet_device+0x26/0x70 net/core/net_namespace.c:1301
rds_tcp_init+0x77/0xe0 net/rds/tcp.c:717
do_one_initcall+0x103/0x650 init/main.c:1285
do_initcall_level init/main.c:1360 [inline]
do_initcalls init/main.c:1376 [inline]
do_basic_setup init/main.c:1396 [inline]
kernel_init_freeable+0x6b8/0x741 init/main.c:1598
page last free stack trace:
reset_page_owner include/linux/page_owner.h:24 [inline]
free_pages_prepare mm/page_alloc.c:1343 [inline]
free_pcp_prepare+0x312/0x7d0 mm/page_alloc.c:1394
free_unref_page_prepare mm/page_alloc.c:3329 [inline]
free_unref_page+0x19/0x690 mm/page_alloc.c:3408
__vunmap+0x783/0xb70 mm/vmalloc.c:2587
free_work+0x58/0x70 mm/vmalloc.c:82
process_one_work+0x98d/0x1630 kernel/workqueue.c:2276
worker_thread+0x658/0x11f0 kernel/workqueue.c:2422
kthread+0x3e5/0x4d0 kernel/kthread.c:319
ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
Memory state around the buggy address:
ffff88802821e300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88802821e380: 00 00 00 00 00 fc fc fc fc fc fc fc fc fc fc fc
>ffff88802821e400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88802821e480: fb fb fb fb fb fb fb fb fc fc fc fc fc fc fc fc
ffff88802821e500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
==================================================================
- The race fix has two parts.
* Changing the code to guarantee that ucounts->count is only decremented
when ucounts_lock is held. This guarantees that find_ucounts
will never find a structure with a zero reference count.
* Changing alloc_ucounts to increment ucounts->count while
ucounts_lock is held. This guarantees the reference count on the
found data structure will not be decremented to zero (and the data
structure freed) before the reference count is incremented.
-- Eric Biederman
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Reported-by: [email protected]
Fixes: b6c336528926 ("Use atomic_t for ucounts reference counting")
Cc: Hillf Danton <[email protected]>
Signed-off-by: Alexey Gladkov <[email protected]>
Link: https://lkml.kernel.org/r/7b2ace1759b281cdd2d66101d6b305deef722efb.1627397820.git.legion@kernel.org
Signed-off-by: Eric W. Biederman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Hans de Goede:
"A set of bug-fixes and new hardware ids.
Highlights:
- amd-pmc fixes
- think-lmi fixes
- various new hardware-ids"
* tag 'platform-drivers-x86-v5.14-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86:
platform/x86: gigabyte-wmi: add support for B550 Aorus Elite V2
platform/x86: intel-hid: add Alder Lake ACPI device ID
platform/x86: think-lmi: Fix possible mem-leaks on tlmi_analyze() error-exit
platform/x86: think-lmi: Split kobject_init() and kobject_add() calls
platform/x86: think-lmi: Move pending_reboot_attr to the attributes sysfs dir
platform/x86: amd-pmc: Fix undefined reference to __udivdi3
platform/x86: amd-pmc: Fix missing unlock on error in amd_pmc_send_cmd()
platform/x86: wireless-hotkey: remove hardcoded "hp" from the error message
platform/x86: amd-pmc: Use return code on suspend
platform/x86: amd-pmc: Add new acpi id for future PMC controllers
platform/x86: amd-pmc: Add support for ACPI ID AMDI0006
platform/x86: amd-pmc: Add support for logging s0ix counters
platform/x86: amd-pmc: Add support for logging SMU metrics
platform/x86: amd-pmc: call dump registers only once
platform/x86: amd-pmc: Fix SMU firmware reporting mechanism
platform/x86: amd-pmc: Fix command completion code
platform/x86: think-lmi: Add pending_reboot support
|
|
This file was given GPL-2.0 license. But LGPL-2.1 makes more sense
as it needs to be used by libraries outside of the kernel source tree.
Signed-off-by: Tony Luck <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
unix_gc() assumes that candidate sockets can never gain an external
reference (i.e. be installed into an fd) while the unix_gc_lock is
held. Except for MSG_PEEK this is guaranteed by modifying inflight
count under the unix_gc_lock.
MSG_PEEK does not touch any variable protected by unix_gc_lock (file
count is not), yet it needs to be serialized with garbage collection.
Do this by locking/unlocking unix_gc_lock:
1) increment file count
2) lock/unlock barrier to make sure incremented file count is visible
to garbage collection
3) install file into fd
This is a lock barrier (unlike smp_mb()) that ensures that garbage
collection is run completely before or completely after the barrier.
Cc: <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Miklos Szeredi <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When removing a writeable device in __btrfs_free_extra_devids, the rw
device count should be decremented.
This error was caught by Syzbot which reported a warning in
close_fs_devices:
WARNING: CPU: 1 PID: 9355 at fs/btrfs/volumes.c:1168 close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
Modules linked in:
CPU: 0 PID: 9355 Comm: syz-executor552 Not tainted 5.13.0-rc1-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
RIP: 0010:close_fs_devices+0x763/0x880 fs/btrfs/volumes.c:1168
RSP: 0018:ffffc9000333f2f0 EFLAGS: 00010293
RAX: ffffffff8365f5c3 RBX: 0000000000000001 RCX: ffff888029afd4c0
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000000
RBP: ffff88802846f508 R08: ffffffff8365f525 R09: ffffed100337d128
R10: ffffed100337d128 R11: 0000000000000000 R12: dffffc0000000000
R13: ffff888019be8868 R14: 1ffff1100337d10d R15: 1ffff1100337d10a
FS: 00007f6f53828700(0000) GS:ffff8880b9a00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000000047c410 CR3: 00000000302a6000 CR4: 00000000001506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
btrfs_close_devices+0xc9/0x450 fs/btrfs/volumes.c:1180
open_ctree+0x8e1/0x3968 fs/btrfs/disk-io.c:3693
btrfs_fill_super fs/btrfs/super.c:1382 [inline]
btrfs_mount_root+0xac5/0xc60 fs/btrfs/super.c:1749
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1498
fc_mount fs/namespace.c:993 [inline]
vfs_kern_mount+0xc9/0x160 fs/namespace.c:1023
btrfs_mount+0x3d3/0xb50 fs/btrfs/super.c:1809
legacy_get_tree+0xea/0x180 fs/fs_context.c:592
vfs_get_tree+0x86/0x270 fs/super.c:1498
do_new_mount fs/namespace.c:2905 [inline]
path_mount+0x196f/0x2be0 fs/namespace.c:3235
do_mount fs/namespace.c:3248 [inline]
__do_sys_mount fs/namespace.c:3456 [inline]
__se_sys_mount+0x2f9/0x3b0 fs/namespace.c:3433
do_syscall_64+0x3f/0xb0 arch/x86/entry/common.c:47
entry_SYSCALL_64_after_hwframe+0x44/0xae
Because fs_devices->rw_devices was not 0 after
closing all devices. Here is the call trace that was observed:
btrfs_mount_root():
btrfs_scan_one_device():
device_list_add(); <---------------- device added
btrfs_open_devices():
open_fs_devices():
btrfs_open_one_device(); <-------- writable device opened,
rw device count ++
btrfs_fill_super():
open_ctree():
btrfs_free_extra_devids():
__btrfs_free_extra_devids(); <--- writable device removed,
rw device count not decremented
fail_tree_roots:
btrfs_close_devices():
close_fs_devices(); <------- rw device count off by 1
As a note, prior to commit cf89af146b7e ("btrfs: dev-replace: fail
mount if we don't have replace item with target device"), rw_devices
was decremented on removing a writable device in
__btrfs_free_extra_devids only if the BTRFS_DEV_STATE_REPLACE_TGT bit
was not set for the device. However, this check does not need to be
reinstated as it is now redundant and incorrect.
In __btrfs_free_extra_devids, we skip removing the device if it is the
target for replacement. This is done by checking whether device->devid
== BTRFS_DEV_REPLACE_DEVID. Since BTRFS_DEV_STATE_REPLACE_TGT is set
only on the device with devid BTRFS_DEV_REPLACE_DEVID, no devices
should have the BTRFS_DEV_STATE_REPLACE_TGT bit set after the check,
and so it's redundant to test for that bit.
Additionally, following commit 82372bc816d7 ("Btrfs: make
the logic of source device removing more clear"), rw_devices is
incremented whenever a writeable device is added to the alloc
list (including the target device in btrfs_dev_replace_finishing), so
all removals of writable devices from the alloc list should also be
accompanied by a decrement to rw_devices.
Reported-by: [email protected]
Fixes: cf89af146b7e ("btrfs: dev-replace: fail mount if we don't have replace item with target device")
CC: [email protected] # 5.10+
Tested-by: [email protected]
Reviewed-by: Anand Jain <[email protected]>
Signed-off-by: Desmond Cheong Zhi Xi <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
eviction
When checking if we need to log the new name of a renamed inode, we are
checking if the inode and its parent inode have been logged before, and if
not we don't log the new name. The check however is buggy, as it directly
compares the logged_trans field of the inodes versus the ID of the current
transaction. The problem is that logged_trans is a transient field, only
stored in memory and never persisted in the inode item, so if an inode
was logged before, evicted and reloaded, its logged_trans field is set to
a value of 0, meaning the check will return false and the new name of the
renamed inode is not logged. If the old parent directory was previously
fsynced and we deleted the logged directory entries corresponding to the
old name, we end up with a log that when replayed will delete the renamed
inode.
The following example triggers the problem:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt
$ mkdir /mnt/A
$ mkdir /mnt/B
$ echo -n "hello world" > /mnt/A/foo
$ sync
# Add some new file to A and fsync directory A.
$ touch /mnt/A/bar
$ xfs_io -c "fsync" /mnt/A
# Now trigger inode eviction. We are only interested in triggering
# eviction for the inode of directory A.
$ echo 2 > /proc/sys/vm/drop_caches
# Move foo from directory A to directory B.
# This deletes the directory entries for foo in A from the log, and
# does not add the new name for foo in directory B to the log, because
# logged_trans of A is 0, which is less than the current transaction ID.
$ mv /mnt/A/foo /mnt/B/foo
# Now make an fsync to anything except A, B or any file inside them,
# like for example create a file at the root directory and fsync this
# new file. This syncs the log that contains all the changes done by
# previous rename operation.
$ touch /mnt/baz
$ xfs_io -c "fsync" /mnt/baz
<power fail>
# Mount the filesystem and replay the log.
$ mount /dev/sdc /mnt
# Check the filesystem content.
$ ls -1R /mnt
/mnt/:
A
B
baz
/mnt/A:
bar
/mnt/B:
$
# File foo is gone, it's neither in A/ nor in B/.
Fix this by using the inode_logged() helper at btrfs_log_new_name(), which
safely checks if an inode was logged before in the current transaction.
A test case for fstests will follow soon.
CC: [email protected] # 4.14+
Signed-off-by: Filipe Manana <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
In compression write endio sequence, the range which the compressed_bio
writes is marked as uptodate if the last bio of the compressed (sub)bios
is completed successfully. There could be previous bio which may
have failed which is recorded in cb->errors.
Set the writeback range as uptodate only if cb->errors is zero, as opposed
to checking only the last bio's status.
Backporting notes: in all versions up to 4.4 the last argument is always
replaced by "!cb->errors".
CC: [email protected] # 4.4+
Signed-off-by: Goldwyn Rodrigues <[email protected]>
Reviewed-by: David Sterba <[email protected]>
Signed-off-by: David Sterba <[email protected]>
|
|
The current assumption that methods to read PCH FIVR attributes will
return integer, is not correct. There is no good way to return integer
as negative numbers are also valid.
These read methods return a package of integers. The first integer returns
status, which is 0 on success and any other value for failure. When the
returned status is zero, then the second integer returns the actual value.
This change fixes this issue by replacing acpi_evaluate_integer() with
acpi_evaluate_object() and use acpi_extract_package() to extract results.
Fixes: 2ce6324eadb01 ("ACPI: DPTF: Add PCH FIVR participant driver")
Signed-off-by: Srinivas Pandruvada <[email protected]>
Cc: 5.10+ <[email protected]> # 5.10+
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
The commit 0ec4e55e9f57 ("ACPI: resources: Add checks for ACPI IRQ
override") introduces regression on some platforms, at least it makes
the UART can't get correct irq setting on two different platforms,
and it makes the kernel can't bootup on these two platforms.
This reverts commit 0ec4e55e9f571f08970ed115ec0addc691eda613.
Regression-discuss: https://bugzilla.kernel.org/show_bug.cgi?id=213031
Reported-by: PGNd <[email protected]>
Cc: 5.4+ <[email protected]> # 5.4+
Signed-off-by: Hui Wang <[email protected]>
Acked-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
For (2 * 1000000) / min_speed_hz < 10 to be true in naturals with zero,
the min_speed_hz must be above 200000 (i.e. 200001 rounds down to 9, so
the condition triggers). Update the comment. No functional change.
Fixes: 6fd8b8503a0dc ("spi: spi-imx: Fix out-of-order CS/SCLK operation at low speeds")
Signed-off-by: Marek Vasut <[email protected]>
Cc: Uwe Kleine-König <[email protected]>
Cc: Mark Brown <[email protected]>
Acked-by: Uwe Kleine-König <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
Include all wm* sound bindings in the section for Wolfson Micro
drivers. This section already includes the actual driver source
files.
Also update the existing entry to match all wlf,* sound bindings.
Signed-off-by: Richard Fitzgerald <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
soc_cleanup_component_debugfs will debugfs_remove_recursive
the component->debugfs_root, so adsp doesn't need to also
remove the same entry.
By doing that adsp also creates a race with core component,
which causes a NULL pointer dereference
Signed-off-by: Lucas Tanure <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
When the component level pin control functions were added they for some
no longer obvious reason handled adding prefixing of widget names. This
meant that when the lack of prefix handling in the DAPM level pin
operations was fixed by ae4fc532244b3bb4d (ASoC: dapm: use component
prefix when checking widget names) the one device using the component
level API ended up with the prefix being applied twice, causing all
lookups to fail.
Fix this by removing the redundant prefixing from the component code,
which has the nice side effect of also making that code much simpler.
Reported-by: Richard Fitzgerald <[email protected]>
Signed-off-by: Mark Brown <[email protected]>
Tested-by: Lucas Tanure <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
For pure poll requests, it doesn't remove the second poll wait entry
when it's done, neither after vfs_poll() or in the poll completion
handler. We should remove the second poll wait entry.
And we use io_poll_remove_double() rather than io_poll_remove_waitqs()
since the latter has some redundant logic.
Fixes: 88e41cf928a6 ("io_uring: add multishot mode for IORING_OP_POLL_ADD")
Cc: [email protected] # 5.13+
Signed-off-by: Hao Xu <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
Some setups, like SCSI, can throw spurious -EAGAIN off the softirq
completion path. Normally we expect this to happen inline as part
of submission, but apparently SCSI has a weird corner case where it
can happen as part of normal completions.
This should be solved by having the -EAGAIN bubble back up the stack
as part of submission, but previous attempts at this failed and we're
not just quite there yet. Instead we currently use REQ_F_REISSUE to
handle this case.
For now, catch it in io_rw_should_reissue() and prevent a reissue
from a bogus path.
Cc: [email protected]
Reported-by: Fabian Ebner <[email protected]>
Tested-by: Fabian Ebner <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
|
|
Replace pci_enable_device() with pcim_enable_device(),
pci_disable_device() and pci_release_regions() will be
called in release automatically.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wang Hai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
using same source and destination ip/port for flow hash calculation
within the two directions.
Signed-off-by: zhang kai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Reported as working here:
https://github.com/t-8ch/linux-gigabyte-wmi-driver/issues/1#issuecomment-879398883
Signed-off-by: Thomas Weißschuh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
|
|
Alder Lake has a new ACPI ID for Intel HID event filter device.
Signed-off-by: Ping Bao <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Hans de Goede <[email protected]>
|
|
The `input_mt_get_slot_by_key` function may return a negative value
if an error occurs (e.g. running out of slots). If this occurs we
should really avoid reporting any data for the slot.
Signed-off-by: Ping Cheng <[email protected]>
Signed-off-by: Jason Gerecke <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
Commit 670e90924bfe ("HID: wacom: support named keys on older devices")
added support for sending named events from the soft buttons on the
24HDT and 27QHDT. In the process, however, it inadvertantly disabled the
touchscreen of the 24HDT and 27QHDT by default. The
`wacom_set_shared_values` function would normally enable touch by default
but because it checks the state of the non-shared `has_mute_touch_switch`
flag and `wacom_setup_touch_input_capabilities` sets the state of the
/shared/ version, touch ends up being disabled by default.
This patch sets the non-shared flag, letting `wacom_set_shared_values`
take care of copying the value over to the shared version and setting
the default touch state to "on".
Fixes: 670e90924bfe ("HID: wacom: support named keys on older devices")
CC: [email protected] # 5.4+
Signed-off-by: Jason Gerecke <[email protected]>
Reviewed-by: Ping Cheng <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
There is a spelling mistake in the Kconfig text. Fix it.
Signed-off-by: Colin Ian King <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
The Keychron K1 wireless keyboard has a set of Apple-like function keys
and an Fn key that works like on an Apple bluetooth keyboard. It
identifies as an Apple Alu RevB ANSI keyboard (05ac:024f) over USB and
BT. Use hid-apple for it so the Fn key and function keys work correctly.
Signed-off-by: Haochen Tong <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
There is a missing space in "relyingon".
Add it.
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
There is a use after free memory corruption during module exit:
- nfcsim_exit()
- nfcsim_device_free(dev0)
- nfc_digital_unregister_device()
This iterates over command queue and frees all commands,
- dev->up = false
- nfcsim_link_shutdown()
- nfcsim_link_recv_wake()
This wakes the sleeping thread nfcsim_link_recv_skb().
- nfcsim_link_recv_skb()
Wake from wait_event_interruptible_timeout(),
call directly the deb->cb callback even though (dev->up == false),
- digital_send_cmd_complete()
Dereference of "struct digital_cmd" cmd which was freed earlier by
nfc_digital_unregister_device().
This causes memory corruption shortly after (with unrelated stack
trace):
nfc nfc0: NFC: nfcsim_recv_wq: Device is down
llcp: nfc_llcp_recv: err -19
nfc nfc1: NFC: nfcsim_recv_wq: Device is down
BUG: unable to handle page fault for address: ffffffffffffffed
Call Trace:
fsnotify+0x54b/0x5c0
__fsnotify_parent+0x1fe/0x300
? vfs_write+0x27c/0x390
vfs_write+0x27c/0x390
ksys_write+0x63/0xe0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x44/0xae
KASAN report:
BUG: KASAN: use-after-free in digital_send_cmd_complete+0x16/0x50
Write of size 8 at addr ffff88800a05f720 by task kworker/0:2/71
Workqueue: events nfcsim_recv_wq [nfcsim]
Call Trace:
dump_stack_lvl+0x45/0x59
print_address_description.constprop.0+0x21/0x140
? digital_send_cmd_complete+0x16/0x50
? digital_send_cmd_complete+0x16/0x50
kasan_report.cold+0x7f/0x11b
? digital_send_cmd_complete+0x16/0x50
? digital_dep_link_down+0x60/0x60
digital_send_cmd_complete+0x16/0x50
nfcsim_recv_wq+0x38f/0x3d5 [nfcsim]
? nfcsim_in_send_cmd+0x4a/0x4a [nfcsim]
? lock_is_held_type+0x98/0x110
? finish_wait+0x110/0x110
? rcu_read_lock_sched_held+0x9c/0xd0
? rcu_read_lock_bh_held+0xb0/0xb0
? lockdep_hardirqs_on_prepare+0x12e/0x1f0
This flow of calling digital_send_cmd_complete() callback on driver exit
is specific to nfcsim which implements reading and sending work queues.
Since the NFC digital device was unregistered, the callback should not
be called.
Fixes: 204bddcb508f ("NFC: nfcsim: Make use of the Digital layer")
Cc: <[email protected]>
Signed-off-by: Krzysztof Kozlowski <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Replace pci_enable_device() with pcim_enable_device(),
pci_disable_device() and pci_release_regions() will be
called in release automatically.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Hulk Robot <[email protected]>
Signed-off-by: Wang Hai <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
As Ben Hutchings noticed, this check should have been inverted: the call
returns true in case of success.
Reported-by: Ben Hutchings <[email protected]>
Fixes: 0c5dc070ff3d ("sctp: validate from_addr_param return")
Signed-off-by: Marcelo Ricardo Leitner <[email protected]>
Reviewed-by: Xin Long <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In the function s3fwrn5_fw_download(), the 'ret' is not assigned,
so the correct value should be given in dev_err function.
Fixes: a0302ff5906a ("nfc: s3fwrn5: remove unnecessary label")
Signed-off-by: Zhang Shengju <[email protected]>
Signed-off-by: Tang Bin <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5 fixes 2021-07-27
This series introduces some fixes to mlx5 driver.
Please pull and let me know if there is any problem.
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
yet available
If the router_xlate can not find the controller in the available DMA
devices then it should return with -EPORBE_DEFER in a same way as the
of_dma_request_slave_channel() does.
The issue can be reproduced if the event router is registered before the
DMA controller itself and a driver would request for a channel before the
controller is registered.
In of_dma_request_slave_channel():
1. of_dma_find_controller() would find the dma_router
2. ofdma->of_dma_xlate() would fail and returned NULL
3. -ENODEV is returned as error code
with this patch we would return in this case the correct -EPROBE_DEFER and
the client can try to request the channel later.
Signed-off-by: Peter Ujfalusi <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
|
|
pm_runtime_get_sync will increment pm usage counter
even it failed. Forgetting to putting operation will
result in reference leak here. We fix it by replacing
it with pm_runtime_resume_and_get to keep usage counter
balanced.
Fixes: 4f3ceca254e0f ("dmaengine: stm32-dmamux: Add PM Runtime support")
Signed-off-by: Zhang Qilong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
|
|
pm_runtime_get_sync will increment pm usage counter
even it failed. Forgetting to putting operation will
result in reference leak here. We fix it by replacing
it with pm_runtime_resume_and_get to keep usage counter
balanced.
Fixes: 48bc73ba14bcd ("dmaengine: stm32-dma: Add PM Runtime support")
Fixes: 05f8740a0e6fc ("dmaengine: stm32-dma: add suspend/resume power management support")
Signed-off-by: Zhang Qilong <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Vinod Koul <[email protected]>
|
|
SET_IDLE value must be shifted 8 bits to the right to get duration.
This confirmed by USBCV test.
Fixes: afcff6dc690e ("usb: gadget: f_hid: added GET_IDLE and SET_IDLE handlers")
Cc: stable <[email protected]>
Signed-off-by: Maxim Devaev <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
blkdev_get_no_open acquires a reference to the block_device through
the block device inode and then tries to acquire a device model
reference to the gendisk. But at this point the disk migh already
be freed (although the race is free). Fix this by only freeing the
gendisk from the whole device bdevs ->free_inode callback as well.
Fixes: 22ae8ce8b892 ("block: simplify bdev/disk lookup in blkdev_get")
Signed-off-by: Christoph Hellwig <[email protected]>
Reviewed-by: Josef Bacik <[email protected]>
Reviewed-by: Ming Lei <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jens Axboe <[email protected]>
|
|
iocg_wake_fn() open-codes wait_queue_entry removal and wakeup because it
wants the wq_entry to be always removed whether it ended up waking the
task or not. finish_wait() tests whether wq_entry needs removal without
grabbing the wait_queue lock and expects the waker to use
list_del_init_careful() after all waking operations are complete, which
iocg_wake_fn() didn't do. The operation order was wrong and the regular
list_del_init() was used.
The result is that if a waiter wakes up racing the waker, it can free pop
the wq_entry off stack before the waker is still looking at it, which can
lead to a backtrace like the following.
[7312084.588951] general protection fault, probably for non-canonical address 0x586bf4005b2b88: 0000 [#1] SMP
...
[7312084.647079] RIP: 0010:queued_spin_lock_slowpath+0x171/0x1b0
...
[7312084.858314] Call Trace:
[7312084.863548] _raw_spin_lock_irqsave+0x22/0x30
[7312084.872605] try_to_wake_up+0x4c/0x4f0
[7312084.880444] iocg_wake_fn+0x71/0x80
[7312084.887763] __wake_up_common+0x71/0x140
[7312084.895951] iocg_kick_waitq+0xe8/0x2b0
[7312084.903964] ioc_rqos_throttle+0x275/0x650
[7312084.922423] __rq_qos_throttle+0x20/0x30
[7312084.930608] blk_mq_make_request+0x120/0x650
[7312084.939490] generic_make_request+0xca/0x310
[7312084.957600] submit_bio+0x173/0x200
[7312084.981806] swap_readpage+0x15c/0x240
[7312084.989646] read_swap_cache_async+0x58/0x60
[7312084.998527] swap_cluster_readahead+0x201/0x320
[7312085.023432] swapin_readahead+0x2df/0x450
[7312085.040672] do_swap_page+0x52f/0x820
[7312085.058259] handle_mm_fault+0xa16/0x1420
[7312085.066620] do_page_fault+0x2c6/0x5c0
[7312085.074459] page_fault+0x2f/0x40
Fix it by switching to list_del_init_careful() and putting it at the end.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Rik van Riel <[email protected]>
Fixes: 7caa47151ab2 ("blkcg: implement blk-iocost")
Cc: [email protected] # v5.4+
Signed-off-by: Jens Axboe <[email protected]>
|
|
0fa294fb1985 ("cgroup: Replace cgroup_rstat_mutex with a spinlock") added
cgroup_rstat_flush_irqsafe() allowing flushing to happen from the irq
context. However, rstat paths use u64_stats_sync to synchronize access to
64bit stat counters on 32bit machines. u64_stats_sync is implemented using
seq_lock and trying to read from an irq context can lead to A-A deadlock if
the irq happens to interrupt the stat update.
Fix it by using the irqsafe variants - u64_stats_update_begin_irqsave() and
u64_stats_update_end_irqrestore() - in the update paths. Note that none of
this matters on 64bit machines. All these are just for 32bit SMP setups.
Note that the interface was introduced way back, its first and currently
only use was recently added by 2d146aa3aa84 ("mm: memcontrol: switch to
rstat"). Stable tagging targets this commit.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Rik van Riel <[email protected]>
Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")
Cc: [email protected] # v5.13+
|
|
The offending refactor commit uses u16 chain wrongly. Actually, it
should be u32.
Fixes: c620b772152b ("net/mlx5: Refactor tc flow attributes structure")
CC: Ariel Levkovich <[email protected]>
Signed-off-by: Chris Mi <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
The result of __dev_get_by_index() is not checked for NULL and then gets
dereferenced immediately.
Also, __dev_get_by_index() must be called while holding either RTNL lock
or @dev_base_lock, which isn't satisfied by mlx5e_hairpin_get_mdev() or
its callers. This makes the underlying hlist_for_each_entry() loop not
safe, and can have adverse effects in itself.
Fix by using dev_get_by_index() and handling nullptr return value when
ifindex device is not found. Update mlx5e_hairpin_get_mdev() callers to
check for possible PTR_ERR() result.
Fixes: 77ab67b7f0f9 ("net/mlx5e: Basic setup of hairpin object")
Addresses-Coverity: ("Dereference null return value")
Signed-off-by: Dima Chumak <[email protected]>
Reviewed-by: Vlad Buslov <[email protected]>
Reviewed-by: Roi Dayan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
When fw_fatal reporter reports an error, the firmware in not responding.
Unload the device to ensure that the driver closes all its resources,
even if recovery is not due (user disabled auto-recovery or reporter is
in grace period). On successful recovery the device is loaded back up.
Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues")
Signed-off-by: Aya Levin <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Set the correct pci-device pointer to the ptp-RQ. This allows access to
dma_mask and avoids allocation request with wrong pci-device.
Fixes: a099da8ffcf6 ("net/mlx5e: Add RQ to PTP channel")
Signed-off-by: Aya Levin <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Set the correct device pointer to the trap-RQ, to allow access to
dma_mask and avoid allocation request with the wrong pci-dev.
WARNING: CPU: 1 PID: 12005 at kernel/dma/mapping.c:151 dma_map_page_attrs+0x139/0x1c0
...
all Trace:
<IRQ>
? __page_pool_alloc_pages_slow+0x5a/0x210
mlx5e_post_rx_wqes+0x258/0x400 [mlx5_core]
mlx5e_trap_napi_poll+0x44/0xc0 [mlx5_core]
__napi_poll+0x24/0x150
net_rx_action+0x22b/0x280
__do_softirq+0xc7/0x27e
do_softirq+0x61/0x80
</IRQ>
__local_bh_enable_ip+0x4b/0x50
mlx5e_handle_action_trap+0x2dd/0x4d0 [mlx5_core]
blocking_notifier_call_chain+0x5a/0x80
mlx5_devlink_trap_action_set+0x8b/0x100 [mlx5_core]
Fixes: 5543e989fe5e ("net/mlx5e: Add trap entity to ETH driver")
Signed-off-by: Aya Levin <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
Add PTP-RQ to the loop when setting rx-vlan-offload feature via ethtool.
On PTP-RQ's creation, set rx-vlan-offload into its parameters.
Fixes: a099da8ffcf6 ("net/mlx5e: Add RQ to PTP channel")
Signed-off-by: Aya Levin <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
If a feature flag is only present in features, but not in hw_features,
the user can't reset it. Although hw_features may contain NETIF_F_HW_TC
by the point where the driver checks whether HTB offload is supported,
this flag is controlled by another condition that may not hold. Set it
explicitly to make sure the user can disable it.
Fixes: 214baf22870c ("net/mlx5e: Support HTB offload")
Signed-off-by: Maxim Mikityanskiy <[email protected]>
Reviewed-by: Tariq Toukan <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
combined
When HW aggregates packets for an LRO session, it writes the payload
of two consecutive packets of a flow contiguously, so that they usually
share a cacheline.
The first byte of a packet's payload is written immediately after
the last byte of the preceding packet.
In this flow, there are two consecutive write requests to the shared
cacheline:
1. Regular write for the earlier packet.
2. Read-modify-write for the following packet.
In case of relaxed-ordering on, these two writes might be re-ordered.
Using the end padding optimization (to avoid partial write for the last
cacheline of a packet) becomes problematic if the two writes occur
out-of-order, as the padding would overwrite payload that belongs to
the following packet, causing data corruption.
Avoid this by disabling the end padding optimization when both
LRO and relaxed-ordering are enabled.
Fixes: 17347d5430c4 ("net/mlx5e: Add support for PCI relaxed ordering")
Signed-off-by: Tariq Toukan <[email protected]>
Reviewed-by: Moshe Shemesh <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|
|
This is the same check as LAG mode checks if to enable lag.
This will fix adding peer miss rules if lag is not supported
and even an incorrect rules in socket direct mode.
Also fix the incorrect comment on mlx5_get_next_phys_dev() as flow #1
doesn't exists.
Fixes: ac004b832128 ("net/mlx5e: E-Switch, Add peer miss rules")
Signed-off-by: Roi Dayan <[email protected]>
Reviewed-by: Maor Dickman <[email protected]>
Reviewed-by: Mark Bloch <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
|