aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-04-01i40e: Fix display statistics for veb_tcEryk Rybak1-6/+46
If veb-stats was enabled, the ethtool stats triggered a warning due to invalid size: 'unexpected stat size for veb.tc_%u_tx_packets'. This was due to an incorrect structure definition for the statistics. Structures and functions have been improved in line with requirements for the presentation of statistics, in particular for the functions: 'i40e_add_ethtool_stats' and 'i40e_add_stat_strings'. Fixes: 1510ae0be2a4 ("i40e: convert VEB TC stats to use an i40e_stats array") Signed-off-by: Eryk Rybak <[email protected]> Signed-off-by: Grzegorz Szczurek <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Tested-by: Dave Switzer <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-04-01i40e: fix receiving of single packets in xsk zero-copy modeMagnus Karlsson1-2/+2
Fix so that single packets are received immediately instead of in batches of 8. If you sent 1 pps to a system, you received 8 packets every 8 seconds instead of 1 packet every second. The problem behind this was that the work_done reporting from the Tx part of the driver was broken. The work_done reporting in i40e controls not only the reporting back to the napi logic but also the setting of the interrupt throttling logic. When Tx or Rx reports that it has more to do, interrupts are throttled or coalesced and when they both report that they are done, interrupts are armed right away. If the wrong work_done value is returned, the logic will start to throttle interrupts in a situation where it should have just enabled them. This leads to the undesired batching behavior seen in user-space. Fix this by returning the correct boolean value from the Tx xsk zero-copy path. Return true if there is nothing to do or if we got fewer packets to process than we asked for. Return false if we got as many packets as the budget since there might be more packets we can process. Fixes: 3106c580fb7c ("i40e: Use batched xsk Tx interfaces to increase performance") Reported-by: Sreedevi Joshi <[email protected]> Signed-off-by: Magnus Karlsson <[email protected]> Acked-by: Maciej Fijalkowski <[email protected]> Tested-by: Kiran Bhandare <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-04-01i40e: Fix inconsistent indentingArkadiusz Kubalewski1-4/+4
Fixed new static analysis findings: "warn: inconsistent indenting" - introduced lately, reported with lkp and smatch build. Fixes: 4b208eaa8078 ("i40e: Add init and default config of software based DCB") Reported-by: kernel test robot <[email protected]> Signed-off-by: Aleksandr Loktionov <[email protected]> Signed-off-by: Arkadiusz Kubalewski <[email protected]> Tested-by: Dave Switzer <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2021-04-01io_uring: fix EIOCBQUEUED iter revertPavel Begunkov1-4/+0
iov_iter_revert() is done in completion handlers that happensf before read/write returns -EIOCBQUEUED, no need to repeat reverting afterwards. Moreover, even though it may appear being just a no-op, it's actually races with 1) user forging a new iovec of a different size 2) reissue, that is done via io-wq continues completely asynchronously. Fixes: 3e6a0d3c7571c ("io_uring: fix -EAGAIN retry with IOPOLL") Signed-off-by: Pavel Begunkov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2021-04-01io_uring/io-wq: protect against sprintf overflowPavel Begunkov2-3/+3
task_pid may be large enough to not fit into the left space of TASK_COMM_LEN-sized buffers and overflow in sprintf. We not so care about uniqueness, so replace it with safer snprintf(). Reported-by: Alexey Dobriyan <[email protected]> Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/1702c6145d7e1c46fbc382f28334c02e1a3d3994.1617267273.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2021-04-01io_uring: don't mark S_ISBLK async work as unboundedJens Axboe1-1/+1
S_ISBLK is marked as unbounded work for async preparation, because it doesn't match S_ISREG. That is incorrect, as any read/write to a block device is also a bounded operation. Fix it up and ensure that S_ISBLK isn't marked unbounded. Signed-off-by: Jens Axboe <[email protected]>
2021-04-01ARM: mvebu: avoid clang -Wtautological-constant warningArnd Bergmann1-1/+1
Clang warns about the comparison when using a 32-bit phys_addr_t: drivers/bus/mvebu-mbus.c:621:17: error: result of comparison of constant 4294967296 with expression of type 'phys_addr_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (reg_start >= 0x100000000ULL) Add a cast to shut up the warning. Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Link: https://lore.kernel.org/r/[email protected]' Signed-off-by: Arnd Bergmann <[email protected]>
2021-04-01ARM: pxa: mainstone: avoid -Woverride-init warningArnd Bergmann1-2/+6
The default initializer at the start of the array causes a warning when building with W=1: In file included from arch/arm/mach-pxa/mainstone.c:47: arch/arm/mach-pxa/mainstone.h:124:33: error: initialized field overwritten [-Werror=override-init] 124 | #define MAINSTONE_IRQ(x) (MAINSTONE_NR_IRQS + (x)) | ^ arch/arm/mach-pxa/mainstone.h:133:33: note: in expansion of macro 'MAINSTONE_IRQ' 133 | #define MAINSTONE_S0_CD_IRQ MAINSTONE_IRQ(9) | ^~~~~~~~~~~~~ arch/arm/mach-pxa/mainstone.c:506:15: note: in expansion of macro 'MAINSTONE_S0_CD_IRQ' 506 | [5] = MAINSTONE_S0_CD_IRQ, | ^~~~~~~~~~~~~~~~~~~ Rework the initializer to list each element explicitly and only once. Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected]' Signed-off-by: Arnd Bergmann <[email protected]>
2021-04-01ARM: omap1: fix building with clang IASArnd Bergmann1-0/+1
The clang integrated assembler fails to build one file with a complex asm instruction: arch/arm/mach-omap1/ams-delta-fiq-handler.S:249:2: error: invalid instruction, any one of the following would fix this: mov r10, #(1 << (((NR_IRQS_LEGACY + 12) - NR_IRQS_LEGACY) % 32)) @ set deferred_fiq bit ^ arch/arm/mach-omap1/ams-delta-fiq-handler.S:249:2: note: instruction requires: armv6t2 mov r10, #(1 << (((NR_IRQS_LEGACY + 12) - NR_IRQS_LEGACY) % 32)) @ set deferred_fiq bit ^ arch/arm/mach-omap1/ams-delta-fiq-handler.S:249:2: note: instruction requires: thumb2 mov r10, #(1 << (((NR_IRQS_LEGACY + 12) - NR_IRQS_LEGACY) % 32)) @ set deferred_fiq bit ^ The problem is that 'NR_IRQS_LEGACY' is not defined here. Apparently gas does not care because we first add and then subtract this number, leading to the immediate value to be the same regardless of the specific definition of NR_IRQS_LEGACY. Neither the way that 'gas' just silently builds this file, nor the way that clang IAS makes nonsensical suggestions for how to fix it is great. Fortunately there is an easy fix, which is to #include the header that contains the definition. Signed-off-by: Arnd Bergmann <[email protected]> Acked-by: Tony Lindgren <[email protected]> Link: https://lore.kernel.org/r/[email protected]' Signed-off-by: Arnd Bergmann <[email protected]>
2021-04-01soc/fsl: qbman: fix conflicting alignment attributesArnd Bergmann1-1/+1
When building with W=1, gcc points out that the __packed attribute on struct qm_eqcr_entry conflicts with the 8-byte alignment attribute on struct qm_fd inside it: drivers/soc/fsl/qbman/qman.c:189:1: error: alignment 1 of 'struct qm_eqcr_entry' is less than 8 [-Werror=packed-not-aligned] I assume that the alignment attribute is the correct one, and that qm_eqcr_entry cannot actually be unaligned in memory, so add the same alignment on the outer struct. Fixes: c535e923bb97 ("soc/fsl: Introduce DPAA 1.x QMan device driver") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected]' Signed-off-by: Arnd Bergmann <[email protected]>
2021-04-01ARM: keystone: fix integer overflow warningArnd Bergmann1-2/+2
clang warns about an impossible condition when building with 32-bit phys_addr_t: arch/arm/mach-keystone/keystone.c:79:16: error: result of comparison of constant 51539607551 with expression of type 'phys_addr_t' (aka 'unsigned int') is always false [-Werror,-Wtautological-constant-out-of-range-compare] mem_end > KEYSTONE_HIGH_PHYS_END) { ~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~ arch/arm/mach-keystone/keystone.c:78:16: error: result of comparison of constant 34359738368 with expression of type 'phys_addr_t' (aka 'unsigned int') is always true [-Werror,-Wtautological-constant-out-of-range-compare] if (mem_start < KEYSTONE_HIGH_PHYS_START || ~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~~~~~~~~ Change the temporary variable to a fixed-size u64 to avoid the warning. Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Acked-by: Santosh Shilimkar <[email protected]> Link: https://lore.kernel.org/r/[email protected]' Signed-off-by: Arnd Bergmann <[email protected]>
2021-04-02powerpc/vdso: Make sure vdso_wrapper.o is rebuilt everytime vdso.so is rebuiltChristophe Leroy1-0/+4
Commit bce74491c300 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o") moved vdso32_wrapper.o and vdso64_wrapper.o out of arch/powerpc/kernel/vdso[32/64]/ and removed the dependencies in the Makefile. This leads to the wrappers not being re-build hence the kernel embedding the old vdso library. Add back missing dependencies to ensure vdso32_wrapper.o and vdso64_wrapper.o are rebuilt when vdso32.so.dbg and vdso64.so.dbg are changed. Fixes: bce74491c300 ("powerpc/vdso: fix unnecessary rebuilds of vgettimeofday.o") Cc: [email protected] Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/8bb015bc98c51d8ced581415b7e3d157e18da7c9.1617181918.git.christophe.leroy@csgroup.eu
2021-04-02powerpc/signal32: Fix Oops on sigreturn with unmapped VDSOChristophe Leroy1-12/+8
PPC32 encounters a KUAP fault when trying to handle a signal with VDSO unmapped. Kernel attempted to read user page (7fc07ec0) - exploit attempt? (uid: 0) BUG: Unable to handle kernel data access on read at 0x7fc07ec0 Faulting instruction address: 0xc00111d4 Oops: Kernel access of bad area, sig: 11 [#1] BE PAGE_SIZE=16K PREEMPT CMPC885 CPU: 0 PID: 353 Comm: sigreturn_vdso Not tainted 5.12.0-rc4-s3k-dev-01553-gb30c310ea220 #4814 NIP: c00111d4 LR: c0005a28 CTR: 00000000 REGS: cadb3dd0 TRAP: 0300 Not tainted (5.12.0-rc4-s3k-dev-01553-gb30c310ea220) MSR: 00009032 <EE,ME,IR,DR,RI> CR: 48000884 XER: 20000000 DAR: 7fc07ec0 DSISR: 88000000 GPR00: c0007788 cadb3e90 c28d4a40 7fc07ec0 7fc07ed0 000004e0 7fc07ce0 00000000 GPR08: 00000001 00000001 7fc07ec0 00000000 28000282 1001b828 100a0920 00000000 GPR16: 100cac0c 100b0000 105c43a4 105c5685 100d0000 100d0000 100d0000 100b2e9e GPR24: ffffffff 105c43c8 00000000 7fc07ec8 cadb3f40 cadb3ec8 c28d4a40 00000000 NIP [c00111d4] flush_icache_range+0x90/0xb4 LR [c0005a28] handle_signal32+0x1bc/0x1c4 Call Trace: [cadb3e90] [100d0000] 0x100d0000 (unreliable) [cadb3ec0] [c0007788] do_notify_resume+0x260/0x314 [cadb3f20] [c000c764] syscall_exit_prepare+0x120/0x184 [cadb3f30] [c00100b4] ret_from_syscall+0xc/0x28 --- interrupt: c00 at 0xfe807f8 NIP: 0fe807f8 LR: 10001060 CTR: c0139378 REGS: cadb3f40 TRAP: 0c00 Not tainted (5.12.0-rc4-s3k-dev-01553-gb30c310ea220) MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 28000482 XER: 20000000 GPR00: 00000025 7fc081c0 77bb1690 00000000 0000000a 28000482 00000001 0ff03a38 GPR08: 0000d032 00006de5 c28d4a40 00000009 88000482 1001b828 100a0920 00000000 GPR16: 100cac0c 100b0000 105c43a4 105c5685 100d0000 100d0000 100d0000 100b2e9e GPR24: ffffffff 105c43c8 00000000 77ba7628 10002398 10010000 10002124 00024000 NIP [0fe807f8] 0xfe807f8 LR [10001060] 0x10001060 --- interrupt: c00 Instruction dump: 38630010 7c001fac 38630010 4200fff0 7c0004ac 4c00012c 4e800020 7c001fac 2c0a0000 38630010 4082ffcc 4bffffe4 <7c00186c> 2c070000 39430010 4082ff8c ---[ end trace 3973fb72b049cb06 ]--- This is because flush_icache_range() is called on user addresses. The same problem was detected some time ago on PPC64. It was fixed by enabling KUAP in commit 59bee45b9712 ("powerpc/mm: Fix missing KUAP disable in flush_coherent_icache()"). PPC32 doesn't use flush_coherent_icache() and fallbacks on clean_dcache_range() and invalidate_icache_range(). We could fix it similarly by enabling user access in those functions, but this is overkill for just flushing two instructions. The two instructions are 8 bytes aligned, so a single dcbst/icbi is enough to flush them. Do like __patch_instruction() and inline a dcbst followed by an icbi just after the write of the instructions, while user access is still allowed. The isync is not required because rfi will be used to return to user. icbi() is handled as a read so read-write user access is needed. Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/bde9154e5351a5ac7bca3d59cdb5a5e8edacbb79.1617199569.git.christophe.leroy@csgroup.eu
2021-04-02powerpc/ptrace: Don't return error when getting/setting FP regs without ↵Christophe Leroy5-18/+20
CONFIG_PPC_FPU_REGS An #ifdef CONFIG_PPC_FPU_REGS is missing in arch_ptrace() leading to the following Oops because [REGSET_FPR] entry is not initialised in native_regsets[]. [ 41.917608] BUG: Unable to handle kernel instruction fetch [ 41.922849] Faulting instruction address: 0xff8fd228 [ 41.927760] Oops: Kernel access of bad area, sig: 11 [#1] [ 41.933089] BE PAGE_SIZE=4K PREEMPT CMPC885 [ 41.940753] Modules linked in: [ 41.943768] CPU: 0 PID: 366 Comm: gdb Not tainted 5.12.0-rc5-s3k-dev-01666-g7aac86a0f057-dirty #4835 [ 41.952800] NIP: ff8fd228 LR: c004d9e0 CTR: ff8fd228 [ 41.957790] REGS: caae9df0 TRAP: 0400 Not tainted (5.12.0-rc5-s3k-dev-01666-g7aac86a0f057-dirty) [ 41.966741] MSR: 40009032 <EE,ME,IR,DR,RI> CR: 82004248 XER: 20000000 [ 41.973540] [ 41.973540] GPR00: c004d9b4 caae9eb0 c1b64f60 c1b64520 c0713cd4 caae9eb8 c1bacdfc 00000004 [ 41.973540] GPR08: 00000200 ff8fd228 c1bac700 00001032 28004242 1061aaf4 00000001 106d64a0 [ 41.973540] GPR16: 00000000 00000000 7fa0a774 10610000 7fa0aef9 00000000 10610000 7fa0a538 [ 41.973540] GPR24: 7fa0a580 7fa0a570 c1bacc00 c1b64520 c1bacc00 caae9ee8 00000108 c0713cd4 [ 42.009685] NIP [ff8fd228] 0xff8fd228 [ 42.013300] LR [c004d9e0] __regset_get+0x100/0x124 [ 42.018036] Call Trace: [ 42.020443] [caae9eb0] [c004d9b4] __regset_get+0xd4/0x124 (unreliable) [ 42.026899] [caae9ee0] [c004da94] copy_regset_to_user+0x5c/0xb0 [ 42.032751] [caae9f10] [c002f640] sys_ptrace+0xe4/0x588 [ 42.037915] [caae9f30] [c0011010] ret_from_syscall+0x0/0x28 [ 42.043422] --- interrupt: c00 at 0xfd1f8e4 [ 42.047553] NIP: 0fd1f8e4 LR: 1004a688 CTR: 00000000 [ 42.052544] REGS: caae9f40 TRAP: 0c00 Not tainted (5.12.0-rc5-s3k-dev-01666-g7aac86a0f057-dirty) [ 42.061494] MSR: 0000d032 <EE,PR,ME,IR,DR,RI> CR: 48004442 XER: 00000000 [ 42.068551] [ 42.068551] GPR00: 0000001a 7fa0a040 77dad7e0 0000000e 00000170 00000000 7fa0a078 00000004 [ 42.068551] GPR08: 00000000 108deb88 108dda40 106d6010 44004442 1061aaf4 00000001 106d64a0 [ 42.068551] GPR16: 00000000 00000000 7fa0a774 10610000 7fa0aef9 00000000 10610000 7fa0a538 [ 42.068551] GPR24: 7fa0a580 7fa0a570 1078fe00 1078fd70 1078fd70 00000170 0fdd3244 0000000d [ 42.104696] NIP [0fd1f8e4] 0xfd1f8e4 [ 42.108225] LR [1004a688] 0x1004a688 [ 42.111753] --- interrupt: c00 [ 42.114768] Instruction dump: [ 42.117698] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX [ 42.125443] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX [ 42.133195] ---[ end trace d35616f22ab2100c ]--- Adding the missing #ifdef is not good because gdb doesn't like getting an error when getting registers. Instead, make ptrace return 0s when CONFIG_PPC_FPU_REGS is not set. Fixes: b6254ced4da6 ("powerpc/signal: Don't manage floating point regs when no FPU") Cc: [email protected] Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]> Link: https://lore.kernel.org/r/9121a44a2d50ba1af18d8aa5ada06c9a3bea8afd.1617200085.git.christophe.leroy@csgroup.eu
2021-04-01null_blk: fix command timeout completion handlingDamien Le Moal2-5/+22
Memory backed or zoned null block devices may generate actual request timeout errors due to the submission path being blocked on memory allocation or zone locking. Unlike fake timeouts or injected timeouts, the request submission path will call blk_mq_complete_request() or blk_mq_end_request() for these real timeout errors, causing a double completion and use after free situation as the block layer timeout handler executes blk_mq_rq_timed_out() and __blk_mq_free_request() in blk_mq_check_expired(). This problem often triggers a NULL pointer dereference such as: BUG: kernel NULL pointer dereference, address: 0000000000000050 RIP: 0010:blk_mq_sched_mark_restart_hctx+0x5/0x20 ... Call Trace: dd_finish_request+0x56/0x80 blk_mq_free_request+0x37/0x130 null_handle_cmd+0xbf/0x250 [null_blk] ? null_queue_rq+0x67/0xd0 [null_blk] blk_mq_dispatch_rq_list+0x122/0x850 __blk_mq_do_dispatch_sched+0xbb/0x2c0 __blk_mq_sched_dispatch_requests+0x13d/0x190 blk_mq_sched_dispatch_requests+0x30/0x60 __blk_mq_run_hw_queue+0x49/0x90 process_one_work+0x26c/0x580 worker_thread+0x55/0x3c0 ? process_one_work+0x580/0x580 kthread+0x134/0x150 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x1f/0x30 This problem very often triggers when running the full btrfs xfstests on a memory-backed zoned null block device in a VM with limited amount of memory. Avoid this by executing blk_mq_complete_request() in null_timeout_rq() only for commands that are marked for a fake timeout completion using the fake_timeout boolean in struct null_cmd. For timeout errors injected through debugfs, the timeout handler will execute blk_mq_complete_request()i as before. This is safe as the submission path does not execute complete requests in this case. In null_timeout_rq(), also make sure to set the command error field to BLK_STS_TIMEOUT and to propagate this error through to the request completion. Reported-by: Johannes Thumshirn <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Tested-by: Johannes Thumshirn <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-04-01idr test suite: Improve reporting from idr_find_test_1Matthew Wilcox (Oracle)1-1/+10
Instead of just reporting an assertion failure, report enough information that we can start diagnosing exactly went wrong. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2021-04-01idr test suite: Create anchor before launching throbberMatthew Wilcox (Oracle)1-2/+2
The throbber could race with creation of the anchor entry and cause the IDR to have zero entries in it, which would cause the test to fail. Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2021-04-01idr test suite: Take RCU read lock in idr_find_test_1Matthew Wilcox (Oracle)1-0/+4
When run on a single CPU, this test would frequently access already-freed memory. Due to timing, this bug never showed up on multi-CPU tests. Reported-by: Chris von Recklinghausen <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2021-04-01radix tree test suite: Register the main thread with the RCU libraryMatthew Wilcox (Oracle)3-0/+6
Several test runners register individual worker threads with the RCU library, but neglect to register the main thread, which can lead to objects being freed while the main thread is in what appears to be an RCU critical section. Reported-by: Chris von Recklinghausen <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2021-04-01ACPI: processor: Fix CPU0 wakeup in acpi_idle_play_dead()Vitaly Kuznetsov3-1/+9
Commit 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") broke CPU0 hotplug on certain systems, e.g. I'm observing the following on AWS Nitro (e.g r5b.xlarge but other instance types are affected as well): # echo 0 > /sys/devices/system/cpu/cpu0/online # echo 1 > /sys/devices/system/cpu/cpu0/online <10 seconds delay> -bash: echo: write error: Input/output error In fact, the above mentioned commit only revealed the problem and did not introduce it. On x86, to wakeup CPU an NMI is being used and hlt_play_dead()/mwait_play_dead() loops are prepared to handle it: /* * If NMI wants to wake up CPU0, start CPU0. */ if (wakeup_cpu0()) start_cpu0(); cpuidle_play_dead() -> acpi_idle_play_dead() (which is now being called on systems where it wasn't called before the above mentioned commit) serves the same purpose but it doesn't have a path for CPU0. What happens now on wakeup is: - NMI is sent to CPU0 - wakeup_cpu0_nmi() works as expected - we get back to while (1) loop in acpi_idle_play_dead() - safe_halt() puts CPU0 to sleep again. The straightforward/minimal fix is add the special handling for CPU0 on x86 and that's what the patch is doing. Fixes: 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") Signed-off-by: Vitaly Kuznetsov <[email protected]> Cc: 5.10+ <[email protected]> # 5.10+ Signed-off-by: Rafael J. Wysocki <[email protected]>
2021-04-01ASoC: codecs: lpass-rx-macro: set npl clock rate correctlySrinivas Kandagatla1-1/+1
NPL clock rate is twice the MCLK rate, so set this correctly to avoid soundwire timeouts. Fixes: af3d54b99764 ("ASoC: codecs: lpass-rx-macro: add support for lpass rx macro") Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2021-04-01ASoC: codecs: lpass-tx-macro: set npl clock rate correctlySrinivas Kandagatla1-1/+1
NPL clock rate is twice the MCLK rate, so set this correctly to avoid soundwire timeouts. Fixes: c39667ddcfc5 ("ASoC: codecs: lpass-tx-macro: add support for lpass tx macro") Signed-off-by: Srinivas Kandagatla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2021-04-01Merge tag 'imx-fixes-5.12-2' of ↵Arnd Bergmann3-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes i.MX fixes for 5.12, round 2: - Fix a system failure on imx6qdl-phytec-pfla02 board when booting from SD, by adding missing vmmc supply for SD interfaces. - Fix address typo in i.MX8MM/Q IOMUXC_SD1_DATA0_GPIO2_IO2 definition. * tag 'imx-fixes-5.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx6: pbab01: Set vmmc supply for both SD interfaces arm64: dts: imx8mm/q: Fix pad control of SD1_DATA0 Link: https://lore.kernel.org/r/20210330090236.GQ22955@dragon Signed-off-by: Arnd Bergmann <[email protected]>
2021-04-01Merge tag 'omap-for-v5.12/fixes-rc4-signed' of ↵Arnd Bergmann6-10/+52
git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap into arm/fixes More fixes for omaps for v5.12-rc cycle Two fixes for hangs, mmc slot order fix, and a voltage typo fix: - Remove unused duplicate sha2md5_fck clock node that can race with the OMAP4_SHA2MD5_CLKCTRL clock node for disable for unused clocks - Add aliases for omap4/5 mmc to put the slots back into the right order again - Fix typo for bionic voltage controllers that accidentally use mpu for all instances instead of mpu, core and iva - Fix random hangs for droid4 caused by missing fix from TI Android kernel tree to do a dummy smc call on cpuidle wakeup path * tag 'omap-for-v5.12/fixes-rc4-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap: ARM: OMAP4: PM: update ROM return address for OSWR and OFF ARM: OMAP4: Fix PMIC voltage domains for bionic ARM: dts: Fix moving mmc devices with aliases for omap4 & 5 ARM: dts: Drop duplicate sha2md5_fck to fix clk_disable race Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2021-04-01Merge tag 'arm-soc/for-5.12/devicetree-part2' of ↵Arnd Bergmann1-12/+0
https://github.com/Broadcom/stblinux into arm/fixes This pull request contains Broadcom ARM-based SoC changes for 5.12, please pull the following: - Florian reverts the adding of the second level interrupt controller for HDMI BSC interrupts since they collide with the main I2C controller (i2c-bcm2835). * tag 'arm-soc/for-5.12/devicetree-part2' of https://github.com/Broadcom/stblinux: Revert "ARM: dts: bcm2711: Add the BSC interrupt controller" Signed-off-by: Arnd Bergmann <[email protected]>
2021-04-01selftests: kvm: Check that TSC page value is small after KVM_SET_CLOCK(0)Vitaly Kuznetsov1-2/+11
Add a test for the issue when KVM_SET_CLOCK(0) call could cause TSC page value to go very big because of a signedness issue around hv_clock->system_time. Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-04-01KVM: x86: Prevent 'hv_clock->system_time' from going negative in ↵Vitaly Kuznetsov1-2/+17
kvm_guest_time_update() When guest time is reset with KVM_SET_CLOCK(0), it is possible for 'hv_clock->system_time' to become a small negative number. This happens because in KVM_SET_CLOCK handling we set 'kvm->arch.kvmclock_offset' based on get_kvmclock_ns(kvm) but when KVM_REQ_CLOCK_UPDATE is handled, kvm_guest_time_update() does (masterclock in use case): hv_clock.system_time = ka->master_kernel_ns + v->kvm->arch.kvmclock_offset; And 'master_kernel_ns' represents the last time when masterclock got updated, it can precede KVM_SET_CLOCK() call. Normally, this is not a problem, the difference is very small, e.g. I'm observing hv_clock.system_time = -70 ns. The issue comes from the fact that 'hv_clock.system_time' is stored as unsigned and 'system_time / 100' in compute_tsc_page_parameters() becomes a very big number. Use 'master_kernel_ns' instead of get_kvmclock_ns() when masterclock is in use and get_kvmclock_base_ns() when it's not to prevent 'system_time' from going negative. Signed-off-by: Vitaly Kuznetsov <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-04-01KVM: x86: disable interrupts while pvclock_gtod_sync_lock is takenPaolo Bonzini1-11/+14
pvclock_gtod_sync_lock can be taken with interrupts disabled if the preempt notifier calls get_kvmclock_ns to update the Xen runstate information: spin_lock include/linux/spinlock.h:354 [inline] get_kvmclock_ns+0x25/0x390 arch/x86/kvm/x86.c:2587 kvm_xen_update_runstate+0x3d/0x2c0 arch/x86/kvm/xen.c:69 kvm_xen_update_runstate_guest+0x74/0x320 arch/x86/kvm/xen.c:100 kvm_xen_runstate_set_preempted arch/x86/kvm/xen.h:96 [inline] kvm_arch_vcpu_put+0x2d8/0x5a0 arch/x86/kvm/x86.c:4062 So change the users of the spinlock to spin_lock_irqsave and spin_unlock_irqrestore. Reported-by: [email protected] Fixes: 30b5c851af79 ("KVM: x86/xen: Add support for vCPU runstate information") Cc: David Woodhouse <[email protected]> Cc: Marcelo Tosatti <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-04-01KVM: x86: reduce pvclock_gtod_sync_lock critical sectionsPaolo Bonzini1-6/+4
There is no need to include changes to vcpu->requests into the pvclock_gtod_sync_lock critical section. The changes to the shared data structures (in pvclock_update_vm_gtod_copy) already occur under the lock. Cc: David Woodhouse <[email protected]> Cc: Marcelo Tosatti <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2021-04-01Merge branch 'kvm-fix-svm-races' into kvm-masterPaolo Bonzini1-5/+23
2021-04-01KVM: SVM: ensure that EFER.SVME is set when running nested guest or on ↵Paolo Bonzini1-1/+17
nested vmexit Fixing nested_vmcb_check_save to avoid all TOC/TOU races is a bit harder in released kernels, so do the bare minimum by avoiding that EFER.SVME is cleared. This is problematic because svm_set_efer frees the data structures for nested virtualization if EFER.SVME is cleared. Also check that EFER.SVME remains set after a nested vmexit; clearing it could happen if the bit is zero in the save area that is passed to KVM_SET_NESTED_STATE (the save area of the nested state corresponds to the nested hypervisor's state and is restored on the next nested vmexit). Cc: [email protected] Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <[email protected]>
2021-04-01KVM: SVM: load control fields from VMCB12 before checking themPaolo Bonzini1-4/+6
Avoid races between check and use of the nested VMCB controls. This for example ensures that the VMRUN intercept is always reflected to the nested hypervisor, instead of being processed by the host. Without this patch, it is possible to end up with svm->nested.hsave pointing to the MSR permission bitmap for nested guests. This bug is CVE-2021-29657. Reported-by: Felix Wilhelm <[email protected]> Cc: [email protected] Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <[email protected]>
2021-04-01Merge tag 'amd-drm-fixes-5.12-2021-03-31' of ↵Dave Airlie11-20/+26
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes amd-drm-fixes-5.12-2021-03-31: amdgpu: - Polaris idle power fix - VM fix - Vangogh S3 fix - Fixes for non-4K page sizes amdkfd: - dqm fence memory corruption fix Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-04-01Merge tag 'exynos-drm-fixes-for-v5.12-rc6' of ↵Dave Airlie1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/daeinki/drm-exynos into drm-fixes Just one cleanup which drops of_gpio.h inclusion. - This header file isn't used anymore so drop it. Signed-off-by: Dave Airlie <[email protected]> From: Inki Dae <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-03-31drm/amdgpu: check alignment on CPU page for bo mapXℹ Ruoyao1-4/+4
The page table of AMDGPU requires an alignment to CPU page so we should check ioctl parameters for it. Return -EINVAL if some parameter is unaligned to CPU page, instead of corrupt the page table sliently. Reviewed-by: Christian König <[email protected]> Signed-off-by: Xi Ruoyao <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-03-31drm/amdgpu: Set a suitable dev_info.gart_page_sizeHuacai Chen1-2/+2
In Mesa, dev_info.gart_page_size is used for alignment and it was set to AMDGPU_GPU_PAGE_SIZE(4KB). However, the page table of AMDGPU driver requires an alignment on CPU pages. So, for non-4KB page system, gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE). Signed-off-by: Rui Wang <[email protected]> Signed-off-by: Huacai Chen <[email protected]> Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1 [Xi: rebased for drm-next, use max_t for checkpatch, and reworded commit message.] Signed-off-by: Xi Ruoyao <[email protected]> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Tested-by: Dan Horák <[email protected]> Reviewed-by: Christian König <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-03-31drm/amdgpu/vangogh: don't check for dpm in is_dpm_running when in suspendAlex Deucher1-0/+5
Do the same thing we do for Renoir. We can check, but since the sbios has started DPM, it will always return true which causes the driver to skip some of the SMU init when it shouldn't. Reviewed-by: Zhan Liu <[email protected]> Acked-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-03-31drm/amdkfd: dqm fence memory corruptionQu Huang7-12/+12
Amdgpu driver uses 4-byte data type as DQM fence memory, and transmits GPU address of fence memory to microcode through query status PM4 message. However, query status PM4 message definition and microcode processing are all processed according to 8 bytes. Fence memory only allocates 4 bytes of memory, but microcode does write 8 bytes of memory, so there is a memory corruption. Changes since v1: * Change dqm->fence_addr as a u64 pointer to fix this issue, also fix up query_status and amdkfd_fence_wait_timeout function uses 64 bit fence value to make them consistent. Signed-off-by: Qu Huang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-03-31block: only update parent bi_status when bio failYufen Yu1-1/+1
For multiple split bios, if one of the bio is fail, the whole should return error to application. But we found there is a race between bio_integrity_verify_fn and bio complete, which return io success to application after one of the bio fail. The race as following: split bio(READ) kworker nvme_complete_rq blk_update_request //split error=0 bio_endio bio_integrity_endio queue_work(kintegrityd_wq, &bip->bip_work); bio_integrity_verify_fn bio_endio //split bio __bio_chain_endio if (!parent->bi_status) <interrupt entry> nvme_irq blk_update_request //parent error=7 req_bio_endio bio->bi_status = 7 //parent bio <interrupt exit> parent->bi_status = 0 parent->bi_end_io() // return bi_status=0 The bio has been split as two: split and parent. When split bio completed, it depends on kworker to do endio, while bio_integrity_verify_fn have been interrupted by parent bio complete irq handler. Then, parent bio->bi_status which have been set in irq handler will overwrite by kworker. In fact, even without the above race, we also need to conside the concurrency beteen mulitple split bio complete and update the same parent bi_status. Normally, multiple split bios will be issued to the same hctx and complete from the same irq vector. But if we have updated queue map between multiple split bios, these bios may complete on different hw queue and different irq vector. Then the concurrency update parent bi_status may cause the final status error. Suggested-by: Keith Busch <[email protected]> Signed-off-by: Yufen Yu <[email protected]> Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2021-03-31xdp: fix xdp_return_frame() kernel BUG throw for page_pool memory modelOng Boon Leong1-1/+2
xdp_return_frame() may be called outside of NAPI context to return xdpf back to page_pool. xdp_return_frame() calls __xdp_return() with napi_direct = false. For page_pool memory model, __xdp_return() calls xdp_return_frame_no_direct() unconditionally and below false negative kernel BUG throw happened under preempt-rt build: [ 430.450355] BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/3884 [ 430.451678] caller is __xdp_return+0x1ff/0x2e0 [ 430.452111] CPU: 0 PID: 3884 Comm: modprobe Tainted: G U E 5.12.0-rc2+ #45 Changes in v2: - This patch fixes the issue by making xdp_return_frame_no_direct() is only called if napi_direct = true, as recommended for better by Jesper Dangaard Brouer. Thanks! Fixes: 2539650fadbf ("xdp: Helpers for disabling napi_direct of xdp_return_frame") Signed-off-by: Ong Boon Leong <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31Revert "net: correct sk_acceptq_is_full()"Eric Dumazet1-1/+5
This reverts commit f211ac154577ec9ccf07c15f18a6abf0d9bdb4ab. We had similar attempt in the past, and we reverted it. History: 64a146513f8f12ba204b7bf5cb7e9505594ead42 [NET]: Revert incorrect accept queue backlog changes. 8488df894d05d6fa41c2bd298c335f944bb0e401 [NET]: Fix bugs in "Whether sock accept queue is full" checking I am adding a fat comment so that future attempts will be much harder. Fixes: f211ac154577 ("net: correct sk_acceptq_is_full()") Cc: iuyacan <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31Merge tag 'mlx5-fixes-2021-03-31' of ↵David S. Miller20-122/+227
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5 fixes 2021-03-31 This series introduces some fixes to mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-03-31Merge branch 'master' of ↵David S. Miller15-32/+81
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec Steffen Klassert says: ==================== pull request (net): ipsec 2021-03-31 1) Fix ipv4 pmtu checks for xfrm anf vti interfaces. From Eyal Birger. 2) There are situations where the socket passed to xfrm_output_resume() is not the same as the one attached to the skb. Use the socket passed to xfrm_output_resume() to avoid lookup failures when xfrm is used with VRFs. From Evan Nimmo. 3) Make the xfrm_state_hash_generation sequence counter per network namespace because but its write serialization lock is also per network namespace. Write protection is insufficient otherwise. From Ahmed S. Darwish. 4) Fixup sctp featue flags when used with esp offload. From Xin Long. 5) xfrm BEET mode doesn't support fragments for inner packets. This is a limitation of the protocol, so no fix possible. Warn at least to notify the user about that situation. From Xin Long. 6) Fix NULL pointer dereference on policy lookup when namespaces are uses in combination with esp offload. 7) Fix incorrect transformation on esp offload when packets get segmented at layer 3. 8) Fix some user triggered usages of WARN_ONCE in the xfrm compat layer. From Dmitry Safonov. Please pull or let me know if there are problems. ==================== Signed-off-by: David S. Miller <[email protected]>
2021-03-31net/rds: Fix a use after free in rds_message_map_pagesLv Yunlong1-1/+2
In rds_message_map_pages, the rm is freed by rds_message_put(rm). But rm is still used by rm->data.op_sg in return value. My patch assigns ERR_CAST(rm->data.op_sg) to err before the rm is freed to avoid the uaf. Fixes: 7dba92037baf3 ("net/rds: Use ERR_PTR for rds_message_alloc_sgs()") Signed-off-by: Lv Yunlong <[email protected]> Reviewed-by: Håkon Bugge <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31neighbour: Disregard DEAD dst in neigh_updateTong Zhu1-1/+1
After a short network outage, the dst_entry is timed out and put in DST_OBSOLETE_DEAD. We are in this code because arp reply comes from this neighbour after network recovers. There is a potential race condition that dst_entry is still in DST_OBSOLETE_DEAD. With that, another neighbour lookup causes more harm than good. In best case all packets in arp_queue are lost. This is counterproductive to the original goal of finding a better path for those packets. I observed a worst case with 4.x kernel where a dst_entry in DST_OBSOLETE_DEAD state is associated with loopback net_device. It leads to an ethernet header with all zero addresses. A packet with all zero source MAC address is quite deadly with mac80211, ath9k and 802.11 block ack. It fails ieee80211_find_sta_by_ifaddr in ath9k (xmit.c). Ath9k flushes tx queue (ath_tx_complete_aggr). BAW (block ack window) is not updated. BAW logic is damaged and ath9k transmission is disabled. Signed-off-by: Tong Zhu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-03-31net/mlx5e: Guarantee room for XSK wakeup NOP on async ICOSQTariq Toukan4-12/+34
XSK wakeup flow triggers an IRQ by posting a NOP WQE and hitting the doorbell on the async ICOSQ. It maintains its state so that it doesn't issue another NOP WQE if it has an outstanding one already. For this flow to work properly, the NOP post must not fail. Make sure to reserve room for the NOP WQE in all WQE posts to the async ICOSQ. Fixes: 8d94b590f1e4 ("net/mlx5e: Turn XSK ICOSQ into a general asynchronous one") Fixes: 1182f3659357 ("net/mlx5e: kTLS, Add kTLS RX HW offload support") Fixes: 0419d8c9d8f8 ("net/mlx5e: kTLS, Add kTLS RX resync support") Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Maxim Mikityanskiy <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-03-31net/mlx5e: Consider geneve_opts for encap contextsDima Chumak6-14/+51
Current algorithm for encap keys is legacy from initial vxlan implementation and doesn't take into account all possible fields of a tunnel. For example, for a Geneve tunnel, which may have additional TLV options, they are ignored when comparing encap keys and a rule can be attached to an incorrect encap entry. Fix that by introducing encap_info_equal() operation in struct mlx5e_tc_tunnel. Geneve tunnel type uses custom implementation, which extends generic algorithm and considers options if they are set. Fixes: 7f1a546e3222 ("net/mlx5e: Consider tunnel type for encap contexts") Signed-off-by: Dima Chumak <[email protected]> Reviewed-by: Vlad Buslov <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-03-31net/mlx5: Don't request more than supported EQsDaniel Jurgens1-1/+12
Calculating the number of compeltion EQs based on the number of available IRQ vectors doesn't work now that all async EQs share one IRQ. Thus the max number of EQs can be exceeded on systems with more than approximately 256 CPUs. Take this into account when calculating the number of available completion EQs. Fixes: 81bfa206032a ("net/mlx5: Use a single IRQ for all async EQs") Signed-off-by: Daniel Jurgens <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-03-31net/mlx5e: kTLS, Fix RX counters atomicityTariq Toukan5-20/+16
Some TLS RX counters increment per socket/connection, and are not protected against parallel modifications from several cores. Switch them to atomic counters by taking them out of the RQ stats into the global atomic TLS stats. In this patch, we touch 'rx_tls_ctx/del' that count the number of device-offloaded RX TLS connections added/deleted. These counters are updated in the add/del callbacks, out of the fast data-path. This change is not needed for counters that increment only in NAPI context, as they are protected by the NAPI mechanism. Keep them as tls_* counters under 'struct mlx5e_rq_stats'. Fixes: 76c1e1ac2aae ("net/mlx5e: kTLS, Add kTLS RX stats") Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Maxim Mikityanskiy <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2021-03-31net/mlx5e: kTLS, Fix TX counters atomicityTariq Toukan5-26/+33
Some TLS TX counters increment per socket/connection, and are not protected against parallel modifications from several cores. Switch them to atomic counters by taking them out of the SQ stats into the global atomic TLS stats. In this patch, we touch a single counter 'tx_tls_ctx' that counts the number of device-offloaded TX TLS connections added. Now that this counter can be increased without the for having the SQ context in hand, move it to the mlx5e_ktls_add_tx() callback where it really belongs, out of the fast data-path. This change is not needed for counters that increment only in NAPI context or under the TX lock, as they are already protected. Keep them as tls_* counters under 'struct mlx5e_sq_stats'. Fixes: d2ead1f360e8 ("net/mlx5e: Add kTLS TX HW offload support") Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Maxim Mikityanskiy <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>