Age | Commit message (Collapse) | Author | Files | Lines |
|
In the 32-bit platform, the second argument of getline is expectd to be
'size_t *'(aka 'unsigned int *'), but line_sz is of type
'unsigned long *'. Therefore, declare line_sz as size_t.
Signed-off-by: Ben Zong-You Xie <[email protected]>
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
Puranjay Mohan says:
====================
bpf: prevent userspace memory access
V5: https://lore.kernel.org/bpf/[email protected]/
Changes in V6:
- Disable the verifier's instrumentation in x86-64 and update the JIT to
take care of vsyscall page in addition to userspace addresses.
- Update bpf_testmod to test for vsyscall addresses.
V4: https://lore.kernel.org/bpf/[email protected]/
Changes in V5:
- Use TASK_SIZE_MAX + PAGE_SIZE, VSYSCALL_ADDR as userspace boundary in
x86-64 JIT.
- Added Acked-by: Ilya Leoshkevich <[email protected]>
V3: https://lore.kernel.org/bpf/[email protected]/
Changes in V4:
- Disable this feature on architectures that don't define
CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE.
- By doing the above, we don't need anything explicitly for s390x.
V2: https://lore.kernel.org/bpf/[email protected]/
Changes in V3:
- Return 0 from bpf_arch_uaddress_limit() in disabled case because it
returns u64.
- Modify the check in verifier to no do instrumentation when uaddress_limit
is 0.
V1: https://lore.kernel.org/bpf/[email protected]/
Changes in V2:
- Disable this feature on s390x.
With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To
thwart invalid memory accesses, the JITs add an exception table entry for
all such accesses. But in case the src_reg + offset is a userspace address,
the BPF program might read that memory if the user has mapped it.
x86-64 JIT already instruments the BPF_PROBE_MEM based loads with checks to
skip loads from userspace addresses, but is doesn't check for vsyscall page
because it falls in the kernel address space but is considered a userspace
page. The second patch in this series fixes the x86-64 JIT to also skip
loads from the vsyscall page. The last patch updates the bpf_testmod so
this address can be checked as part of the selftests.
Other architectures don't have the complexity of the vsyscall address and
just need to skip loads from the userspace. To make this more scalable and
robust, the verifier is updated in the first patch to instrument
BPF_PROBE_MEM to skip loads from the userspace addresses.
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
The vsyscall is a legacy API for fast execution of system calls. It maps
a page at address VSYSCALL_ADDR into the userspace program. This address
is in the top 10MB of the address space:
ffffffffff600000 - ffffffffff600fff | 4 kB | legacy vsyscall ABI
The last commit fixes the x86-64 BPF JIT to skip accessing addresses in
this memory region. Add this address to bpf_testmod_return_ptr() so we
can make sure that it is fixed.
After this change and without the previous commit, subprogs_extable
selftest will crash the kernel.
Signed-off-by: Puranjay Mohan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
When a load is marked PROBE_MEM - e.g. due to PTR_UNTRUSTED access - the
address being loaded from is not necessarily valid. The BPF jit sets up
exception handlers for each such load which catch page faults and 0 out
the destination register.
If the address for the load is outside kernel address space, the load
will escape the exception handling and crash the kernel. To prevent this
from happening, the emits some instruction to verify that addr is > end
of userspace addresses.
x86 has a legacy vsyscall ABI where a page at address 0xffffffffff600000
is mapped with user accessible permissions. The addresses in this page
are considered userspace addresses by the fault handler. Therefore, a
BPF program accessing this page will crash the kernel.
This patch fixes the runtime checks to also check that the PROBE_MEM
address is below VSYSCALL_ADDR.
Example BPF program:
SEC("fentry/tcp_v4_connect")
int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk)
{
*(volatile unsigned long *)&sk->sk_tsq_flags;
return 0;
}
BPF Assembly:
0: (79) r1 = *(u64 *)(r1 +0)
1: (79) r1 = *(u64 *)(r1 +344)
2: (b7) r0 = 0
3: (95) exit
x86-64 JIT
==========
BEFORE AFTER
------ -----
0: nopl 0x0(%rax,%rax,1) 0: nopl 0x0(%rax,%rax,1)
5: xchg %ax,%ax 5: xchg %ax,%ax
7: push %rbp 7: push %rbp
8: mov %rsp,%rbp 8: mov %rsp,%rbp
b: mov 0x0(%rdi),%rdi b: mov 0x0(%rdi),%rdi
-------------------------------------------------------------------------------
f: movabs $0x100000000000000,%r11 f: movabs $0xffffffffff600000,%r10
19: add $0x2a0,%rdi 19: mov %rdi,%r11
20: cmp %r11,%rdi 1c: add $0x2a0,%r11
23: jae 0x0000000000000029 23: sub %r10,%r11
25: xor %edi,%edi 26: movabs $0x100000000a00000,%r10
27: jmp 0x000000000000002d 30: cmp %r10,%r11
29: mov 0x0(%rdi),%rdi 33: ja 0x0000000000000039
--------------------------------\ 35: xor %edi,%edi
2d: xor %eax,%eax \ 37: jmp 0x0000000000000040
2f: leave \ 39: mov 0x2a0(%rdi),%rdi
30: ret \--------------------------------------------
40: xor %eax,%eax
42: leave
43: ret
Signed-off-by: Puranjay Mohan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
With BPF_PROBE_MEM, BPF allows de-referencing an untrusted pointer. To
thwart invalid memory accesses, the JITs add an exception table entry
for all such accesses. But in case the src_reg + offset is a userspace
address, the BPF program might read that memory if the user has
mapped it.
Make the verifier add guard instructions around such memory accesses and
skip the load if the address falls into the userspace region.
The JITs need to implement bpf_arch_uaddress_limit() to define where
the userspace addresses end for that architecture or TASK_SIZE is taken
as default.
The implementation is as follows:
REG_AX = SRC_REG
if(offset)
REG_AX += offset;
REG_AX >>= 32;
if (REG_AX <= (uaddress_limit >> 32))
DST_REG = 0;
else
DST_REG = *(size *)(SRC_REG + offset);
Comparing just the upper 32 bits of the load address with the upper
32 bits of uaddress_limit implies that the values are being aligned down
to a 4GB boundary before comparison.
The above means that all loads with address <= uaddress_limit + 4GB are
skipped. This is acceptable because there is a large hole (much larger
than 4GB) between userspace and kernel space memory, therefore a
correctly functioning BPF program should not access this 4GB memory
above the userspace.
Let's analyze what this patch does to the following fentry program
dereferencing an untrusted pointer:
SEC("fentry/tcp_v4_connect")
int BPF_PROG(fentry_tcp_v4_connect, struct sock *sk)
{
*(volatile long *)sk;
return 0;
}
BPF Program before | BPF Program after
------------------ | -----------------
0: (79) r1 = *(u64 *)(r1 +0) 0: (79) r1 = *(u64 *)(r1 +0)
-----------------------------------------------------------------------
1: (79) r1 = *(u64 *)(r1 +0) --\ 1: (bf) r11 = r1
----------------------------\ \ 2: (77) r11 >>= 32
2: (b7) r0 = 0 \ \ 3: (b5) if r11 <= 0x8000 goto pc+2
3: (95) exit \ \-> 4: (79) r1 = *(u64 *)(r1 +0)
\ 5: (05) goto pc+1
\ 6: (b7) r1 = 0
\--------------------------------------
7: (b7) r0 = 0
8: (95) exit
As you can see from above, in the best case (off=0), 5 extra instructions
are emitted.
Now, we analyze the same program after it has gone through the JITs of
ARM64 and RISC-V architectures. We follow the single load instruction
that has the untrusted pointer and see what instrumentation has been
added around it.
x86-64 JIT
==========
JIT's Instrumentation
(upstream)
---------------------
0: nopl 0x0(%rax,%rax,1)
5: xchg %ax,%ax
7: push %rbp
8: mov %rsp,%rbp
b: mov 0x0(%rdi),%rdi
---------------------------------
f: movabs $0x800000000000,%r11
19: cmp %r11,%rdi
1c: jb 0x000000000000002a
1e: mov %rdi,%r11
21: add $0x0,%r11
28: jae 0x000000000000002e
2a: xor %edi,%edi
2c: jmp 0x0000000000000032
2e: mov 0x0(%rdi),%rdi
---------------------------------
32: xor %eax,%eax
34: leave
35: ret
The x86-64 JIT already emits some instructions to protect against user
memory access. This patch doesn't make any changes for the x86-64 JIT.
ARM64 JIT
=========
No Intrumentation Verifier's Instrumentation
(upstream) (This patch)
----------------- --------------------------
0: add x9, x30, #0x0 0: add x9, x30, #0x0
4: nop 4: nop
8: paciasp 8: paciasp
c: stp x29, x30, [sp, #-16]! c: stp x29, x30, [sp, #-16]!
10: mov x29, sp 10: mov x29, sp
14: stp x19, x20, [sp, #-16]! 14: stp x19, x20, [sp, #-16]!
18: stp x21, x22, [sp, #-16]! 18: stp x21, x22, [sp, #-16]!
1c: stp x25, x26, [sp, #-16]! 1c: stp x25, x26, [sp, #-16]!
20: stp x27, x28, [sp, #-16]! 20: stp x27, x28, [sp, #-16]!
24: mov x25, sp 24: mov x25, sp
28: mov x26, #0x0 28: mov x26, #0x0
2c: sub x27, x25, #0x0 2c: sub x27, x25, #0x0
30: sub sp, sp, #0x0 30: sub sp, sp, #0x0
34: ldr x0, [x0] 34: ldr x0, [x0]
--------------------------------------------------------------------------------
38: ldr x0, [x0] ----------\ 38: add x9, x0, #0x0
-----------------------------------\\ 3c: lsr x9, x9, #32
3c: mov x7, #0x0 \\ 40: cmp x9, #0x10, lsl #12
40: mov sp, sp \\ 44: b.ls 0x0000000000000050
44: ldp x27, x28, [sp], #16 \\--> 48: ldr x0, [x0]
48: ldp x25, x26, [sp], #16 \ 4c: b 0x0000000000000054
4c: ldp x21, x22, [sp], #16 \ 50: mov x0, #0x0
50: ldp x19, x20, [sp], #16 \---------------------------------------
54: ldp x29, x30, [sp], #16 54: mov x7, #0x0
58: add x0, x7, #0x0 58: mov sp, sp
5c: autiasp 5c: ldp x27, x28, [sp], #16
60: ret 60: ldp x25, x26, [sp], #16
64: nop 64: ldp x21, x22, [sp], #16
68: ldr x10, 0x0000000000000070 68: ldp x19, x20, [sp], #16
6c: br x10 6c: ldp x29, x30, [sp], #16
70: add x0, x7, #0x0
74: autiasp
78: ret
7c: nop
80: ldr x10, 0x0000000000000088
84: br x10
There are 6 extra instructions added in ARM64 in the best case. This will
become 7 in the worst case (off != 0).
RISC-V JIT (RISCV_ISA_C Disabled)
==========
No Intrumentation Verifier's Instrumentation
(upstream) (This patch)
----------------- --------------------------
0: nop 0: nop
4: nop 4: nop
8: li a6, 33 8: li a6, 33
c: addi sp, sp, -16 c: addi sp, sp, -16
10: sd s0, 8(sp) 10: sd s0, 8(sp)
14: addi s0, sp, 16 14: addi s0, sp, 16
18: ld a0, 0(a0) 18: ld a0, 0(a0)
---------------------------------------------------------------
1c: ld a0, 0(a0) --\ 1c: mv t0, a0
--------------------------\ \ 20: srli t0, t0, 32
20: li a5, 0 \ \ 24: lui t1, 4096
24: ld s0, 8(sp) \ \ 28: sext.w t1, t1
28: addi sp, sp, 16 \ \ 2c: bgeu t1, t0, 12
2c: sext.w a0, a5 \ \--> 30: ld a0, 0(a0)
30: ret \ 34: j 8
\ 38: li a0, 0
\------------------------------
3c: li a5, 0
40: ld s0, 8(sp)
44: addi sp, sp, 16
48: sext.w a0, a5
4c: ret
There are 7 extra instructions added in RISC-V.
Fixes: 800834285361 ("bpf, arm64: Add BPF exception tables")
Reported-by: Breno Leitao <[email protected]>
Suggested-by: Alexei Starovoitov <[email protected]>
Acked-by: Ilya Leoshkevich <[email protected]>
Signed-off-by: Puranjay Mohan <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into for-next
Qualcomm driver fix for v6.9
This reworks the memory layout of the argument buffers passed to trusted
applications in QSEECOM, to avoid failures and system crashes.
* tag 'qcom-drivers-fixes-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
firmware: qcom: uefisecapp: Fix memory related IO errors and crashes
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into for-next
i.MX fixes for 6.9, round 2:
- Fix i.MX8MP the second CSI2 assigned-clock property which got wrong by
commit f78835d1e616 ("arm64: dts: imx8mp: reparent MEDIA_MIPI_PHY1_REF
to CLK_24M")
- Correct USB over-current polarity for imx6ull-tarragon board
* tag 'imx-fixes-6.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx6ull-tarragon: fix USB over-current polarity
arm64: dts: imx8mp: Fix assigned-clocks for second CSI2
Link: https://lore.kernel.org/r/ZioopqscxwUOwQkf@dragon
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux into for-next
MediaTek ARM64 DTS fixes for v6.9
This fixes some dts validation issues against bindings for multiple SoCs,
GPU voltage constraints for Chromebook devices, missing gce-client-reg
on various nodes (performance issues) on MT8183/92/95, and also fixes
boot issues on MT8195 when SPMI is built as module.
* tag 'mtk-dts64-fixes-for-v6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/mediatek/linux:
arm64: dts: mediatek: mt2712: fix validation errors
arm64: dts: mediatek: mt7986: prefix BPI-R3 cooling maps with "map-"
arm64: dts: mediatek: mt7986: drop invalid thermal block clock
arm64: dts: mediatek: mt7986: drop "#reset-cells" from Ethernet controller
arm64: dts: mediatek: mt7986: drop invalid properties from ethsys
arm64: dts: mediatek: mt7622: drop "reset-names" from thermal block
arm64: dts: mediatek: mt7622: fix ethernet controller "compatible"
arm64: dts: mediatek: mt7622: fix IR nodename
arm64: dts: mediatek: mt7622: fix clock controllers
arm64: dts: mediatek: mt8186-corsola: Update min voltage constraint for Vgpu
arm64: dts: mediatek: mt8183-kukui: Use default min voltage for MT6358
arm64: dts: mediatek: mt8195-cherry: Update min voltage constraint for MT6315
arm64: dts: mediatek: mt8192-asurada: Update min voltage constraint for MT6315
arm64: dts: mediatek: cherry: Describe CPU supplies
arm64: dts: mediatek: mt8195: Add missing gce-client-reg to mutex1
arm64: dts: mediatek: mt8195: Add missing gce-client-reg to mutex
arm64: dts: mediatek: mt8195: Add missing gce-client-reg to vpp/vdosys
arm64: dts: mediatek: mt8192: Add missing gce-client-reg to mutex
arm64: dts: mediatek: mt8183: Add power-domains properity to mfgcfg
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux into for-next
AT91 fixes for 6.9
It contains:
- fixes for regulator nodes on SAMA7G5 based boards: proper DT property is used
to setup regulators suspend voltage.
* tag 'at91-fixes-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/at91/linux:
ARM: dts: microchip: at91-sama7g54_curiosity: Replace regulator-suspend-voltage with the valid property
ARM: dts: microchip: at91-sama7g5ek: Replace regulator-suspend-voltage with the valid property
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into for-next
Qualcomm Arm64 DeviceTree fixes for v6.9
This corrects the watchdog IRQ flags for a number of remoteproc
instances, which otherwise prevents the driver from probe in the face of
a probe deferral.
Improvements in other areas, such as USB, have made it possible for CX
rail voltage on SC8280XP to be lowered, no longer meeting requirements
of active PCIe controllers. Necessary votes are added to these
controllers.
The MSI definitions for PCIe controllers in SM8450, SM8550, and SM8650
was incorrect, due to a bug in the driver. As this has now been fixed
the definition needs to be corrected.
Lastly, the SuperSpeed PHY irq of the second USB controller in SC8180x,
and the compatible string for X1 Elite domain idle states are corrected.
* tag 'qcom-arm64-fixes-for-6.9' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
arm64: dts: qcom: sc8180x: Fix ss_phy_irq for secondary USB controller
arm64: dts: qcom: sm8650: Fix the msi-map entries
arm64: dts: qcom: sm8550: Fix the msi-map entries
arm64: dts: qcom: sm8450: Fix the msi-map entries
arm64: dts: qcom: sc8280xp: add missing PCIe minimum OPP
arm64: dts: qcom: x1e80100: Fix the compatible for cluster idle states
arm64: dts: qcom: Fix type of "wdog" IRQs for remoteprocs
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into for-next
* 'v6.9-armsoc/dtsfixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
arm64: dts: rockchip: Fix USB interface compatible string on kobol-helios64
arm64: dts: rockchip: regulator for sd needs to be always on for BPI-R2Pro
dt-bindings: rockchip: grf: Add missing type to 'pcie-phy' node
arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 2
arm64: dts: rockchip: drop redundant disable-gpios in Lubancat 1
arm64: dts: rockchip: drop redundant pcie-reset-suspend in Scarlet Dumo
arm64: dts: rockchip: mark system power controller and fix typo on orangepi-5-plus
arm64: dts: rockchip: Designate the system power controller on QuartzPro64
arm64: dts: rockchip: drop panel port unit address in GRU Scarlet
arm64: dts: rockchip: Remove unsupported node from the Pinebook Pro dts
arm64: dts: rockchip: Fix the i2c address of es8316 on Cool Pi CM5
arm64: dts: rockchip: add regulators for PCIe on RK3399 Puma Haikou
arm64: dts: rockchip: enable internal pull-up on PCIE_WAKE# for RK3399 Puma
arm64: dts: rockchip: enable internal pull-up on Q7_USB_ID for RK3399 Puma
arm64: dts: rockchip: fix alphabetical ordering RK3399 puma
arm64: dts: rockchip: enable internal pull-up for Q7_THRM# on RK3399 Puma
arm64: dts: rockchip: set PHY address of MT7531 switch to 0x1f
Link: https://lore.kernel.org/r/3413596.CbtlEUcBR6@phil
Signed-off-by: Arnd Bergmann <[email protected]>
|
|
The return-address (RA) register r14 is specified as volatile in the
s390x ELF ABI [1]. Nevertheless proper CFI directives must be provided
for an unwinder to restore the return address, if the RA register
value is changed from its value at function entry, as it is the case.
[1]: s390x ELF ABI, https://github.com/IBM/s390x-abi/releases
Fixes: 4bff8cb54502 ("s390: convert to GENERIC_VDSO")
Signed-off-by: Jens Remus <[email protected]>
Acked-by: Heiko Carstens <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
|
|
Since commit 1b2ac5a6d61f ("s390/3270: use new address translation
helpers") rq->buffer is passed unconditionally to virt_to_dma32().
The 3270 driver allocates requests without buffer, so the value passed
to virt_to_dma32 might be NULL. Check for NULL before assigning.
Fixes: 1b2ac5a6d61f ("s390/3270: use new address translation helpers")
Reviewed-by: Heiko Carstens <[email protected]>
Signed-off-by: Sven Schnelle <[email protected]>
Signed-off-by: Alexander Gordeev <[email protected]>
|
|
Since thermal_debug_cdev_remove() does not run under cdev->lock, it can
run in parallel with thermal_debug_cdev_state_update() and it may free
the struct thermal_debugfs object used by the latter after it has been
checked against NULL.
If that happens, thermal_debug_cdev_state_update() will access memory
that has been freed already causing the kernel to crash.
Address this by using cdev->lock in thermal_debug_cdev_remove() around
the cdev->debugfs value check (in case the same cdev is removed at the
same time in two different threads) and its reset to NULL.
Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information")
Cc :6.8+ <[email protected]> # 6.8+
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Lukasz Luba <[email protected]>
|
|
In netfs_perform_write(), when the file is marked NETFS_ICTX_WRITETHROUGH
or O_*SYNC or RWF_*SYNC was specified, write-through caching is performed
on a buffered file. When setting up for write-through, we flush any
conflicting writes in the region and wait for the write to complete,
failing if there's a write error to return.
The issue arises if we're writing at or above the EOF position because we
skip the flush and - more importantly - the wait. This becomes a problem
if there's a partial folio at the end of the file that is being written out
and we want to make a write to it too. Both the already-running write and
the write we start both want to clear the writeback mark, but whoever is
second causes a warning looking something like:
------------[ cut here ]------------
R=00000012: folio 11 is not under writeback
WARNING: CPU: 34 PID: 654 at fs/netfs/write_collect.c:105
...
CPU: 34 PID: 654 Comm: kworker/u386:27 Tainted: G S ...
...
Workqueue: events_unbound netfs_write_collection_worker
...
RIP: 0010:netfs_writeback_lookup_folio
Fix this by making the flush-and-wait unconditional. It will do nothing if
there are no folios in the pagecache and will return quickly if there are
no folios in the region specified.
Further, move the WBC attachment above the flush call as the flush is going
to attach a WBC and detach it again if it is not present - and since we
need one anyway we might as well share it.
Fixes: 41d8e7673a77 ("netfs: Implement a write-through caching option")
Reported-by: kernel test robot <[email protected]>
Closes: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: David Howells <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Reviewed-by: Jeffrey Layton <[email protected]>
cc: Eric Van Hensbergen <[email protected]>
cc: Latchesar Ionkov <[email protected]>
cc: Dominique Martinet <[email protected]>
cc: Christian Schoenebeck <[email protected]>
cc: Marc Dionne <[email protected]>
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
cc: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Drop the flow-hash of the skb when forwarding to the L2TP netdev.
This avoids the L2TP qdisc from using the flow-hash from the outer
packet, which is identical for every flow within the tunnel.
This does not affect every platform but is specific for the ethernet
driver. It depends on the platform including L4 information in the
flow-hash.
One such example is the Mediatek Filogic MT798x family of networking
processors.
Fixes: d9e31d17ceba ("l2tp: Add L2TP ethernet pseudowire support")
Acked-by: James Chapman <[email protected]>
Signed-off-by: David Bauer <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
nsh_gso_segment().
syzbot triggered various splats (see [0] and links) by a crafted GSO
packet of VIRTIO_NET_HDR_GSO_UDP layering the following protocols:
ETH_P_8021AD + ETH_P_NSH + ETH_P_IPV6 + IPPROTO_UDP
NSH can encapsulate IPv4, IPv6, Ethernet, NSH, and MPLS. As the inner
protocol can be Ethernet, NSH GSO handler, nsh_gso_segment(), calls
skb_mac_gso_segment() to invoke inner protocol GSO handlers.
nsh_gso_segment() does the following for the original skb before
calling skb_mac_gso_segment()
1. reset skb->network_header
2. save the original skb->{mac_heaeder,mac_len} in a local variable
3. pull the NSH header
4. resets skb->mac_header
5. set up skb->mac_len and skb->protocol for the inner protocol.
and does the following for the segmented skb
6. set ntohs(ETH_P_NSH) to skb->protocol
7. push the NSH header
8. restore skb->mac_header
9. set skb->mac_header + mac_len to skb->network_header
10. restore skb->mac_len
There are two problems in 6-7 and 8-9.
(a)
After 6 & 7, skb->data points to the NSH header, so the outer header
(ETH_P_8021AD in this case) is stripped when skb is sent out of netdev.
Also, if NSH is encapsulated by NSH + Ethernet (so NSH-Ethernet-NSH),
skb_pull() in the first nsh_gso_segment() will make skb->data point
to the middle of the outer NSH or Ethernet header because the Ethernet
header is not pulled by the second nsh_gso_segment().
(b)
While restoring skb->{mac_header,network_header} in 8 & 9,
nsh_gso_segment() does not assume that the data in the linear
buffer is shifted.
However, udp6_ufo_fragment() could shift the data and change
skb->mac_header accordingly as demonstrated by syzbot.
If this happens, even the restored skb->mac_header points to
the middle of the outer header.
It seems nsh_gso_segment() has never worked with outer headers so far.
At the end of nsh_gso_segment(), the outer header must be restored for
the segmented skb, instead of the NSH header.
To do that, let's calculate the outer header position relatively from
the inner header and set skb->{data,mac_header,protocol} properly.
[0]:
BUG: KMSAN: uninit-value in ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:524 [inline]
BUG: KMSAN: uninit-value in ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline]
BUG: KMSAN: uninit-value in ipvlan_queue_xmit+0xf44/0x16b0 drivers/net/ipvlan/ipvlan_core.c:668
ipvlan_process_outbound drivers/net/ipvlan/ipvlan_core.c:524 [inline]
ipvlan_xmit_mode_l3 drivers/net/ipvlan/ipvlan_core.c:602 [inline]
ipvlan_queue_xmit+0xf44/0x16b0 drivers/net/ipvlan/ipvlan_core.c:668
ipvlan_start_xmit+0x5c/0x1a0 drivers/net/ipvlan/ipvlan_main.c:222
__netdev_start_xmit include/linux/netdevice.h:4989 [inline]
netdev_start_xmit include/linux/netdevice.h:5003 [inline]
xmit_one net/core/dev.c:3547 [inline]
dev_hard_start_xmit+0x244/0xa10 net/core/dev.c:3563
__dev_queue_xmit+0x33ed/0x51c0 net/core/dev.c:4351
dev_queue_xmit include/linux/netdevice.h:3171 [inline]
packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x735/0xa10 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3819 [inline]
slab_alloc_node mm/slub.c:3860 [inline]
__do_kmalloc_node mm/slub.c:3980 [inline]
__kmalloc_node_track_caller+0x705/0x1000 mm/slub.c:4001
kmalloc_reserve+0x249/0x4a0 net/core/skbuff.c:582
__alloc_skb+0x352/0x790 net/core/skbuff.c:651
skb_segment+0x20aa/0x7080 net/core/skbuff.c:4647
udp6_ufo_fragment+0xcab/0x1150 net/ipv6/udp_offload.c:109
ipv6_gso_segment+0x14be/0x2ca0 net/ipv6/ip6_offload.c:152
skb_mac_gso_segment+0x3e8/0x760 net/core/gso.c:53
nsh_gso_segment+0x6f4/0xf70 net/nsh/nsh.c:108
skb_mac_gso_segment+0x3e8/0x760 net/core/gso.c:53
__skb_gso_segment+0x4b0/0x730 net/core/gso.c:124
skb_gso_segment include/net/gso.h:83 [inline]
validate_xmit_skb+0x107f/0x1930 net/core/dev.c:3628
__dev_queue_xmit+0x1f28/0x51c0 net/core/dev.c:4343
dev_queue_xmit include/linux/netdevice.h:3171 [inline]
packet_xmit+0x9c/0x6b0 net/packet/af_packet.c:276
packet_snd net/packet/af_packet.c:3081 [inline]
packet_sendmsg+0x8aef/0x9f10 net/packet/af_packet.c:3113
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
__sys_sendto+0x735/0xa10 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x125/0x1c0 net/socket.c:2199
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcf/0x1e0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x63/0x6b
CPU: 1 PID: 5101 Comm: syz-executor421 Not tainted 6.8.0-rc5-syzkaller-00297-gf2e367d6ad3b #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024
Fixes: c411ed854584 ("nsh: add GSO support")
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=42a0dc856239de4de60e
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=c298c9f0e46a3c86332b
Link: https://lore.kernel.org/netdev/[email protected]/
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
With the current thermal zone locking arrangement in the debugfs code,
user space can open the "mitigations" file for a thermal zone before
the zone's debugfs pointer is set which will result in a NULL pointer
dereference in tze_seq_start().
Moreover, thermal_debug_tz_remove() is not called under the thermal
zone lock, so it can run in parallel with the other functions accessing
the thermal zone's struct thermal_debugfs object. Then, it may clear
tz->debugfs after one of those functions has checked it and the
struct thermal_debugfs object may be freed prematurely.
To address the first problem, pass a pointer to the thermal zone's
struct thermal_debugfs object to debugfs_create_file() in
thermal_debug_tz_add() and make tze_seq_start(), tze_seq_next(),
tze_seq_stop(), and tze_seq_show() retrieve it from s->private
instead of a pointer to the thermal zone object. This will ensure
that tz_debugfs will be valid across the "mitigations" file accesses
until thermal_debugfs_remove_id() called by thermal_debug_tz_remove()
removes that file.
To address the second problem, use tz->lock in thermal_debug_tz_remove()
around the tz->debugfs value check (in case the same thermal zone is
removed at the same time in two different threads) and its reset to NULL.
Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc :6.8+ <[email protected]> # 6.8+
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Lukasz Luba <[email protected]>
|
|
Because thermal_debug_tz_remove() does not free all memory allocated for
thermal zone diagnostics, some of that memory becomes unreachable after
freeing the thermal zone's struct thermal_debugfs object.
Address this by making thermal_debug_tz_remove() free all of the memory
in question.
Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes")
Cc :6.8+ <[email protected]> # 6.8+
Signed-off-by: Rafael J. Wysocki <[email protected]>
Reviewed-by: Lukasz Luba <[email protected]>
|
|
In the context of changing my career path, my Pengutronix email address
will soon stop to be available to me. Update the PWM maintainer entry to
my kernel.org identity.
I drop my co-maintenance of SIOX. Thorsten will continue to care for
it with the support of the Pengutronix kernel team.
Signed-off-by: Uwe Kleine-König <[email protected]>
Acked-by: Thorsten Scherer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Uwe Kleine-König <[email protected]>
|
|
I no longer have access to PCA9541 hardware, and I am no longer involved
in related development. Listing me as PCA9541 maintainer does not make
sense anymore. Remove PCA9541 from MAINTAINERS to let its support default
to the generic I2C multiplexer entry.
Signed-off-by: Guenter Roeck <[email protected]>
Acked-by: Peter Rosin <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux into i2c/for-current
at24 fixes for v6.9-rc6
- move the nvmem registration after the test one-byte read to improve the
situation with a race condition in nvmem
- fix the DT schema for ST M24C64-D
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
- Fix error paths on managed allocations
- Fix PF/VF relay messages
Signed-off-by: Dave Airlie <[email protected]>
From: Lucas De Marchi <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/gxaxtvxeoax7mnddxbl3tfn2hfnm5e4ngnl3wpi4p5tvn7il4s@fwsvpntse7bh
|
|
https://git.pengutronix.de/git/lst/linux into drm-fixes
- fix GC7000 TX clock gating
- revert NPU UAPI changes
Signed-off-by: Dave Airlie <[email protected]>
From: Lucas Stach <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
Short summary of fixes pull:
atomic-helpers:
- Fix memory leak in drm_format_conv_state_copy()
fbdev:
- fbdefio: Fix address calculation
gma500:
- Fix crash during boot
Signed-off-by: Dave Airlie <[email protected]>
From: Thomas Zimmermann <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.9-2024-04-24:
amdgpu:
- Suspend/resume fix
- Don't expose gpu_od directory if it's empty
- SDMA 4.4.2 fix
- VPE fix
- BO eviction fix
- UMSCH fix
- SMU 13.0.6 reset fixes
- GPUVM flush accounting fix
- SDMA 5.2 fix
- Fix possible UAF in mes code
amdkfd:
- Eviction fence handling fix
- Fix memory leak when GPU memory allocation fails
- Fix dma-buf validation
- Fix rescheduling of restore worker
- SVM fix
Signed-off-by: Dave Airlie <[email protected]>
From: Alex Deucher <[email protected]>
Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
|
|
Bui Quang Minh says:
====================
Ensure the copied buf is NUL terminated (part)
I found that some drivers contains an out-of-bound read pattern like this
kern_buf = memdup_user(user_buf, count);
...
sscanf(kern_buf, ...);
The sscanf can be replaced by some other string-related functions. This
pattern can lead to out-of-bound read of kern_buf in string-related
functions.
This series fix the above issue by replacing memdup_user with
memdup_user_nul.
v1: https://lore.kernel.org/r/[email protected]
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
We try to access count + 1 byte from userspace with memdup_user(buffer,
count + 1). However, the userspace only provides buffer of count bytes and
only these count bytes are verified to be okay to access. To ensure the
copied buffer is NUL terminated, we use memdup_user_nul instead.
Fixes: 3a2eb515d136 ("octeontx2-af: Fix an off by one in rvu_dbg_qsize_write()")
Signed-off-by: Bui Quang Minh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently, we allocate a nbytes-sized kernel buffer and copy nbytes from
userspace to that buffer. Later, we use sscanf on this buffer but we don't
ensure that the string is terminated inside the buffer, this can lead to
OOB read when using sscanf. Fix this issue by using memdup_user_nul
instead of memdup_user.
Fixes: 7afc5dbde091 ("bna: Add debugfs interface.")
Signed-off-by: Bui Quang Minh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Currently, we allocate a count-sized kernel buffer and copy count bytes
from userspace to that buffer. Later, we use sscanf on this buffer but we
don't ensure that the string is terminated inside the buffer, this can lead
to OOB read when using sscanf. Fix this issue by using memdup_user_nul
instead of memdup_user.
Fixes: 96a9a9341cda ("ice: configure FW logging")
Fixes: 73671c3162c8 ("ice: enable FW logging")
Reviewed-by: Przemek Kitszel <[email protected]>
Signed-off-by: Bui Quang Minh <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Pull virtio fix from Michael Tsirkin:
"enum renames for vdpa uapi - we better do this now before the names
have been exposed in any releases"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vDPA: code clean for vhost_vdpa uapi
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
Pull 9p fix from Eric Van Hensbergen:
"This contains a single mitigation to help deal with an apparent race
condition between client and server having to deal with inode number
collisions"
* tag '9p-for-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: mitigate inode collisions
|
|
Ensure that args.acl is initialized early. It is used in an
unconditional call to kfree() on the way out of
nfsd4_encode_fattr4().
Reported-by: Scott Mayhew <[email protected]>
Fixes: 83ab8678ad0c ("NFSD: Add struct nfsd4_fattr_args")
Signed-off-by: Chuck Lever <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix three recent regressions, one introduced while enabling a
new platform firmware feature for power management, and two introduced
by a recent CPPC library update.
Specifics:
- Allow two overlapping Low-Power S0 Idle _DSM function sets to be
used at the same time (Rafael Wysocki)
- Fix bit offset computation in MASK_VAL() macro used for applying a
bitmask to a new CPPC register value (Jarred White)
- Fix access width field usage for PCC registers in CPPC (Vanshidhar
Konda)"
* tag 'acpi-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: PM: s2idle: Evaluate all Low-Power S0 Idle _DSM functions
ACPI: CPPC: Fix access width used for PCC registers
ACPI: CPPC: Fix bit_offset shift in MASK_VAL() macro
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from netfilter, wireless and bluetooth.
Nothing major, regression fixes are mostly in drivers, two more of
those are flowing towards us thru various trees. I wish some of the
changes went into -rc5, we'll try to keep an eye on frequency of PRs
from sub-trees.
Also disproportional number of fixes for bugs added in v6.4, strange
coincidence.
Current release - regressions:
- igc: fix LED-related deadlock on driver unbind
- wifi: mac80211: small fixes to recent clean up of the connection
process
- Revert "wifi: iwlwifi: bump FW API to 90 for BZ/SC devices", kernel
doesn't have all the code to deal with that version, yet
- Bluetooth:
- set power_ctrl_enabled on NULL returned by gpiod_get_optional()
- qca: fix invalid device address check, again
- eth: ravb: fix registered interrupt names
Current release - new code bugs:
- wifi: mac80211: check EHT/TTLM action frame length
Previous releases - regressions:
- fix sk_memory_allocated_{add|sub} for architectures where
__this_cpu_{add|sub}* are not IRQ-safe
- dsa: mv88e6xx: fix link setup for 88E6250
Previous releases - always broken:
- ip: validate dev returned from __in_dev_get_rcu(), prevent possible
null-derefs in a few places
- switch number of for_each_rcu() loops using call_rcu() on the
iterator to for_each_safe()
- macsec: fix isolation of broadcast traffic in presence of offload
- vxlan: drop packets from invalid source address
- eth: mlxsw: trap and ACL programming fixes
- eth: bnxt: PCIe error recovery fixes, fix counting dropped packets
- Bluetooth:
- lots of fixes for the command submission rework from v6.4
- qca: fix NULL-deref on non-serdev suspend
Misc:
- tools: ynl: don't ignore errors in NLMSG_DONE messages"
* tag 'net-6.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (88 commits)
af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().
net: b44: set pause params only when interface is up
tls: fix lockless read of strp->msg_ready in ->poll
dpll: fix dpll_pin_on_pin_register() for multiple parent pins
net: ravb: Fix registered interrupt names
octeontx2-af: fix the double free in rvu_npc_freemem()
net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
ice: fix LAG and VF lock dependency in ice_reset_vf()
iavf: Fix TC config comparison with existing adapter TC config
i40e: Report MFS in decimal base instead of hex
i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()
net/mlx5e: Advertise mlx5 ethernet driver updates sk_buff md_dst for MACsec
macsec: Detect if Rx skb is macsec-related for offloading devices that update md_dst
ethernet: Add helper for assigning packet type when dest address does not match device address
macsec: Enable devices to advertise whether they update sk_buff md_dst during offloads
net: phy: dp83869: Fix MII mode failure
netfilter: nf_tables: honor table dormant flag from netdev release event path
eth: bnxt: fix counting packets discarded due to OOM and netpoll
igc: Fix LED-related deadlock on driver unbind
...
|
|
Coverity spotted that the cifs_sync_mid_result function could deadlock
"Thread deadlock (ORDER_REVERSAL) lock_order: Calling spin_lock acquires
lock TCP_Server_Info.srv_lock while holding lock TCP_Server_Info.mid_lock"
Addresses-Coverity: 1590401 ("Thread deadlock (ORDER_REVERSAL)")
Cc: [email protected]
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
Coverity spotted a place where we should have been holding the
channel lock when accessing the ses channel index.
Addresses-Coverity: 1582039 ("Data race condition (MISSING_LOCK)")
Cc: [email protected]
Reviewed-by: Shyam Prasad N <[email protected]>
Signed-off-by: Steve French <[email protected]>
|
|
* acpi-cppc:
ACPI: CPPC: Fix access width used for PCC registers
ACPI: CPPC: Fix bit_offset shift in MASK_VAL() macro
|
|
T-Head's memory attribute extension (XTheadMae) (non-compatible
equivalent of RVI's Svpbmt) is currently assumed for all T-Head harts.
However, QEMU recently decided to drop acceptance of guests that write
reserved bits in PTEs.
As XTheadMae uses reserved bits in PTEs and Linux applies the MAE errata
for all T-Head harts, this broke the Linux startup on QEMU emulations
of the C906 emulation.
This patch attempts to address this issue by testing the MAE-enable bit
in the th.sxstatus CSR. This CSR is available in HW and can be
emulated in QEMU.
This patch also makes the XTheadMae probing mechanism reliable, because
a test for the right combination of mvendorid, marchid, and mimpid
is not sufficient to enable MAE.
Reviewed-by: Conor Dooley <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
T-Head's vendor extension to set page attributes has the name
MAE (memory attribute extension).
Let's rename it, so it is clear what this referes to.
Link: https://github.com/T-head-Semi/thead-extension-spec/blob/master/xtheadmae.adoc
Reviewed-by: Conor Dooley <[email protected]>
Signed-off-by: Christoph Müllner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
When I did memory failure tests recently, below warning occurs:
DEBUG_LOCKS_WARN_ON(1)
WARNING: CPU: 8 PID: 1011 at kernel/locking/lockdep.c:232 __lock_acquire+0xccb/0x1ca0
Modules linked in: mce_inject hwpoison_inject
CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
RIP: 0010:__lock_acquire+0xccb/0x1ca0
RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
FS: 00007ff9f32aa740(0000) GS:ffffa1ce5fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff9f3134ba0 CR3: 00000008484e4000 CR4: 00000000000006f0
Call Trace:
<TASK>
lock_acquire+0xbe/0x2d0
_raw_spin_lock_irqsave+0x3a/0x60
hugepage_subpool_put_pages.part.0+0xe/0xc0
free_huge_folio+0x253/0x3f0
dissolve_free_huge_page+0x147/0x210
__page_handle_poison+0x9/0x70
memory_failure+0x4e6/0x8c0
hard_offline_page_store+0x55/0xa0
kernfs_fop_write_iter+0x12c/0x1d0
vfs_write+0x380/0x540
ksys_write+0x64/0xe0
do_syscall_64+0xbc/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff9f3114887
RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
</TASK>
Kernel panic - not syncing: kernel: panic_on_warn set ...
CPU: 8 PID: 1011 Comm: bash Kdump: loaded Not tainted 6.9.0-rc3-next-20240410-00012-gdb69f219f4be #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
panic+0x326/0x350
check_panic_on_warn+0x4f/0x50
__warn+0x98/0x190
report_bug+0x18e/0x1a0
handle_bug+0x3d/0x70
exc_invalid_op+0x18/0x70
asm_exc_invalid_op+0x1a/0x20
RIP: 0010:__lock_acquire+0xccb/0x1ca0
RSP: 0018:ffffa7a1c7fe3bd0 EFLAGS: 00000082
RAX: 0000000000000000 RBX: eb851eb853975fcf RCX: ffffa1ce5fc1c9c8
RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffffa1ce5fc1c9c0
RBP: ffffa1c6865d3280 R08: ffffffffb0f570a8 R09: 0000000000009ffb
R10: 0000000000000286 R11: ffffffffb0f2ad50 R12: ffffa1c6865d3d10
R13: ffffa1c6865d3c70 R14: 0000000000000000 R15: 0000000000000004
lock_acquire+0xbe/0x2d0
_raw_spin_lock_irqsave+0x3a/0x60
hugepage_subpool_put_pages.part.0+0xe/0xc0
free_huge_folio+0x253/0x3f0
dissolve_free_huge_page+0x147/0x210
__page_handle_poison+0x9/0x70
memory_failure+0x4e6/0x8c0
hard_offline_page_store+0x55/0xa0
kernfs_fop_write_iter+0x12c/0x1d0
vfs_write+0x380/0x540
ksys_write+0x64/0xe0
do_syscall_64+0xbc/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff9f3114887
RSP: 002b:00007ffecbacb458 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007ff9f3114887
RDX: 000000000000000c RSI: 0000564494164e10 RDI: 0000000000000001
RBP: 0000564494164e10 R08: 00007ff9f31d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007ff9f321b780 R14: 00007ff9f3217600 R15: 00007ff9f3216a00
</TASK>
After git bisecting and digging into the code, I believe the root cause is
that _deferred_list field of folio is unioned with _hugetlb_subpool field.
In __update_and_free_hugetlb_folio(), folio->_deferred_list is
initialized leading to corrupted folio->_hugetlb_subpool when folio is
hugetlb. Later free_huge_folio() will use _hugetlb_subpool and above
warning happens.
But it is assumed hugetlb flag must have been cleared when calling
folio_put() in update_and_free_hugetlb_folio(). This assumption is broken
due to below race:
CPU1 CPU2
dissolve_free_huge_page update_and_free_pages_bulk
update_and_free_hugetlb_folio hugetlb_vmemmap_restore_folios
folio_clear_hugetlb_vmemmap_optimized
clear_flag = folio_test_hugetlb_vmemmap_optimized
if (clear_flag) <-- False, it's already cleared.
__folio_clear_hugetlb(folio) <-- Hugetlb is not cleared.
folio_put
free_huge_folio <-- free_the_page is expected.
list_for_each_entry()
__folio_clear_hugetlb <-- Too late.
Fix this issue by checking whether folio is hugetlb directly instead of
checking clear_flag to close the race window.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 32c877191e02 ("hugetlb: do not clear hugetlb dtor until allocating vmemmap")
Signed-off-by: Miaohe Lin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
script
The save/restore of nr_hugepages was added to the test itself by using the
atexit() functionality. But it is broken as parent exits after creating
child. Hence calling the atexit() function early. That's not it. The
child exits after creating its child and so on.
The parent cannot wait to get the termination status for its children as
it'll keep on holding the resources until the new pkey allocation fails.
It is impossible to wait for exits of all the grand and great grand
children. Hence the restoring of nr_hugepages value from parent is wrong.
Let's save/restore the nr_hugepages settings in the launch script
instead of doing it in the test.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: c52eb6db7b7d ("selftests: mm: restore settings from only parent process")
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Reported-by: Joey Gouly <[email protected]>
Closes: https://lore.kernel.org/all/[email protected]
Cc: Joey Gouly <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Currently, the sud_test expects the emulated syscall to return the
emulated syscall number. This assumption only works on architectures
were the syscall calling convention use the same register for syscall
number/syscall return value. This is not the case for RISC-V and thus
the return value must be also emulated using the provided ucontext.
Signed-off-by: Clément Léger <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fixes from Chuck Lever:
- Revert some backchannel fixes that went into v6.9-rc
* tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
Revert "NFSD: Convert the callback workqueue to use delayed_work"
Revert "NFSD: Reschedule CB operations when backchannel rpc_clnt is shut down"
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- A couple of i2c-hid fixes (Kenny Levinsen & Nam Cao)
- A config issue with mcp-2221 when CONFIG_IIO is not enabled
(Abdelrahman Morsy)
- A dev_err fix in intel-ish-hid (Zhang Lixu)
- A couple of mouse fixes for both nintendo and Logitech-dj (Nuno
Pereira and Yaraslau Furman)
- I'm changing my main kernel email address as it's way simpler for me
than the Red Hat one (Benjamin Tissoires)
* tag 'for-linus-2024042501' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: mcp-2221: cancel delayed_work only when CONFIG_IIO is enabled
HID: logitech-dj: allow mice to use all types of reports
HID: i2c-hid: Revert to await reset ACK before reading report descriptor
HID: nintendo: Fix N64 controller being identified as mouse
MAINTAINERS: update Benjamin's email address
HID: intel-ish-hid: ipc: Fix dev_err usage with uninitialized dev->devc
HID: i2c-hid: remove I2C_HID_READ_PENDING flag to prevent lock-up
|
|
When e.g. 8 bytes are to be read, sgm->consumed equals 8 immediately after
sg_miter_next() call. The driver then increments it as bytes are read,
so sgm->consumed becomes 16 and this warning triggers in sg_miter_stop():
WARN_ON(miter->consumed > miter->length);
WARNING: CPU: 0 PID: 28 at lib/scatterlist.c:925 sg_miter_stop+0x2c/0x10c
CPU: 0 PID: 28 Comm: kworker/0:2 Tainted: G W 6.9.0-rc5-dirty #249
Hardware name: Generic DT based system
Workqueue: events_freezable mmc_rescan
Call trace:.
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x44/0x5c
dump_stack_lvl from __warn+0x78/0x16c
__warn from warn_slowpath_fmt+0xb0/0x160
warn_slowpath_fmt from sg_miter_stop+0x2c/0x10c
sg_miter_stop from moxart_request+0xb0/0x468
moxart_request from mmc_start_request+0x94/0xa8
mmc_start_request from mmc_wait_for_req+0x60/0xa8
mmc_wait_for_req from mmc_app_send_scr+0xf8/0x150
mmc_app_send_scr from mmc_sd_setup_card+0x1c/0x420
mmc_sd_setup_card from mmc_sd_init_card+0x12c/0x4dc
mmc_sd_init_card from mmc_attach_sd+0xf0/0x16c
mmc_attach_sd from mmc_rescan+0x1e0/0x298
mmc_rescan from process_scheduled_works+0x2e4/0x4ec
process_scheduled_works from worker_thread+0x1ec/0x24c
worker_thread from kthread+0xd4/0xe0
kthread from ret_from_fork+0x14/0x38
This patch adds initial zeroing of sgm->consumed. It is then incremented
as bytes are read or written.
Signed-off-by: Sergei Antonov <[email protected]>
Cc: Linus Walleij <[email protected]>
Fixes: 3ee0e7c3e67c ("mmc: moxart-mmc: Use sg_miter for PIO")
Reviewed-by: Linus Walleij <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Ulf Hansson <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains two Netfilter/IPVS fixes for net:
Patch #1 fixes SCTP checksumming for IPVS with gso packets,
from Ismael Luceno.
Patch #2 honor dormant flag from netdev event path to fix a possible
double hook unregistration.
* tag 'nf-24-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: honor table dormant flag from netdev release event path
ipvs: Fix checksumming on GSO of SCTP packets
====================
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
syzbot reported a lockdep splat regarding unix_gc_lock and
unix_state_lock().
One is called from recvmsg() for a connected socket, and another
is called from GC for TCP_LISTEN socket.
So, the splat is false-positive.
Let's add a dedicated lock class for the latter to suppress the splat.
Note that this change is not necessary for net-next.git as the issue
is only applied to the old GC impl.
[0]:
WARNING: possible circular locking dependency detected
6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0 Not tainted
-----------------------------------------------------
kworker/u8:1/11 is trying to acquire lock:
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffff88807cea4e70 (&u->lock){+.+.}-{2:2}, at: __unix_gc+0x40e/0xf70 net/unix/garbage.c:302
but task is already holding lock:
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (unix_gc_lock){+.+.}-{2:2}:
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
unix_notinflight+0x13d/0x390 net/unix/garbage.c:140
unix_detach_fds net/unix/af_unix.c:1819 [inline]
unix_destruct_scm+0x221/0x350 net/unix/af_unix.c:1876
skb_release_head_state+0x100/0x250 net/core/skbuff.c:1188
skb_release_all net/core/skbuff.c:1200 [inline]
__kfree_skb net/core/skbuff.c:1216 [inline]
kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1252
kfree_skb include/linux/skbuff.h:1262 [inline]
manage_oob net/unix/af_unix.c:2672 [inline]
unix_stream_read_generic+0x1125/0x2700 net/unix/af_unix.c:2749
unix_stream_splice_read+0x239/0x320 net/unix/af_unix.c:2981
do_splice_read fs/splice.c:985 [inline]
splice_file_to_pipe+0x299/0x500 fs/splice.c:1295
do_splice+0xf2d/0x1880 fs/splice.c:1379
__do_splice fs/splice.c:1436 [inline]
__do_sys_splice fs/splice.c:1652 [inline]
__se_sys_splice+0x331/0x4a0 fs/splice.c:1634
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&u->lock){+.+.}-{2:2}:
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__unix_gc+0x40e/0xf70 net/unix/garbage.c:302
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(unix_gc_lock);
lock(&u->lock);
lock(unix_gc_lock);
lock(&u->lock);
*** DEADLOCK ***
3 locks held by kworker/u8:1/11:
#0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3229 [inline]
#0: ffff888015089148 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_scheduled_works+0x8e0/0x17c0 kernel/workqueue.c:3335
#1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_one_work kernel/workqueue.c:3230 [inline]
#1: ffffc90000107d00 (unix_gc_work){+.+.}-{0:0}, at: process_scheduled_works+0x91b/0x17c0 kernel/workqueue.c:3335
#2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: spin_lock include/linux/spinlock.h:351 [inline]
#2: ffffffff8f6ab638 (unix_gc_lock){+.+.}-{2:2}, at: __unix_gc+0x117/0xf70 net/unix/garbage.c:261
stack backtrace:
CPU: 0 PID: 11 Comm: kworker/u8:1 Not tainted 6.9.0-rc5-syzkaller-00007-g4d2008430ce8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
Workqueue: events_unbound __unix_gc
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:88 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114
check_noncircular+0x36a/0x4a0 kernel/locking/lockdep.c:2187
check_prev_add kernel/locking/lockdep.c:3134 [inline]
check_prevs_add kernel/locking/lockdep.c:3253 [inline]
validate_chain+0x18cb/0x58e0 kernel/locking/lockdep.c:3869
__lock_acquire+0x1346/0x1fd0 kernel/locking/lockdep.c:5137
lock_acquire+0x1ed/0x550 kernel/locking/lockdep.c:5754
__raw_spin_lock include/linux/spinlock_api_smp.h:133 [inline]
_raw_spin_lock+0x2e/0x40 kernel/locking/spinlock.c:154
spin_lock include/linux/spinlock.h:351 [inline]
__unix_gc+0x40e/0xf70 net/unix/garbage.c:302
process_one_work kernel/workqueue.c:3254 [inline]
process_scheduled_works+0xa10/0x17c0 kernel/workqueue.c:3335
worker_thread+0x86d/0xd70 kernel/workqueue.c:3416
kthread+0x2f0/0x390 kernel/kthread.c:388
ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
</TASK>
Fixes: 47d8ac011fe1 ("af_unix: Fix garbage collector racing against connect()")
Reported-and-tested-by: [email protected]
Closes: https://syzkaller.appspot.com/bug?extid=fa379358c28cc87cc307
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Remove argument `params` from the `module` macro example, because the
macro does not currently support module parameters since it was not sent
with the initial merge.
Signed-off-by: Aswin Unnikrishnan <[email protected]>
Reviewed-by: Alice Ryhl <[email protected]>
Cc: [email protected]
Fixes: 1fbde52bde73 ("rust: add `macros` crate")
Link: https://lore.kernel.org/r/[email protected]
[ Reworded slightly. ]
Signed-off-by: Miguel Ojeda <[email protected]>
|
|
If one attempts to build an essentially empty file somewhere in the
kernel tree, it leads to a build error because the compiler does not
recognize the `new_uninit` unstable feature:
error[E0635]: unknown feature `new_uninit`
--> <crate attribute>:1:9
|
1 | feature(new_uninit)
| ^^^^^^^^^^
The reason is that we pass `-Zcrate-attr='feature(new_uninit)'` (together
with `-Zallow-features=new_uninit`) to let non-`rust/` code use that
unstable feature.
However, the compiler only recognizes the feature if the `alloc` crate
is resolved (the feature is an `alloc` one). `--extern alloc`, which we
pass, is not enough to resolve the crate.
Introducing a reference like `use alloc;` or `extern crate alloc;`
solves the issue, thus this is not seen in normal files. For instance,
`use`ing the `kernel` prelude introduces such a reference, since `alloc`
is used inside.
While normal use of the build system is not impacted by this, it can still
be fairly confusing for kernel developers [1], thus use the unstable
`force` option of `--extern` [2] (added in Rust 1.71 [3]) to force the
compiler to resolve `alloc`.
This new unstable feature is only needed meanwhile we use the other
unstable feature, since then we will not need `-Zcrate-attr`.
Cc: [email protected] # v6.6+
Reported-by: Daniel Almeida <[email protected]>
Reported-by: Julian Stecklina <[email protected]>
Closes: https://rust-for-linux.zulipchat.com/#narrow/stream/288089-General/topic/x/near/424096982 [1]
Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support")
Link: https://github.com/rust-lang/rust/issues/111302 [2]
Link: https://github.com/rust-lang/rust/pull/109421 [3]
Reviewed-by: Alice Ryhl <[email protected]>
Reviewed-by: Gary Guo <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Miguel Ojeda <[email protected]>
|