Age | Commit message (Collapse) | Author | Files | Lines |
|
Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.
The pmd_trans_huge() code in mfill_atomic() is wrong in three different
ways depending on kernel version:
1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
the right two race windows) - I've tested this in a kernel build with
some extra mdelay() calls. See the commit message for a description
of the race scenario.
On older kernels (before 6.5), I think the same bug can even
theoretically lead to accessing transhuge page contents as a page table
if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
detecting PMDs that don't point to page tables.
On older kernels (before 6.5), you'd just have to win a single fairly
wide race to hit this.
I've tested this on 6.1 stable by racing migration (with a mdelay()
patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
VM, that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
to yank page tables out from under us (though I haven't tested that),
so I think the BUG_ON() checks in mfill_atomic() are just wrong.
I decided to write two separate fixes for these (one fix for bugs 1+2, one
fix for bug 3), so that the first fix can be backported to kernels
affected by bugs 1+2.
This patch (of 2):
This fixes two issues.
I discovered that the following race can occur:
mfill_atomic other thread
============ ============
<zap PMD>
pmdp_get_lockless() [reads none pmd]
<bail if trans_huge>
<if none:>
<pagefault creates transhuge zeropage>
__pte_alloc [no-op]
<zap PMD>
<bail if pmd_trans_huge(*dst_pmd)>
BUG_ON(pmd_none(*dst_pmd))
I have experimentally verified this in a kernel with extra mdelay() calls;
the BUG_ON(pmd_none(*dst_pmd)) triggers.
On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
a BUG_ON(), since the page table access helpers are actually designed to
deal with page tables concurrently disappearing; but on older kernels
(<=6.4), I think we could probably theoretically race past the two
BUG_ON() checks and end up treating a hugepage as a page table.
The second issue is that, as Qi Zheng pointed out, there are other types
of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
(in particular, migration PMDs).
On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
PMD that contains a migration entry (which just requires winning a single,
fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
assumes that the PMD points to a page table.
Breakage follows: First, the kernel tries to take the PTE lock (which will
crash or maybe worse if there is no "struct page" for the address bits in
the migration entry PMD - I think at least on X86 there usually is no
corresponding "struct page" thanks to the PTE inversion mitigation, amd64
looks different).
If that didn't crash, the kernel would next try to write a PTE into what
it wrongly thinks is a page table.
As part of fixing these issues, get rid of the check for pmd_trans_huge()
before __pte_alloc() - that's redundant, we're going to have to check for
that after the __pte_alloc() anyway.
Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Jann Horn <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jann Horn <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Qi Zheng <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Commit 8c61291fd850 ("mm: fix incorrect vbq reference in
purge_fragmented_block") extended the 'vmap_block' structure to contain a
'cpu' field which is set at allocation time to the id of the initialising
CPU.
When a new 'vmap_block' is being instantiated by new_vmap_block(), the
partially initialised structure is added to the local 'vmap_block_queue'
xarray before the 'cpu' field has been initialised. If another CPU is
concurrently walking the xarray (e.g. via vm_unmap_aliases()), then it
may perform an out-of-bounds access to the remote queue thanks to an
uninitialised index.
This has been observed as UBSAN errors in Android:
| Internal error: UBSAN: array index out of bounds: 00000000f2005512 [#1] PREEMPT SMP
|
| Call trace:
| purge_fragmented_block+0x204/0x21c
| _vm_unmap_aliases+0x170/0x378
| vm_unmap_aliases+0x1c/0x28
| change_memory_common+0x1dc/0x26c
| set_memory_ro+0x18/0x24
| module_enable_ro+0x98/0x238
| do_init_module+0x1b0/0x310
Move the initialisation of 'vb->cpu' in new_vmap_block() ahead of the
addition to the xarray.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8c61291fd850 ("mm: fix incorrect vbq reference in purge_fragmented_block")
Signed-off-by: Will Deacon <[email protected]>
Reviewed-by: Baoquan He <[email protected]>
Reviewed-by: Uladzislau Rezki (Sony) <[email protected]>
Cc: Zhaoyang Huang <[email protected]>
Cc: Hailong.Liu <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The __NR_mmap isn't found on armhf. The mmap() is commonly available
system call and its wrapper is present on all architectures. So it should
be used directly. It solves problem for armhf and doesn't create problem
for other architectures.
Remove sys_mmap() functions as they aren't doing anything else other than
calling mmap(). There is no need to set errno = 0 manually as glibc
always resets it.
For reference errors are as following:
CC seal_elf
seal_elf.c: In function 'sys_mmap':
seal_elf.c:39:33: error: '__NR_mmap' undeclared (first use in this function)
39 | sret = (void *) syscall(__NR_mmap, addr, len, prot,
| ^~~~~~~~~
mseal_test.c: In function 'sys_mmap':
mseal_test.c:90:33: error: '__NR_mmap' undeclared (first use in this function)
90 | sret = (void *) syscall(__NR_mmap, addr, len, prot,
| ^~~~~~~~~
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 4926c7a52de7 ("selftest mm/mseal memory sealing")
Signed-off-by: Muhammad Usama Anjum <[email protected]>
Cc: Jeff Xu <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Enable WB2 hardware block, enabling writeback support on this platform.
Signed-off-by: Dmitry Baryshkov <[email protected]>
Tested-by: Luca Weiss <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/570194/
Link: https://lore.kernel.org/r/[email protected]
|
|
Enable WB2 hardware block, enabling writeback support on this platform.
Signed-off-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/570193/
Link: https://lore.kernel.org/r/[email protected]
|
|
Enable WB2 hardware block, enabling writeback support on this platform.
Signed-off-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/570196/
Link: https://lore.kernel.org/r/[email protected]
|
|
Enable WB2 hardware block, enabling writeback support on this platform.
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/570192/
Link: https://lore.kernel.org/r/[email protected]
[DB: picked up WB_SDM845_MASK from sdm845 patch]
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
|
The following build error was triggered because of NULL string argument:
BUILDSTDERR: drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c: In function 'mdp5_smp_dump':
BUILDSTDERR: drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c:352:51: error: '%s' directive argument is null [-Werror=format-overflow=]
BUILDSTDERR: 352 | drm_printf(p, "%s:%d\t%d\t%s\n",
BUILDSTDERR: | ^~
BUILDSTDERR: drivers/gpu/drm/msm/disp/mdp5/mdp5_smp.c:352:51: error: '%s' directive argument is null [-Werror=format-overflow=]
This happens from the commit a61ddb4393ad ("drm: enable (most) W=1
warnings by default across the subsystem"). Using "(null)" instead
to fix it.
Fixes: bc5289eed481 ("drm/msm/mdp5: add debugfs to show smp block status")
Signed-off-by: Sherry Yang <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611071/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
|
According to the display-drivers, 5nm DSI PLL (v4.2, v4.3) have
different boundaries for pll_clock_inverters programming. Follow the
vendor code and use correct values.
Fixes: 2f9ae4e395ed ("drm/msm/dsi: add support for DSI-PHY on SM8350 and SM8450")
Signed-off-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/606947/
Link: https://lore.kernel.org/r/[email protected]
|
|
Hardware document indicates that widebus is recommended on DP on all
MDSS chipsets starting version 5.x.x and above.
Follow the guideline and mark widebus support on all relevant
chipsets for DP.
Fixes: 766f705204a0 ("drm/msm/dp: Remove now unused connector_type from desc")
Fixes: 1b2d98bdd7b7 ("drm/msm/dp: Add DisplayPort controller for SM8650")
Signed-off-by: Abhinav Kumar <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Fixes: 757a2f36ab09 ("drm/msm/dp: enable widebus feature for display port")
Fixes: 1b2d98bdd7b7 ("drm/msm/dp: Add DisplayPort controller for SM8650")
Patchwork: https://patchwork.freedesktop.org/patch/606556/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
|
Add support for the HDMI PHY as present on the Qualcomm MSM8998 SoC.
This code is mostly copy & paste of the vendor code from msm-4.4
kernel.lnx.4.4.r38-rel.
Signed-off-by: Arnaud Vrac <[email protected]>
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605631/
Link: https://lore.kernel.org/r/[email protected]
[DB: replaced division with do_div64 to fix build issues on ARM32]
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
|
Current driver already supports the msm8998 HDMI TX.
We just need to add the compatible string.
Reviewed-by: Dmitry Baryshkov <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605632/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
|
HDMI TX block embedded in the APQ8098.
Reviewed-by: Rob Herring (Arm) <[email protected]>
Reviewed-by: Conor Dooley <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605638/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
|
HDMI PHY block embedded in the APQ8098.
Acked-by: Rob Herring (Arm) <[email protected]>
Acked-by: Vinod Koul <[email protected]>
Signed-off-by: Marc Gonzalez <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605634/
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Dmitry Baryshkov <[email protected]>
|
|
Some platforms provides a mechanism for configuring the mapping between
(one or two) DisplayPort intfs and their PHYs.
In particular SC8180X requires this to be configured, since on this
platform there are fewer controllers than PHYs.
The change implements the logic for optionally configuring which PHY
each of the DP INTFs should be connected to and marks the SC8180X DPU to
program 2 entries.
For now the request is simply to program the mapping 1:1, any support
for alternative mappings is left until the use case arrise.
Note that e.g. msm-4.14 unconditionally maps INTF 0 to PHY 0 on all
platforms, so perhaps this is needed in order to get DisplayPort working
on some other platforms as well.
Co-developed-by: Bjorn Andersson <[email protected]>
Signed-off-by: Bjorn Andersson <[email protected]>
Signed-off-by: Dmitry Baryshkov <[email protected]>
Reviewed-by: Abhinav Kumar <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/600895/
Link: https://lore.kernel.org/r/[email protected]
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
- x2apic_disable() clears x2apic_state and x2apic_mode unconditionally,
even when the state is X2APIC_ON_LOCKED, which prevents the kernel to
disable it thereby creating inconsistent state.
Reorder the logic so it actually works correctly
- The XSTATE logic for handling LBR is incorrect as it assumes that
XSAVES supports LBR when the CPU supports LBR. In fact both
conditions need to be true. Otherwise the enablement of LBR in the
IA32_XSS MSR fails and subsequently the machine crashes on the next
XRSTORS operation because IA32_XSS is not initialized.
Cache the XSTATE support bit during init and make the related
functions use this cached information and the LBR CPU feature bit to
cure this.
- Cure a long standing bug in KASLR
KASLR uses the full address space between PAGE_OFFSET and vaddr_end
to randomize the starting points of the direct map, vmalloc and
vmemmap regions. It thereby limits the size of the direct map by
using the installed memory size plus an extra configurable margin for
hot-plug memory. This limitation is done to gain more randomization
space because otherwise only the holes between the direct map,
vmalloc, vmemmap and vaddr_end would be usable for randomizing.
The limited direct map size is not exposed to the rest of the kernel,
so the memory hot-plug and resource management related code paths
still operate under the assumption that the available address space
can be determined with MAX_PHYSMEM_BITS.
request_free_mem_region() allocates from (1 << MAX_PHYSMEM_BITS) - 1
downwards. That means the first allocation happens past the end of
the direct map and if unlucky this address is in the vmalloc space,
which causes high_memory to become greater than VMALLOC_START and
consequently causes iounmap() to fail for valid ioremap addresses.
Cure this by exposing the end of the direct map via PHYSMEM_END and
use that for the memory hot-plug and resource management related
places instead of relying on MAX_PHYSMEM_BITS. In the KASLR case
PHYSMEM_END maps to a variable which is initialized by the KASLR
initialization and otherwise it is based on MAX_PHYSMEM_BITS as
before.
- Prevent a data leak in mmio_read(). The TDVMCALL exposes the value of
an initialized variabled on the stack to the VMM. The variable is
only required as output value, so it does not have to exposed to the
VMM in the first place.
- Prevent an array overrun in the resource control code on systems with
Sub-NUMA Clustering enabled because the code failed to adjust the
index by the number of SNC nodes per L3 cache.
* tag 'x86-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/resctrl: Fix arch_mbm_* array overrun on SNC
x86/tdx: Fix data leak in mmio_read()
x86/kaslr: Expose and use the end of the physical memory address space
x86/fpu: Avoid writing LBR bit to IA32_XSS unless supported
x86/apic: Make x2apic_disable() work correctly
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fix from Thomas Gleixner:
"A single fix for x86 performance monitoring.
Haswell PMUs suffer from several errata and require a limit the
minimal period for counter events, otherwise they suffer from endless
loops in the PMU interrupt"
* tag 'perf-urgent-2024-09-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Limit the period on Haswell
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Thomas Gleixner:
"A single fix for rt_mutex.
The deadlock detection code drops into an infinite scheduling loop
while still holding rt_mutex::wait_lock, which rightfully triggers a
'scheduling in atomic' warning.
Unlock it before that"
* tag 'locking-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rtmutex: Drop rt_mutex::wait_lock before scheduling
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Unbreak the PLIC driver for Allwinner D1 systems
The recent conversion of the PLIC driver to a platform driver broke
Allwinnder D1 systems due to the deferred probing of platform
drivers.
Due to that the only timer available on D1 systems cannot get an
interrupt, which causes the system to hang at boot. Other RISCV
platforms are not affected because they provide the architected SBI
timer which uses the built in core interrupt controller.
Cure this by probing PLIC early on D1 systems
- Cure a regression in ARM/GIC-V3 on 32-bit ARM systems caused by the
recent addition of a initialization function, which accesses system
registers before they are enabled. On 64-bit ARM they are enabled
prior to that by sheer luck.
Ensure they are enabled.
- Cure a use before check problem in the MSI library. The existing
NULL pointer check is too late.
- Cure a lock order inversion in the ARM/GIC-V4 driver
- Fix a IS_ERR() vs. NULL pointer check issue in the RISCV APLIC
driver
- Plug a reference count leak in the ARM/GIC-V2 driver"
* tag 'irq-urgent-2024-08-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/irq-msi-lib: Check for NULL ops in msi_lib_irq_domain_select()
irqchip/gic-v3: Init SRE before poking sysregs
irqchip/gic-v2m: Fix refcount leak in gicv2m_of_init()
irqchip/riscv-aplic: Fix an IS_ERR() vs NULL bug in probe()
irqchip/gic-v4: Fix ordering between vmapp and vpe locks
irqchip/sifive-plic: Probe plic driver early for Allwinner D1 platform
|
|
Fixes: 49aa7830396b ("bcachefs: Fix rebalance_work accounting")
Signed-off-by: Kent Overstreet <[email protected]>
|
|
Matt Johnston says:
====================
net: mctp-serial: Fix for missing tx escapes
The mctp-serial code to add escape characters was incorrect due to an
off-by-one error. This series adds a test for the chunking which splits
by escape characters, and fixes the bug.
v2: Fix kunit param const pointer
====================
Signed-off-by: David S. Miller <[email protected]>
|
|
0x7d and 0x7e bytes are meant to be escaped in the data portion of
frames, but this didn't occur since next_chunk_len() had an off-by-one
error. That also resulted in the final byte of a payload being written
as a separate tty write op.
The chunk prior to an escaped byte would be one byte short, and the
next call would never test the txpos+1 case, which is where the escaped
byte was located. That meant it never hit the escaping case in
mctp_serial_tx_work().
Example Input: 01 00 08 c8 7e 80 02
Previous incorrect chunks from next_chunk_len():
01 00 08
c8 7e 80
02
With this fix:
01 00 08 c8
7e
80 02
Cc: [email protected]
Fixes: a0c2ccd9b5ad ("mctp: Add MCTP-over-serial transport binding")
Signed-off-by: Matt Johnston <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Test various edge cases of inputs that contain characters
that need escaping.
This adds a new kunit suite for mctp-serial.
Signed-off-by: Matt Johnston <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Add support for Adreno 306A GPU what is found in MSM8917 SoC.
This GPU marketing name is Adreno 308.
Signed-off-by: Otto Pflüger <[email protected]>
[use internal name of the GPU, reword the commit message]
Reviewed-by: Konrad Dybcio <[email protected]>
Signed-off-by: Barnabás Czémán <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605403/
Signed-off-by: Rob Clark <[email protected]>
|
|
A621 is a clear A662 derivative (same lineage as A650), no explosions
or sick features, other than a NoC bug which can stall the GPU..
Add support for it.
Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611100/
Signed-off-by: Rob Clark <[email protected]>
|
|
This was apparently never done before.. Program the expected values.
This also gets rid of sneakily setting that register through the HWCG
reg list on A690.
Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611098/
Signed-off-by: Rob Clark <[email protected]>
|
|
This register's magic value differs wildly between different GPUs, use
the hardcoded data instead of trying to make some logic out of it.
Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611096/
Signed-off-by: Rob Clark <[email protected]>
|
|
Store the correct values that we happen to have for some A7xx SKUs in
the GPU info struct and fill out the missing information for A6xx GPUs
based on downstream kernel information.
Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611094/
[add missing entry to a615 catalog to resolve conflict]
Signed-off-by: Rob Clark <[email protected]>
|
|
The if-else monster is so unmaintainable that one case is repeated
twice. Get rid of it.
Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/611092/
[add missing entry to a615 catalog to resolve conflict]
Signed-off-by: Rob Clark <[email protected]>
|
|
A650 family includes A660 family (they've got a big family), A650
itself, and some more A6XX_GEN3 SKUs, all of which should fall into
the same branch of the if-condition. Simplify that.
Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/605206/
Signed-off-by: Rob Clark <[email protected]>
|
|
There is another cause for soft lock-up of GPU in empty ring-buffer:
race between GPU executing last commands and CPU checking ring for
emptiness. On GPU side IRQ for retire is triggered by CACHE_FLUSH_TS
event and RPTR shadow (which is used to check ring emptiness) is updated
a bit later from CP_CONTEXT_SWITCH_YIELD. Thus if GPU is executing its
last commands slow enough or we check that ring too fast we will miss a
chance to trigger switch to lower priority ring because current ring isn't
empty just yet. This can escalate to lock-up situation described in
previous patch.
To work-around this issue we keep track of last submit sequence number
for each ring and compare it with one written to memptrs from GPU during
execution of CACHE_FLUSH_TS event.
Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
Signed-off-by: Vladimir Lypak <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/612047/
Signed-off-by: Rob Clark <[email protected]>
|
|
On A5XX GPUs when preemption is used it's invietable to enter a soft
lock-up state in which GPU is stuck at empty ring-buffer doing nothing.
This appears as full UI lockup and not detected as GPU hang (because
it's not). This happens due to not triggering preemption when it was
needed. Sometimes this state can be recovered by some new submit but
generally it won't happen because applications are waiting for old
submits to retire.
One of the reasons why this happens is a race between a5xx_submit and
a5xx_preempt_trigger called from IRQ during submit retire. Former thread
updates ring->cur of previously empty and not current ring right after
latter checks it for emptiness. Then both threads can just exit because
for first one preempt_state wasn't NONE yet and for second one all rings
appeared to be empty.
To prevent such situations from happening we need to establish guarantee
for preempt_trigger to make decision after each submit or retire. To
implement this we serialize preemption initiation using spinlock. If
switch is already in progress we need to re-trigger preemption when it
finishes.
Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
Signed-off-by: Vladimir Lypak <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/612045/
Signed-off-by: Rob Clark <[email protected]>
|
|
Two fields of preempt_record which are used by CP aren't reset on
resume: "data" and "info". This is the reason behind faults which happen
when we try to switch to the ring that was active last before suspend.
In addition those faults can't be recovered from because we use suspend
and resume to do so (keeping values of those fields again).
Fixes: b1fc2839d2f9 ("drm/msm: Implement preemption for A5XX targets")
Signed-off-by: Vladimir Lypak <[email protected]>
Reviewed-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/612043/
Signed-off-by: Rob Clark <[email protected]>
|
|
Fine grain preemption (switching from/to points within submits)
requires extra handling in command stream of those submits, especially
when rendering with tiling (using GMEM). However this handling is
missing at this point in mesa (and always was). For this reason we get
random GPU faults and hangs if more than one priority level is used
because local preemption is enabled prior to executing command stream
from submit.
With that said it was ahead of time to enable local preemption by
default considering the fact that even on downstream kernel it is only
enabled if requested via UAPI.
Fixes: a7a4c19c36de ("drm/msm/a5xx: fix setting of the CP_PREEMPT_ENABLE_LOCAL register")
Signed-off-by: Vladimir Lypak <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/612041/
Signed-off-by: Rob Clark <[email protected]>
|
|
There are some cases, such as the one uncovered by Commit 46d4efcccc68
("drm/msm/a6xx: Avoid a nullptr dereference when speedbin setting fails")
where
msm_gpu_cleanup() : platform_set_drvdata(gpu->pdev, NULL);
is called on gpu->pdev == NULL, as the GPU device has not been fully
initialized yet.
Turns out that there's more than just the aforementioned path that
causes this to happen (e.g. the case when there's speedbin data in the
catalog, but opp-supported-hw is missing in DT).
Assigning msm_gpu->pdev earlier seems like the least painful solution
to this, therefore do so.
Signed-off-by: Konrad Dybcio <[email protected]>
Patchwork: https://patchwork.freedesktop.org/patch/602742/
Signed-off-by: Rob Clark <[email protected]>
|
|
In the LTC2991, V5 and V6 channels use the low nibble of the
"V5, V6, V7, and V8 Control Register" for configuration, but currently,
the high nibble is defined.
This patch changes the defines to use the low nibble.
Fixes: 2b9ea4262ae9 ("hwmon: Add driver for ltc2991")
Signed-off-by: Pawel Dembicki <[email protected]>
Message-ID: <[email protected]>
Signed-off-by: Guenter Roeck <[email protected]>
|
|
The fscache_cookie_lru_timer is initialized when the fscache module
is inserted, but is not deleted when the fscache module is removed.
If timer_reduce() is called before removing the fscache module,
the fscache_cookie_lru_timer will be added to the timer list of
the current cpu. Afterwards, a use-after-free will be triggered
in the softIRQ after removing the fscache module, as follows:
==================================================================
BUG: unable to handle page fault for address: fffffbfff803c9e9
PF: supervisor read access in kernel mode
PF: error_code(0x0000) - not-present page
PGD 21ffea067 P4D 21ffea067 PUD 21ffe6067 PMD 110a7c067 PTE 0
Oops: Oops: 0000 [#1] PREEMPT SMP KASAN PTI
CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Tainted: G W 6.11.0-rc3 #855
Tainted: [W]=WARN
RIP: 0010:__run_timer_base.part.0+0x254/0x8a0
Call Trace:
<IRQ>
tmigr_handle_remote_up+0x627/0x810
__walk_groups.isra.0+0x47/0x140
tmigr_handle_remote+0x1fa/0x2f0
handle_softirqs+0x180/0x590
irq_exit_rcu+0x84/0xb0
sysvec_apic_timer_interrupt+0x6e/0x90
</IRQ>
<TASK>
asm_sysvec_apic_timer_interrupt+0x1a/0x20
RIP: 0010:default_idle+0xf/0x20
default_idle_call+0x38/0x60
do_idle+0x2b5/0x300
cpu_startup_entry+0x54/0x60
start_secondary+0x20d/0x280
common_startup_64+0x13e/0x148
</TASK>
Modules linked in: [last unloaded: netfs]
==================================================================
Therefore delete fscache_cookie_lru_timer when removing the fscahe module.
Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning")
Cc: [email protected]
Signed-off-by: Baokun Li <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Acked-by: David Howells <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
|
|
|
|
Pull smb client fixes from Steve French:
- copy_file_range fix
- two read fixes including read past end of file rc fix and read retry
crediting fix
- falloc zero range fix
* tag 'v6.11-rc5-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix FALLOC_FL_ZERO_RANGE to preflush buffered part of target region
cifs: Fix copy offload to flush destination region
netfs, cifs: Fix handling of short DIO read
cifs: Fix lack of credit renegotiation on read retry
|
|
Push bcachefs fixes from Kent Overstreet:
"The data corruption in the buffered write path is troubling; inode
lock should not have been able to cause that...
- Fix a rare data corruption in the rebalance path, caught as a nonce
inconsistency on encrypted filesystems
- Revert lockless buffered write path
- Mark more errors as autofix"
* tag 'bcachefs-2024-08-21' of https://github.com/koverstreet/bcachefs:
bcachefs: Mark more errors as autofix
bcachefs: Revert lockless buffered IO path
bcachefs: Fix bch2_extents_match() false positive
bcachefs: Fix failure to return error in data_update_index_update()
|
|
raw_copy_{to,from}_user() do not call access_ok(), so this code allowed
userspace to access any virtual memory address.
Cc: [email protected]
Fixes: 7c83232161f6 ("riscv: add support for misaligned trap handling in S-mode")
Fixes: 441381506ba7 ("riscv: misaligned: remove CONFIG_RISCV_M_MODE specific code")
Signed-off-by: Samuel Holland <[email protected]>
Reviewed-by: Alexandre Ghiti <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|
|
errors that are known to always be safe to fix should be autofix: this
should be most errors even at this point, but that will need some
thorough review.
note that errors are still logged in the superblock, so we'll still know
that they happened.
Signed-off-by: Kent Overstreet <[email protected]>
|
|
We had a report of data corruption on nixos when building installer
images.
https://github.com/NixOS/nixpkgs/pull/321055#issuecomment-2184131334
It seems that writes are being dropped, but only when issued by QEMU,
and possibly only in snapshot mode. It's undetermined if it's write
calls are being dropped or dirty folios.
Further testing, via minimizing the original patch to just the change
that skips the inode lock on non appends/truncates, reveals that it
really is just not taking the inode lock that causes the corruption: it
has nothing to do with the other logic changes for preserving write
atomicity in corner cases.
It's also kernel config dependent: it doesn't reproduce with the minimal
kernel config that ktest uses, but it does reproduce with nixos's distro
config. Bisection the kernel config initially pointer the finger at page
migration or compaction, but it appears that was erroneous; we haven't
yet determined what kernel config option actually triggers it.
Sadly it appears this will have to be reverted since we're getting too
close to release and my plate is full, but we'd _really_ like to fully
debug it.
My suspicion is that this patch is exposing a preexisting bug - the
inode lock actually covers very little in IO paths, and we have a
different lock (the pagecache add lock) that guards against races with
truncate here.
Fixes: 7e64c86cdc6c ("bcachefs: Buffered write path now can avoid the inode lock")
Signed-off-by: Kent Overstreet <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull misc fixes from Guenter Roeck.
These are fixes for regressions that Guenther has been reporting, and
the maintainers haven't picked up and sent in. With rc6 fairly imminent,
I'm taking them directly from Guenter.
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
apparmor: fix policy_unpack_test on big endian systems
Revert "MIPS: csrc-r4k: Apply verification clocksource flags"
microblaze: don't treat zero reserved memory regions as error
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
Pull power sequencing fix from Bartosz Golaszewski:
"A follow-up fix for the power sequencing subsystem. It turned out the
previous fix for this driver was incomplete and broke the WLAN support
on some platforms. This addresses the issue.
- set the direction of the wlan-enable GPIO to output after
requesting it as-is"
* tag 'pwrseq-fixes-for-v6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
power: sequencing: qcom-wcn: set the wlan-enable GPIO to output
|
|
Commit a9aaf1ff88a8 ("power: sequencing: request the WLAN enable GPIO
as-is") broke WLAN on boards on which the wlan-enable GPIO enabling the
wifi module isn't in output mode by default. We need to set direction to
output while retaining the value that was already set to keep the ath
module on if it's already started.
Fixes: a9aaf1ff88a8 ("power: sequencing: request the WLAN enable GPIO as-is")
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB fixes for 6.11-rc6. Included in here are:
- dwc3 driver fixes for reported issues
- MAINTAINER file update, marking a driver as unsupported :(
- cdnsp driver fixes
- USB gadget driver fix
- USB sysfs fix
- other tiny fixes
- new device ids for usb serial driver
All of these have been in linux-next this week with no reported
issues"
* tag 'usb-6.11-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: serial: option: add MeiG Smart SRM825L
usb: cdnsp: fix for Link TRB with TC
usb: dwc3: st: add missing depopulate in probe error path
usb: dwc3: st: fix probed platform device ref count on probe error path
usb: dwc3: ep0: Don't reset resource alloc flag (including ep0)
usb: core: sysfs: Unmerge @usb3_hardware_lpm_attr_group in remove_power_attributes()
usb: typec: fsa4480: Relax CHIP_ID check
usb: dwc3: xilinx: add missing depopulate in probe error path
usb: dwc3: omap: add missing depopulate in probe error path
dt-bindings: usb: microchip,usb2514: Fix reference USB device schema
usb: gadget: uvc: queue pump work in uvcg_video_enable()
cdc-acm: Add DISABLE_ECHO quirk for GE HealthCare UI Controller
usb: cdnsp: fix incorrect index in cdnsp_get_hw_deq function
usb: dwc3: core: Prevent USB core invalid event buffer address access
MAINTAINERS: Mark UVC gadget driver as orphan
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Minor fixes only.
The sd.c one ignores a sync cache request if format is in progress
which can happen if formatting a drive across suspend/resume"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: Ignore command SYNCHRONIZE CACHE error if format in progress
scsi: aacraid: Fix double-free on probe failure
scsi: lpfc: Fix overflow build issue
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
Pull nfsd fix from Chuck Lever:
- One more write delegation fix
* tag 'nfsd-6.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux:
nfsd: fix nfsd4_deleg_getattr_conflict in presence of third party lease
|
|
Pull xfs fixes from Chandan Babu:
- Do not call out v1 inodes with non-zero di_nlink field as being
corrupt
- Change xfs_finobt_count_blocks() to count "free inode btree" blocks
rather than "inode btree" blocks
- Don't report the number of trimmed bytes via FITRIM because the
underlying storage isn't required to do anything and failed discard
IOs aren't reported to the caller anyway
- Fix incorrect setting of rm_owner field in an rmap query
- Report missing disk offset range in an fsmap query
- Obtain m_growlock when extending realtime section of the filesystem
- Reset rootdir extent size hint after extending realtime section of
the filesystem
* tag 'xfs-6.11-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: reset rootdir extent size hint after growfsrt
xfs: take m_growlock when running growfsrt
xfs: Fix missing interval for missing_owner in xfs fsmap
xfs: use XFS_BUF_DADDR_NULL for daddrs in getfsmap code
xfs: Fix the owner setting issue for rmap query in xfs fsmap
xfs: don't bother reporting blocks trimmed via FITRIM
xfs: xfs_finobt_count_blocks() walks the wrong btree
xfs: fix folio dirtying for XFILE_ALLOC callers
xfs: fix di_onlink checking for V1/V2 inodes
|