Age | Commit message (Collapse) | Author | Files | Lines |
|
The core infrastructure has everything in place to switch ICU to per
device MSI domains and avoid the convoluted construct of the existing
platform-MSI layering violation.
The new infrastructure provides a wired interrupt specific interface in the
MSI core which converts the 'hardware interrupt number + trigger type'
allocation which is required for wired interrupts in the regular irqdomain
code to a normal MSI allocation.
The hardware interrupt number and the trigger type are stored in the MSI
descriptor device cookie by the core code so the ICU specific code can
retrieve them.
The new per device domain is only instantiated when the irqdomain which is
associated to the ICU device provides MSI parent functionality. Up to
that point it invokes the existing code. Once the parent is converted the
code for the current platform-MSI mechanism is removed.
The new domain shares the interrupt chip callbacks and the translation
function. The only new functionality aside of filling out the
msi_domain_templates is a domain specific set_desc() callback, which will go
away once all platform-MSI code has been converted.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
All platform MSI users and the PCI/MSI code handle per device MSI domains
when the irqdomain associated to the device provides MSI parent
functionality.
Remove the "global" platform domain related code and provide the MSI parent
functionality by filling in msi_parent_ops.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
All platform MSI users and the PCI/MSI code handle per device MSI domains
when the irqdomain associated to the device provides MSI parent
functionality.
Remove the "global" PCI/MSI and platform domain related code and provide
the MSI parent functionality by filling in msi_parent_ops.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The MBI chip creates two MSI domains:
- PCI/MSI
- Platform device domain
Both have the MBI domain as parent and differ slightly in the interrupt
chip callbacks and the platform device domain supports level type
signaling.
Convert it over to the MSI parent domain mechanism by:
- Providing the required templates
- Implementing a custom init_dev_msi_info() callback which sets the chip
callbacks and the level support flags depending on the domain bus token
type of the per device domain.
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
No more users.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Now that ITS provides the MSI parent domain, remove the unused fallback
code.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Similar to the previous conversion of the PCI/MSI support lift the
prepare() callback from the existing platform MSI code and enable
platform MSI and the related device domain bus tokens in select
and the child domain initialization code.
All platform MSI users are automatically using the new per device MSI model
now.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add the new bus token to the accepted list of child domain tokens.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The core infrastructure has everything in place to switch MBIGEN to per
device MSI domains and avoid the convoluted construct of the existing
platform-MSI layering violation.
The new infrastructure provides a wired interrupt specific interface in the
MSI core which converts the 'hardware interrupt number + trigger type'
allocation which is required for wired interrupts in the regular irqdomain
code to a normal MSI allocation.
The hardware interrupt number and the trigger type are stored in the MSI
descriptor device cookie by the core code so the MBIGEN specific code can
retrieve them.
The new per device domain is only instantiated when the irqdomain which is
associated to the MBIGEN device provides MSI parent functionality. Up to
that point it invokes the existing code. Once the parent is converted the
code for the current platform-MSI mechanism is removed.
The new domain shares the interrupt chip callbacks and the translation
function. The only new functionality aside of filling out the
msi_domain_template is a domain specific set_desc() callback, which will go
away once all platform-MSI code has been converted.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add the prerequisites for DEVICE MSI into the shared select() and child
domain init function. These domains are really trivial and just provide a
custom irq chip callback to write the MSI message.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
The its_pci_msi_prepare() function from the ITS-PCI/MSI code provides the
'global' PCI/MSI domains. Move this function to the ITS-MSI parent code and
amend the function to use the domain hardware size, which is the MSI[X]
vector count, for allocating the ITS slots for the PCI device.
Enable PCI matching in msi_parent_ops and provide the necessary update to
the ITS specific child domain initialization function so that the prepare
callback gets invoked on allocations.
The latter might be optimized to do the allocation right at the point where
the child domain is initialized, but keep it simple for now.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Add the bus tokens for DOMAIN_BUS_PCI_DEVICE_MSI and
DOMAIN_BUS_PCI_DEVICE_MSIX to the common child init
function.
Provide the match mask which can be used by parent domain
implementation so the bitmask based child bus token match
works.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
To support per device MSI domains the ITS must provide MSI parent domain
functionality.
Provide the basic skeleton for this:
- msi_parent_ops
- child domain init callback
- the MSI parent flag set in irqdomain::flags
This does not make ITS a functional parent domain as there is no bit set in
the bus_select_mask yet, but it provides the base to implement PCI and
platform MSI support gradually on top.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
All irqdomains which provide MSI parent domain functionality for per device
MSI domains need to provide a select() callback for the irqdomain and a
function to initialize the child domain.
Most of these functions would just be copy&paste with minimal
modifications, so provide a library function which implements the required
functionality and is customizable via parent_domain::msi_parent_ops. The
check for the supported bus tokens in msi_lib_init_dev_msi_info() is
expanded step by step within the next patches.
Signed-off-by: Thomas Gleixner <[email protected]>
Signed-off-by: Anna-Maria Behnsen <[email protected]>
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
|
|
Most ARM(64) PCI/MSI domains mask and unmask in the parent domain after or
before the PCI mask/unmask operation takes place. So there are more than a
dozen of the same wrapper implementation all over the place.
Don't make the same mistake with the new per device PCI/MSI domains and
provide a new MSI feature flag, which lets the domain implementation
enable this sequence in the PCI/MSI code.
Signed-off-by: Shivamurthy Shastri <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Acked-by: Bjorn Helgaas <[email protected]>
Link: https://lore.kernel.org/r/87ed8j34pj.ffs@tglx
|
|
of_platform_populate()
Commit 50b040ef3732 ("PCI/pwrctl: only call of_platform_populate() if
CONFIG_OF is enabled") added the CONFIG_OF guard for the
of_platform_populate() API. But it missed the fact that the CONFIG_OF
platforms can also run on ACPI without devicetree (so dev.of_node will
be NULL). In those cases, of_platform_populate() will fail with below
error messages as seen on the Ampere Altra box:
pci 000c:00:01.0: failed to populate child OF nodes (-22)
pci 000c:00:02.0: failed to populate child OF nodes (-22)
Fix this by checking for the existence of 'dev.of_node' before calling
the of_platform_populate() API. This also warrants the removal of
CONFIG_OF check, since dev_of_node() helper will return NULL if
CONFIG_OF is not enabled.
While at it, let's also use dev_of_node() to pass device OF node pointer
to of_platform_populate().
Fixes: 50b040ef3732 ("PCI/pwrctl: only call of_platform_populate() if CONFIG_OF is enabled")
Reported-by: Linus Torvalds <[email protected]>
Closes: https://lore.kernel.org/linux-arm-msm/CAHk-=wjcO_9dkNf-bNda6bzykb5ZXWtAYA97p7oDsXPHmMRi6g@mail.gmail.com
Reviewed-by: Bartosz Golaszewski <[email protected]>
Signed-off-by: Manivannan Sadhasivam <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Several versions of GCC mis-compile asm goto with outputs. We try to
workaround this, but our workaround is demonstrably incomplete and
liable to result in subtle bugs, especially on arm64 where get_user()
has recently been moved over to using asm goto with outputs.
From discussion(s) with Linus at:
https://lore.kernel.org/linux-arm-kernel/[email protected]/
https://lore.kernel.org/linux-arm-kernel/[email protected]/
... it sounds like the best thing to do for now is to remove the
workaround and make CC_HAS_ASM_GOTO_OUTPUT depend on working compiler
versions.
The issue was originally reported to GCC by Sean Christopherson:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113921
... and Jakub Jelinek fixed this for GCC 14, with the fix backported to
13.3.0, 12.4.0, and 11.5.0.
In the kernel, we tried to workaround broken compilers in commits:
4356e9f841f7 ("work around gcc bugs with 'asm goto' with outputs")
68fb3ca0e408 ("update workarounds for gcc "asm goto" issue")
... but the workaround of adding an empty asm("") after the asm volatile
goto(...) demonstrably does not always avoid the problem, as can be seen
in the following test case:
| #define asm_goto_output(x...) \
| do { asm volatile goto(x); asm (""); } while (0)
|
| #define __good_or_bad(__val, __key) \
| do { \
| __label__ __failed; \
| unsigned long __tmp; \
| asm_goto_output( \
| " cbnz %[key], %l[__failed]\n" \
| " mov %[val], #0x900d\n" \
| : [val] "=r" (__tmp) \
| : [key] "r" (__key) \
| : \
| : __failed); \
| (__val) = __tmp; \
| break; \
| __failed: \
| (__val) = 0xbad; \
| } while (0)
|
| unsigned long get_val(unsigned long key);
| unsigned long get_val(unsigned long key)
| {
| unsigned long val = 0xbad;
|
| __good_or_bad(val, key);
|
| return val;
| }
GCC 13.2.0 (at -O2) compiles this to:
| cbnz x0, .Lfailed
| mov x0, #0x900d
| .Lfailed:
| ret
GCC 14.1.0 (at -O2) compiles this to:
| cbnz x0, .Lfailed
| mov x0, #0x900d
| ret
| .Lfailed:
| mov x0, #0xbad
| ret
Note that GCC 13.2.0 erroneously omits the assignment to 'val' in the
error path (even though this does not depend on an output of the asm
goto). GCC 14.1.0 correctly retains the assignment.
This problem can be seen within the kernel with the following test case:
| #include <linux/uaccess.h>
| #include <linux/types.h>
|
| noinline unsigned long test_unsafe_get_user(unsigned long __user *ptr);
| noinline unsigned long test_unsafe_get_user(unsigned long __user *ptr)
| {
| unsigned long val;
|
| unsafe_get_user(val, ptr, Efault);
| return val;
|
| Efault:
| val = 0x900d;
| return val;
| }
GCC 13.2.0 (arm64 defconfig) compiles this to:
| and x0, x0, #0xff7fffffffffffff
| ldtr x0, [x0]
| .Lextable_fixup:
| ret
GCC 13.2.0 (x86_64 defconfig + MITIGATION_RETPOLINE=n) compiles this to:
| endbr64
| mov (%rdi),%rax
| .Lextable_fixup:
| ret
... omitting the assignment to 'val' in the error path, and leaving
garbage in the result register returned by the function (which happens
to contain the faulting address in the generated code).
GCC 14.1.0 (arm64 defconfig) compiles this to:
| and x0, x0, #0xff7fffffffffffff
| ldtr x0, [x0]
| ret
| .Lextable_fixup:
| mov x0, #0x900d // #36877
| ret
GCC 14.1.0 (x86_64 defconfig + MITIGATION_RETPOLINE=n) compiles this to:
| endbr64
| mov (%rdi),%rax
| ret
| .Lextable_fixup:
| mov $0x900d,%eax
| ret
... retaining the expected assignment to 'val' in the error path.
We don't have a complete and reasonable workaround. While placing empty
asm("") blocks after each goto label *might* be sufficient, we don't
know for certain, this is tedious and error-prone, and there doesn't
seem to be a neat way to wrap this up (which is especially painful for
cases with multiple goto labels).
Avoid this issue by disabling CONFIG_CC_HAS_ASM_GOTO_OUTPUT for
known-broken compiler versions and removing the workaround (along with
the CONFIG_GCC_ASM_GOTO_OUTPUT_WORKAROUND config option).
For the moment I've left the default implementation of asm_goto_output()
unchanged. This should now be redundant since any compiler with the fix
for the clobbering issue whould also have a fix for the (earlier)
volatile issue, but it's far less churny to leave it around, which makes
it easier to backport this patch if necessary.
Signed-off-by: Mark Rutland <[email protected]>
Cc: Alex Coplan <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Jakub Jelinek <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sean Christopherson <[email protected]>
Cc: Szabolcs Nagy <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: [email protected]
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit fbb5c0606fa4 ("kbuild: add syscall table generation to
scripts/Makefile.asm-headers") started to generate syscall headers
for architectures using generic syscalls.
However, these headers are always rebuilt using GNU Make 4.4.1 or newer.
When using GNU Make 4.4 or older, these headers are not rebuilt when the
command to generate them is changed, despite the use of the if_changed
macro.
scripts/Makefile.asm-headers now uses FORCE, but it is not marked as
.PHONY. To handle the command line change correctly, .*.cmd files must
be included.
Fixes: fbb5c0606fa4 ("kbuild: add syscall table generation to scripts/Makefile.asm-headers")
Reported-by: Linus Torvalds <[email protected]>
Closes: https://lore.kernel.org/lkml/CAHk-=wibB7SvXnUftBgAt+4-3vEKRpvEgBeDEH=i=j2GvDitoA@mail.gmail.com/
Signed-off-by: Masahiro Yamada <[email protected]>
Tested-by: Arnd Bergmann <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Pull drm updates from Dave Airlie:
"There's a lot of stuff in here, amd, i915 and xe have new platform
work, lots of core rework around EDID handling, some new COMPILE_TEST
options, maintainer changes and a lots of other stuff. Summary:
core:
- deprecate DRM data and return 0 date
- connector: Create a set of helpers to help with HDMI support
- Remove driver owner assignments
- Allow more drivers to compile with COMPILE_TEST
- Conversions to drm_edid
- Sprinkle MODULE_DESCRIPTIONS everywhere they are missing
- Remove drm_mm_replace_node
- print: Add a drm prefix to warn level messages too, remove
___drm_dbg, consolidate prefix handling
- New monochrome TV mode variant
ttm:
- improve number of page faults on some platforms
- fix test builds under PREEMPT_RT
- more test coverage
ci:
- Require a more recent version of mesa
- improve farm setup and test generation
dma-buf:
- warn if reserving 0 fence slots
- internal API heap enhancements
fbdev:
- Create memory manager optimized fbdev emulation
panic:
- Allow to select fonts
- improve drm_fb_dma_get_scanout_buffer
- Allow to dump kmsg to the screen
bridge:
- Remove redundant checks on bridge->encoder
- Remove drm_bridge_chain_mode_fixup
- bridge-connector: Plumb in the new HDMI helper
- analogix_dp: Various improvements, handle AUX transfers timeout
- samsung-dsim: Fix timings calculation
- tc358767: Plenty of small fixes, fix no connector attach, fix
clocks
- sii902x: state validation improvements
panels:
- Switch panels from register table initialization to proper code
- Now that the panel code tracks the panel state, remove every ad-hoc
implementation in the panel drivers
- More cleanup of prepare / enable state tracking in drivers
- edp: Drop legacy panel compatibles
- simple-bridge: Switch to devm_drm_bridge_add
- New panels: Lincoln Tech Sol LCD185-101CT, Microtips Technology
13-101HIEBCAF0-C, Microtips Technology MF-103HIEB0GA0,
BOE nv110wum-l60, IVO t109nw41, WL-355608-A8, PrimeView
PM070WL4, Lincoln Technologies LCD197, Ortustech
COM35H3P70ULC, AUO G104STN01, K&d kd101ne3-40ti
amdgpu:
- DCN 4.0.x support
- GC 12.0 support
- GMC 12.0 support
- SDMA 7.0 support
- MES12 support
- MMHUB 4.1 support
- GFX12 modifier and DCC support
- lots of IP fixes/updates
amdkfd:
- Contiguous VRAM allocations
- GC 12.0 support
- SDMA 7.0 support
- SR-IOV fixes
- KFD GFX ALU exceptions
i915:
- Battlemage Xe2 HPD display enablement
- Panel Replay enabling
- DP AUX-less ALPM/LOBF
- Enable link training failure fallback for DP MST links
- CMRR (Content Match Refresh Rate) enabling
- Increase ADL-S/ADL-P/DG2+ max TMDS bitrate to 6 Gbps
- Enable eDP AUX based HDR backlight
- Support replaying GPU hangs with captured context image
- Automate CCS Mode setting during engine resets
- lots of refactoring
- Support replaying GPU hangs with captured context image
- Increase FLR timeout from 3s to 9s
- Enable w/a 16021333562 for DG2, MTL and ARL [guc]
xe:
- update MAINATINERS
- New uapi adding OA functionality to Xe
- expose l3 bank mask
- fix display detect on ADL-N
- runtime PM Fixes
- Fix silent backmerge issues
- More prep for SR-IOV
- HWmon additions
- per client usage info
- Rework GPU page fault handling
- Drop EXEC_QUEUE_FLAG_BANNED
- Add BMG PCI IDs
- Scheduler fixes and improvements
- Rename xe_exec_queue::compute to xe_exec_queue::lr
- Use ttm_uncached for BO with NEEDS_UC flag
- Rename xe perf layer as xe observation layer
- lots of refactoring
radeon:
- Backlight workaround for iMac
- Silence UBSAN flex array warnings
msm:
- Validate registers XML description against schema in CI
- core/dpu: SM7150 support
- mdp5: Add support for MSM8937
- gpu: Add param for userspace to know if raytracing is supported
- gpu: X185 support (aka gpu in X1 laptop chips)
- gpu: a505 support
ivpu:
- hardware scheduler support
- profiling support
- improvements to the platform support layer
- firmware handling improvements
- clocks/power mgmt improvements
- scheduler/logging improvements
habanalabs:
- Gradual sleep in polling memory macro
- Reduce Gaudi2 MSI-X interrupt count to 128
- Add Gaudi2-D revision support
- Add timestamp to CPLD info
- Gaudi2: Assume hard-reset by firmware upon MC SEI severe error
- Align Gaudi2 interrupt names
- Check for errors after preboot is ready
- Change habanalabs maintainer and git repo path
mgag200:
- refactoring and improvements
- Add BMC output
- enable polling
nouveau:
- add registry command line
v3d:
- perf counters improvements
zynqmp:
- irq and debugfs improvements
atmel-hlcdc:
- Support XLCDC in sam9x7
mipi-dbi:
- Remove mipi_dbi_machine_little_endian
- make SPI bits per word configurable
- support RGB888
- allow pixel formats to be specified in the DT
sun4i:
- Rework the blender setup for DE2
panfrost:
- Enable MT8188 support
vc4:
- Monochrome TV support
exynos:
- fix fallback mode regression
- fix memory leak
- Use drm_edid_duplicate() instead of kmemdup()
etnaviv:
- fix i.MX8MP NPU clock gating
- workaround FE register cdc issues on some cores
- fix DMA sync handling for cached buffers
- fix job timeout handling
- keep TS enabled on MMUv2 cores for improved performance
mediatek:
- Convert to platform remove callback returning void-
- Drop chain_mode_fixup call in mode_valid()
- Fixes the errors of MediaTek display driver found by IGT
- Add display support for the MT8365-EVK board
- Fix bit depth overwritten for mtk_ovl_set bit_depth()
- Fix possible_crtcs calculation
- Fix spurious kfree()
ast:
- refactor mode setting code
stm:
- Add LVDS support
- DSI PHY updates"
* tag 'drm-next-2024-07-18' of https://gitlab.freedesktop.org/drm/kernel: (2501 commits)
drm/amdgpu/mes12: add missing opcode string
drm/amdgpu/mes11: update opcode strings
Revert "drm/amd/display: Reset freesync config before update new state"
drm/omap: Restrict compile testing to PAGE_SIZE less than 64KB
drm/xe: Drop trace_xe_hw_fence_free
drm/xe/uapi: Rename xe perf layer as xe observation layer
drm/amdgpu: remove exp hw support check for gfx12
drm/amdgpu: timely save bad pages to eeprom after gpu ras reset is completed
drm/amdgpu: flush all cached ras bad pages to eeprom
drm/amdgpu: select compute ME engines dynamically
drm/amd/display: Allow display DCC for DCN401
drm/amdgpu: select compute ME engines dynamically
drm/amdgpu/job: Replace DRM_INFO/ERROR logging
drm/amdgpu: select compute ME engines dynamically
drm/amd/pm: Ignore initial value in smu response register
drm/amdgpu: Initialize VF partition mode
drm/amd/amdgpu: fix SDMA IRQ client ID <-> req mapping
MAINTAINERS: fix Xinhui's name
MAINTAINERS: update powerplay and swsmu
drm/qxl: Pin buffer objects for internal mappings
...
|
|
The GSS routine errors are values, not flags.
Fixes: 0c77668ddb4e ("SUNRPC: Introduce trace points in rpc_auth_gss.ko")
Signed-off-by: Benjamin Coddington <[email protected]>
Reviewed-by: Chuck Lever <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
We've observed NFS clients with sync tasks sleeping in __rpc_execute
waiting on RPC_TASK_QUEUED that have not responded to a wake-up from
rpc_make_runnable(). I suspect this problem usually goes unnoticed,
because on a busy client the task will eventually be re-awoken by another
task completion or xprt event. However, if the state manager is draining
the slot table, a sync task missing a wake-up can result in a hung client.
We've been able to prove that the waker in rpc_make_runnable() successfully
calls wake_up_bit() (ie- there's no race to tk_runstate), but the
wake_up_bit() call fails to wake the waiter. I suspect the waker is
missing the load of the bit's wait_queue_head, so waitqueue_active() is
false. There are some very helpful comments about this problem above
wake_up_bit(), prepare_to_wait(), and waitqueue_active().
Fix this by inserting smp_mb__after_atomic() before the wake_up_bit(),
which pairs with prepare_to_wait() calling set_current_state().
Signed-off-by: Benjamin Coddington <[email protected]>
Reviewed-by: Jeff Layton <[email protected]>
Signed-off-by: Anna Schumaker <[email protected]>
|
|
Drivers report a string with a name for each PCM, log it during startup of
pcm-test as a diagnostic aid.
Signed-off-by: Mark Brown <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Currently for the PCM and mixer tests we report test names which identify
the card being tested with the card number. This ensures we have unique
names but since card numbers are dynamically assigned at runtime the names
we end up with will often not be stable on systems with multiple cards
especially where those cards are provided by separate modules loeaded at
runtime. This makes it difficult for automated systems and UIs to relate
test results between runs on affected platforms.
Address this by replacing our use of card numbers with card names which are
more likely to be stable across runs. We use the card ID since it is
guaranteed to be unique by default, unlike the long name. There is still
some vulnerability to ordering issues if multiple cards with the same base
ID are present in the system but have separate dependencies but not all
drivers put distinguishing information in their long names.
Signed-off-by: Mark Brown <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Samsung Galaxy Book Pro 360 (13" 2022 NT935QDB-KC71S) with codec SSID
144d:c1a4 requires the same workaround to enable the speaker amp
as other Samsung models with the ALC298 codec.
Signed-off-by: Seunghun Han <[email protected]>
Cc: <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Takashi Iwai <[email protected]>
|
|
Kconfig will ask the user twice about power sequencing: once for the QCom
WCN power sequencing driver and then again for the PCI power control
driver using it.
Let's automate the selection of PCI_PWRCTL by introducing a new hidden
symbol: HAVE_PWRCTL which should be selected by all platforms that have
the need to include PCI power control code (right now: only ARCH_QCOM).
The pwrseq-based PCI pwrctl driver itself will then be selected by the
drivers binding to devices that may require external handling of the
power-up sequence (currently: ath11k and ath12k) based on the value
of HAVE_PWRCTL.
Make all PCI pwrctl Kconfig symbols hidden so that no questions are
asked during configuration.
Fixes: 4565d2652a37 ("PCI/pwrctl: Add PCI power control core code")
Reported-by: Linus Torvalds <[email protected]>
Closes: https://lore.kernel.org/lkml/CAHk-=wjWc5dzcj2O1tEgNHY1rnQW63JwtuZi_vAZPqy6wqpoUQ@mail.gmail.com/
Acked-by: Jeff Johnson <[email protected]> # drivers/net/wireless/ath
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Bartosz Golaszewski <[email protected]>
|
|
The iwlwifi wireless driver registers a thermal zone that is only needed
when the network interface handled by it is up and it wants that thermal
zone to be effectively ignored by the core otherwise.
Before commit a8a261774466 ("thermal: core: Call monitor_thermal_zone()
if zone temperature is invalid") that could be achieved by returning
an error code from the thermal zone's .get_temp() callback because the
core did not really handle errors returned by it almost at all.
However, commit a8a261774466 made the core attempt to recover from the
situation in which the temperature of a thermal zone cannot be
determined due to errors returned by its .get_temp() and is always
invalid from the core's perspective.
That was done because there are thermal zones in which .get_temp()
returns errors to start with due to some difficulties related to the
initialization ordering, but then it will start to produce valid
temperature values at one point.
Unfortunately, the simple approach taken by commit a8a261774466,
which is to poll the thermal zone periodically until its .get_temp()
callback starts to return valid temperature values, is at odds with
the special thermal zone in iwlwifi in which .get_temp() may always
return an error because its network interface may always be down. If
that happens, every attempt to invoke the thermal zone's .get_temp()
callback resulting in an error causes the thermal core to print a
dev_warn() message to the kernel log which is super-noisy.
To address this problem, make the core handle the case in which
.get_temp() returns 0, but the temperature value returned by it
is not actually valid, in a special way. Namely, make the core
completely ignore the invalid temperature value coming from
.get_temp() in that case, which requires folding in
update_temperature() into its caller and a few related changes.
On the iwlwifi side, modify iwl_mvm_tzone_get_temp() to return 0
and put THERMAL_TEMP_INVALID into the temperature return memory
location instead of returning an error when the firmware is not
running or it is not of the right type.
Also, to clearly separate the handling of invalid temperature
values from the thermal zone initialization, introduce a special
THERMAL_TEMP_INIT value specifically for the latter purpose.
Fixes: a8a261774466 ("thermal: core: Call monitor_thermal_zone() if zone temperature is invalid")
Closes: https://lore.kernel.org/linux-pm/[email protected]/
Reported-by: Eric Biggers <[email protected]>
Reported-by: Stefan Lippers-Hollmann <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=201761
Tested-by: Oleksandr Natalenko <[email protected]>
Tested-by: Stefan Lippers-Hollmann <[email protected]>
Cc: 6.10+ <[email protected]> # 6.10+
Signed-off-by: Rafael J. Wysocki <[email protected]>
Link: https://patch.msgid.link/[email protected]
[ rjw: Rebased on top of the current mainline ]
Signed-off-by: Rafael J. Wysocki <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf
Pablo Neira Ayuso says:
====================
Netfilter/IPVS fixes for net
The following patchset contains Netfilter/IPVS fixes for net:
1) Call nf_expect_get_id() to delete expectation by ID. By trial and
error it is possible to leak the LSB of the expectation address on
x86_64. This bug is a leftover when converting the existing code
to use nf_expect_get_id().
2) Incorrect initialization in pipapo set backend leads to packet
mismatches. From Florian Westphal.
3) Extend netfilter's selftests to cover for the pipapo set backend,
also from Florian.
4) Fix sparse warning in IPVS when adding service, from Chen Hanxiao.
netfilter pull request 24-07-17
* tag 'nf-24-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
ipvs: properly dereference pe in ip_vs_add_service
selftests: netfilter: add test case for recent mismatch bug
netfilter: nf_set_pipapo: fix initial map fill
netfilter: ctnetlink: use helper function to calculate expect ID
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Martin Willi says:
====================
net: dsa: Fix chip-wide frame size config in some drivers
Some DSA chips support a chip-wide frame size configurations, only. Some
drivers adjust that chip-wide setting for user port changes, overriding
the frame size requirements on the CPU port that includes tagger overhead.
Fix the mv88e6xxx and b53 drivers and align them to the behavior of other
drivers.
v2:
- Skip chip-wide config for non-CPU ports instead of finding the maximim
MTU over all ports
- Add a patch fixing the b53 driver as well
v1: https://lore.kernel.org/netdev/[email protected]/
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Broadcom switches supported by the b53 driver use a chip-wide jumbo frame
configuration. In the commit referenced with the Fixes tag, the setting
is applied just for the last port changing its MTU.
While configuring CPU ports accounts for tagger overhead, user ports do
not. When setting the MTU for a user port, the chip-wide setting is
reduced to not include the tagger overhead, resulting in an potentially
insufficient chip-wide maximum frame size for the CPU port.
As, by design, the CPU port MTU is adjusted for any user port change,
apply the chip-wide setting only for CPU ports. This aligns the driver
to the behavior of other switch drivers.
Fixes: 6ae5834b983a ("net: dsa: b53: add MTU configuration support")
Suggested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Martin Willi <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Marvell chips not supporting per-port jumbo frame size configurations use
a chip-wide frame size configuration. In the commit referenced with the
Fixes tag, the setting is applied just for the last port changing its MTU.
While configuring CPU ports accounts for tagger overhead, user ports do
not. When setting the MTU for a user port, the chip-wide setting is
reduced to not include the tagger overhead, resulting in an potentially
insufficient maximum frame size for the CPU port. Specifically, sending
full-size frames from the CPU port on a MV88E6097 having a user port MTU
of 1500 bytes results in dropped frames.
As, by design, the CPU port MTU is adjusted for any user port change,
apply the chip-wide setting only for CPU ports.
Fixes: 1baf0fac10fb ("net: dsa: mv88e6xxx: Use chip-wide max frame size for MTU")
Suggested-by: Vladimir Oltean <[email protected]>
Signed-off-by: Martin Willi <[email protected]>
Reviewed-by: Vladimir Oltean <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Move page_pool_get_dma_dir() inside the while loop of
airoha_qdma_cleanup_rx_queue routine in order to avoid possible NULL
pointer dereference if airoha_qdma_init_rx_queue() fails before
properly allocating the page_pool pointer.
Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/7330a41bba720c33abc039955f6172457a3a34f0.1721205981.git.lorenzo@kernel.org
Signed-off-by: Paolo Abeni <[email protected]>
|
|
add support for Dell DW5933e (0x14c0, 0x4d75)
Signed-off-by: Jack Wu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Ido Schimmel says:
====================
ipv4: Fix incorrect TOS in route get reply
Two small fixes for incorrect TOS in route get reply. See more details
in the commit messages.
No regressions in FIB tests:
# ./fib_tests.sh
[...]
Tests passed: 218
Tests failed: 0
====================
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The TOS value that is returned to user space in the route get reply is
the one with which the lookup was performed ('fl4->flowi4_tos'). This is
fine when the matched route is configured with a TOS as it would not
match if its TOS value did not match the one with which the lookup was
performed.
However, matching on TOS is only performed when the route's TOS is not
zero. It is therefore possible to have the kernel incorrectly return a
non-zero TOS:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get fibmatch 192.0.2.2 tos 0xfc
192.0.2.0/24 tos 0x1c dev dummy1 proto kernel scope link src 192.0.2.1
Fix by instead returning the DSCP field from the FIB result structure
which was populated during the route lookup.
Output after the patch:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get fibmatch 192.0.2.2 tos 0xfc
192.0.2.0/24 dev dummy1 proto kernel scope link src 192.0.2.1
Extend the existing selftests to not only verify that the correct route
is returned, but that it is also returned with correct "tos" value (or
without it).
Fixes: b61798130f1b ("net: ipv4: RTM_GETROUTE: return matched fib result when requested")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Guillaume Nault <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The TOS value that is returned to user space in the route get reply is
the one with which the lookup was performed ('fl4->flowi4_tos'). This is
fine when the matched route is configured with a TOS as it would not
match if its TOS value did not match the one with which the lookup was
performed.
However, matching on TOS is only performed when the route's TOS is not
zero. It is therefore possible to have the kernel incorrectly return a
non-zero TOS:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get 192.0.2.2 tos 0xfc
192.0.2.2 tos 0x1c dev dummy1 src 192.0.2.1 uid 0
cache
Fix by adding a DSCP field to the FIB result structure (inside an
existing 4 bytes hole), populating it in the route lookup and using it
when filling the route get reply.
Output after the patch:
# ip link add name dummy1 up type dummy
# ip address add 192.0.2.1/24 dev dummy1
# ip route get 192.0.2.2 tos 0xfc
192.0.2.2 dev dummy1 src 192.0.2.1 uid 0
cache
Fixes: 1a00fee4ffb2 ("ipv4: Remove rt_key_{src,dst,tos} from struct rtable.")
Signed-off-by: Ido Schimmel <[email protected]>
Reviewed-by: David Ahern <[email protected]>
Reviewed-by: Guillaume Nault <[email protected]>
Signed-off-by: Paolo Abeni <[email protected]>
|
|
The following splat is easy to reproduce upstream as well as in -stable
kernels. Florian Westphal provided the following commit:
d1dab4f71d37 ("net: add and use __skb_get_hash_symmetric_net")
but this complementary fix has been also suggested by Willem de Bruijn
and it can be easily backported to -stable kernel which consists in
using DEBUG_NET_WARN_ON_ONCE instead to silence the following splat
given __skb_get_hash() is used by the nftables tracing infrastructure to
to identify packets in traces.
[69133.561393] ------------[ cut here ]------------
[69133.561404] WARNING: CPU: 0 PID: 43576 at net/core/flow_dissector.c:1104 __skb_flow_dissect+0x134f/
[...]
[69133.561944] CPU: 0 PID: 43576 Comm: socat Not tainted 6.10.0-rc7+ #379
[69133.561959] RIP: 0010:__skb_flow_dissect+0x134f/0x2ad0
[69133.561970] Code: 83 f9 04 0f 84 b3 00 00 00 45 85 c9 0f 84 aa 00 00 00 41 83 f9 02 0f 84 81 fc ff
ff 44 0f b7 b4 24 80 00 00 00 e9 8b f9 ff ff <0f> 0b e9 20 f3 ff ff 41 f6 c6 20 0f 84 e4 ef ff ff 48 8d 7b 12 e8
[69133.561979] RSP: 0018:ffffc90000006fc0 EFLAGS: 00010246
[69133.561988] RAX: 0000000000000000 RBX: ffffffff82f33e20 RCX: ffffffff81ab7e19
[69133.561994] RDX: dffffc0000000000 RSI: ffffc90000007388 RDI: ffff888103a1b418
[69133.562001] RBP: ffffc90000007310 R08: 0000000000000000 R09: 0000000000000000
[69133.562007] R10: ffffc90000007388 R11: ffffffff810cface R12: ffff888103a1b400
[69133.562013] R13: 0000000000000000 R14: ffffffff82f33e2a R15: ffffffff82f33e28
[69133.562020] FS: 00007f40f7131740(0000) GS:ffff888390800000(0000) knlGS:0000000000000000
[69133.562027] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[69133.562033] CR2: 00007f40f7346ee0 CR3: 000000015d200001 CR4: 00000000001706f0
[69133.562040] Call Trace:
[69133.562044] <IRQ>
[69133.562049] ? __warn+0x9f/0x1a0
[ 1211.841384] ? __skb_flow_dissect+0x107e/0x2860
[...]
[ 1211.841496] ? bpf_flow_dissect+0x160/0x160
[ 1211.841753] __skb_get_hash+0x97/0x280
[ 1211.841765] ? __skb_get_hash_symmetric+0x230/0x230
[ 1211.841776] ? mod_find+0xbf/0xe0
[ 1211.841786] ? get_stack_info_noinstr+0x12/0xe0
[ 1211.841798] ? bpf_ksym_find+0x56/0xe0
[ 1211.841807] ? __rcu_read_unlock+0x2a/0x70
[ 1211.841819] nft_trace_init+0x1b9/0x1c0 [nf_tables]
[ 1211.841895] ? nft_trace_notify+0x830/0x830 [nf_tables]
[ 1211.841964] ? get_stack_info+0x2b/0x80
[ 1211.841975] ? nft_do_chain_arp+0x80/0x80 [nf_tables]
[ 1211.842044] nft_do_chain+0x79c/0x850 [nf_tables]
Fixes: 9b52e3f267a6 ("flow_dissector: handle no-skb use case")
Suggested-by: Willem de Bruijn <[email protected]>
Signed-off-by: Pablo Neira Ayuso <[email protected]>
Reviewed-by: Willem de Bruijn <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
|
|
Ensure that rcu read lock is given up before returning.
Link: https://lore.kernel.org/r/20240716-elixier-fliesen-1ab342151a61@brauner
Fixes: ca567df74a28 ("nsfs: add pid translation ioctls")
Reported-by: [email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Fix the 'make W=1' issue:
WARNING: modpost: missing MODULE_DESCRIPTION() in fs/adfs/adfs.o
Signed-off-by: Jeff Johnson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Christian Brauner <[email protected]>
|
|
Upon adding CONFIG_ARCH_THUNDER & CONFIG_COMPILE_TEST dependency,
compilation errors arise on 32-bit ARM with writeq() & readq() calls
which are used for accessing 64-bit values.
Since DPI hardware only works with 64-bit register accesses, using
CONFIG_64BIT dependency to skip compilation on 32-bit systems.
Fixes: a5e43e2d202d ("misc: Kconfig: add a new dependency for MARVELL_CN10K_DPI")
Signed-off-by: Vamsi Attunuru <[email protected]>
Tested-by: Nathan Chancellor <[email protected]>
Tested-by: Jeff Johnson <[email protected]>
Acked-by: Arnd Bergmann <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
* Fix some typos, incomplete or confusing phrases.
* Split paragraphs where appropriate.
* List the same error code multiple times,
if it has multiple possible causes.
* Bring wording closer to the man page wording,
which has undergone more thorough review
(esp. for LANDLOCK_ACCESS_FS_WRITE_FILE).
* Small semantic clarifications
* Call the ephemeral port range "ephemeral"
* Clarify reasons for EFAULT in landlock_add_rule()
* Clarify @rule_type doc for landlock_add_rule()
This is a collection of small fixes which I collected when preparing the
corresponding man pages [1].
Cc: Alejandro Colomar <[email protected]>
Cc: Konstantin Meskhidze <[email protected]>
Link: https://lore.kernel.org/r/[email protected] [1]
Signed-off-by: Günther Noack <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
[mic: Add label to link, fix formatting spotted by make htmldocs,
synchronize userspace-api documentation's date]
Signed-off-by: Mickaël Salaün <[email protected]>
|
|
Fix the documentation of the below field of struct auxiliary_device
include/linux/auxiliary_bus.h:150: warning: Function parameter or struct member 'sysfs' not described in 'auxiliary_device'
include/linux/auxiliary_bus.h:150: warning: Excess struct member 'irqs' description in 'auxiliary_device'
include/linux/auxiliary_bus.h:150: warning: Excess struct member 'lock' description in 'auxiliary_device'
include/linux/auxiliary_bus.h:150: warning: Excess struct member 'irq_dir_exists' description in 'auxiliary_device'
Fixes: a808878308a8 ("driver core: auxiliary bus: show auxiliary device IRQs")
Reported-by: Stephen Rothwell <[email protected]>
Signed-off-by: Shay Drory <[email protected]>
Signed-off-by: Saeed Mahameed <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Fix error case management in airoha_dev_xmit routine since we need to
DMA unmap pending buffers starting from q->head.
Moreover fix a typo in error case branch in airoha_set_gdm_ports
routine.
Fixes: 23020f049327 ("net: airoha: Introduce ethernet support for EN7581 SoC")
Signed-off-by: Lorenzo Bianconi <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/b628871bc8ae4861b5e2ab4db90aaf373cbb7cee.1721203880.git.lorenzo@kernel.org
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
In gve_clean_xdp_done, the driver processes the TX completions based on
a 32-bit NIC counter and a 32-bit completion counter stored in the tx
queue.
Fix the for loop so that the counter wraparound is handled correctly.
Fixes: 75eaae158b1b ("gve: Add XDP DROP and TX support for GQI-QPL format")
Signed-off-by: Joshua Washington <[email protected]>
Signed-off-by: Praveen Kaligineedi <[email protected]>
Reviewed-by: Simon Horman <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Jakub Kicinski <[email protected]>
|
|
Merge the powerpc KVM topic branch.
|
|
Link: https://lkml.kernel.org/r/c72e5467-06a8-4739-ae6a-7c84c96cad77@p183
Signed-off-by: Alexey Dobriyan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
For systems on which the performance counter can expire early due to turbo
modes the watchdog handler has a safety net in place which validates that
since the last watchdog event there has at least 4/5th of the watchdog
period elapsed.
This works reliably only after the first watchdog event because the per
CPU variable which holds the timestamp of the last event is never
initialized.
So a first spurious event will validate against a timestamp of 0 which
results in a delta which is likely to be way over the 4/5 threshold of the
period. As this might happen before the first watchdog hrtimer event
increments the watchdog counter, this can lead to false positives.
Fix this by initializing the timestamp before enabling the hardware event.
Reset the rearm counter as well, as that might be non zero after the
watchdog was disabled and reenabled.
Link: https://lkml.kernel.org/r/87frsfu15a.ffs@tglx
Fixes: 7edaeb6841df ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Arjan van de Ven <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
mem_cgroup_calculate_protection() is not stateless and should only be used
as part of a top-down tree traversal. shrink_one() traverses the per-node
memcg LRU instead of the root_mem_cgroup tree, and therefore it should not
call mem_cgroup_calculate_protection().
The existing misuse in shrink_one() can cause ineffective protection of
sub-trees that are grandchildren of root_mem_cgroup. Fix it by reusing
lru_gen_age_node(), which already traverses the root_mem_cgroup tree, to
calculate the protection.
Previously lru_gen_age_node() opportunistically skips the first pass,
i.e., when scan_control->priority is DEF_PRIORITY. On the second pass,
lruvec_is_sizable() uses appropriate scan_control->priority, set by
set_initial_priority() from lru_gen_shrink_node(), to decide whether a
memcg is too small to reclaim from.
Now lru_gen_age_node() unconditionally traverses the root_mem_cgroup tree.
So it should call set_initial_priority() upfront, to make sure
lruvec_is_sizable() uses appropriate scan_control->priority on the first
pass. Otherwise, lruvec_is_reclaimable() can return false negatives and
result in premature OOM kills when min_ttl_ms is used.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e4dde56cd208 ("mm: multi-gen LRU: per-node lru_gen_folio lists")
Signed-off-by: Yu Zhao <[email protected]>
Reported-by: T.J. Mercier <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
We accidentally deleted a tab in commit f84152e9efc5 ("mm/zswap: use only
one pool in zswap"). Add it back.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Dan Carpenter <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
A kernel crash was observed when migrating hugetlb folio:
BUG: kernel NULL pointer dereference, address: 0000000000000008
PGD 0 P4D 0
Oops: Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 3435 Comm: bash Not tainted 6.10.0-rc6-00450-g8578ca01f21f #66
RIP: 0010:__folio_undo_large_rmappable+0x70/0xb0
RSP: 0018:ffffb165c98a7b38 EFLAGS: 00000097
RAX: fffffbbc44528090 RBX: 0000000000000000 RCX: 0000000000000000
RDX: ffffa30e000a2800 RSI: 0000000000000246 RDI: ffffa3153ffffcc0
RBP: fffffbbc44528000 R08: 0000000000002371 R09: ffffffffbe4e5868
R10: 0000000000000001 R11: 0000000000000001 R12: ffffa3153ffffcc0
R13: fffffbbc44468000 R14: 0000000000000001 R15: 0000000000000001
FS: 00007f5b3a716740(0000) GS:ffffa3151fc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 000000010959a000 CR4: 00000000000006f0
Call Trace:
<TASK>
__folio_migrate_mapping+0x59e/0x950
__migrate_folio.constprop.0+0x5f/0x120
move_to_new_folio+0xfd/0x250
migrate_pages+0x383/0xd70
soft_offline_page+0x2ab/0x7f0
soft_offline_page_store+0x52/0x90
kernfs_fop_write_iter+0x12c/0x1d0
vfs_write+0x380/0x540
ksys_write+0x64/0xe0
do_syscall_64+0xb9/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f5b3a514887
RSP: 002b:00007ffe138fce68 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007f5b3a514887
RDX: 000000000000000c RSI: 0000556ab809ee10 RDI: 0000000000000001
RBP: 0000556ab809ee10 R08: 00007f5b3a5d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007f5b3a61b780 R14: 00007f5b3a617600 R15: 00007f5b3a616a00
It's because hugetlb folio is passed to __folio_undo_large_rmappable()
unexpectedly. large_rmappable flag is imperceptibly set to hugetlb folio
since commit f6a8dd98a2ce ("hugetlb: convert alloc_buddy_hugetlb_folio to
use a folio"). Then commit be9581ea8c05 ("mm: fix crashes from deferred
split racing folio migration") makes folio_migrate_mapping() call
folio_undo_large_rmappable() triggering the bug. Fix this issue by
clearing large_rmappable flag for hugetlb folios. They don't need that
flag set anyway.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: f6a8dd98a2ce ("hugetlb: convert alloc_buddy_hugetlb_folio to use a folio")
Fixes: be9581ea8c05 ("mm: fix crashes from deferred split racing folio migration")
Signed-off-by: Miaohe Lin <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When tries to demote 1G hugetlb folios, a lockdep warning is observed:
============================================
WARNING: possible recursive locking detected
6.10.0-rc6-00452-ga4d0275fa660-dirty #79 Not tainted
--------------------------------------------
bash/710 is trying to acquire lock:
ffffffff8f0a7850 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0x244/0x460
but task is already holding lock:
ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&h->resize_lock);
lock(&h->resize_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by bash/710:
#0: ffff8f118439c3f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
#1: ffff8f11893b9e88 (&of->mutex#2){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
#2: ffff8f1183dc4428 (kn->active#98){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
#3: ffffffff8f0a6f48 (&h->resize_lock){+.+.}-{3:3}, at: demote_store+0xae/0x460
stack backtrace:
CPU: 3 PID: 710 Comm: bash Not tainted 6.10.0-rc6-00452-ga4d0275fa660-dirty #79
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
__lock_acquire+0x10f2/0x1ca0
lock_acquire+0xbe/0x2d0
__mutex_lock+0x6d/0x400
demote_store+0x244/0x460
kernfs_fop_write_iter+0x12c/0x1d0
vfs_write+0x380/0x540
ksys_write+0x64/0xe0
do_syscall_64+0xb9/0x1d0
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7fa61db14887
RSP: 002b:00007ffc56c48358 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007fa61db14887
RDX: 0000000000000002 RSI: 000055a030050220 RDI: 0000000000000001
RBP: 000055a030050220 R08: 00007fa61dbd1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000002
R13: 00007fa61dc1b780 R14: 00007fa61dc17600 R15: 00007fa61dc16a00
</TASK>
Lockdep considers this an AA deadlock because the different resize_lock
mutexes reside in the same lockdep class, but this is a false positive.
Place them in distinct classes to avoid these warnings.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 8531fc6f52f5 ("hugetlb: add hugetlb demote page support")
Signed-off-by: Miaohe Lin <[email protected]>
Acked-by: Muchun Song <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|