Age | Commit message (Collapse) | Author | Files | Lines |
|
The commit:
cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
has changed the semantics of what is to be considered an idle task in
such a way that CPU boot code preceding the actual idle loop is excluded
from it.
This has however introduced new potential RCU-tasks stalls when either:
1) Grace period is started before init/0 had a chance to set PF_IDLE,
keeping it stuck in the holdout list until idle ever schedules.
2) Grace period is started when some possible CPUs have never been
online, keeping their idle tasks stuck in the holdout list until the
CPU ever boots up.
3) Similar to 1) but with secondary CPUs: Grace period is started
concurrently with secondary CPU booting, putting its idle task in
the holdout list because PF_IDLE isn't yet observed on it. It stays
then stuck in the holdout list until that CPU ever schedules. The
effect is mitigated here by the hotplug AP thread that must run to
bring the CPU up.
Fix this with handling the new semantics of PF_IDLE, keeping in mind
that it may or may not be set on an idle task. Take advantage of that to
strengthen the coverage of an RCU-tasks quiescent state within an idle
task, excluding the CPU boot code from it. Only the code running within
the idle loop is now a quiescent state, along with offline CPUs.
Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Suggested-by: Joel Fernandes <[email protected]>
Suggested-by: Paul E . McKenney" <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Export the RCU point of view as to when a CPU is considered offline
(ie: when does RCU consider that a CPU is sufficiently down in the
hotplug process to not feature any possible read side).
This will be used by RCU-tasks whose vision of an offline CPU should
reasonably match the one of RCU core.
Fixes: cff9b2332ab7 ("kernel/sched: Modify initial boot task idle setup")
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen updates from Juergen Gross:
- two small cleanup patches
- a fix for PCI passthrough under Xen
- a four patch series speeding up virtio under Xen with user space
backends
* tag 'for-linus-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled
xen: privcmd: Add support for ioeventfd
xen: evtchn: Allow shared registration of IRQ handers
xen: irqfd: Use _IOW instead of the internal _IOC() macro
xen: Make struct privcmd_irqfd's layout architecture independent
xen/xenbus: Add __counted_by for struct read_buffer and use struct_size()
xenbus: fix error exit in xenbus_init()
|
|
Commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") added a kfree() call to free any user
provided affinity mask, if present. It was changed later to use
kfree_rcu() in commit 9a5418bc48ba ("sched/core: Use kfree_rcu()
in do_set_cpus_allowed()") to avoid a circular locking dependency
problem.
It turns out that even kfree_rcu() isn't safe for avoiding
circular locking problem. As reported by kernel test robot,
the following circular locking dependency now exists:
&rdp->nocb_lock --> rcu_node_0 --> &rq->__lock
Solve this by breaking the rcu_node_0 --> &rq->__lock chain by moving
the resched_cpu() out from under rcu_node lock.
[peterz: heavily borrowed from Waiman's Changelog]
[paulmck: applied Z qiang feedback]
Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
Reported-by: kernel test robot <[email protected]>
Acked-by: Waiman Long <[email protected]>
Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
Link: https://lore.kernel.org/oe-lkp/[email protected]
Signed-off-by: Paul E. McKenney <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 TDX updates from Dave Hansen:
"The majority of this is a rework of the assembly and C wrappers that
are used to talk to the TDX module and VMM. This is a nice cleanup in
general but is also clearing the way for using this code when Linux is
the TDX VMM.
There are also some tidbits to make TDX guests play nicer with Hyper-V
and to take advantage the hardware TSC.
Summary:
- Refactor and clean up TDX hypercall/module call infrastructure
- Handle retrying/resuming page conversion hypercalls
- Make sure to use the (shockingly) reliable TSC in TDX guests"
[ TLA reminder: TDX is "Trust Domain Extensions", Intel's guest VM
confidentiality technology ]
* tag 'x86_tdx_for_6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tdx: Mark TSC reliable
x86/tdx: Fix __noreturn build warning around __tdx_hypercall_failed()
x86/virt/tdx: Make TDX_MODULE_CALL handle SEAMCALL #UD and #GP
x86/virt/tdx: Wire up basic SEAMCALL functions
x86/tdx: Remove 'struct tdx_hypercall_args'
x86/tdx: Reimplement __tdx_hypercall() using TDX_MODULE_CALL asm
x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL
x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs
x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure
x86/tdx: Rename __tdx_module_call() to __tdcall()
x86/tdx: Make macros of TDCALLs consistent with the spec
x86/tdx: Skip saving output regs when SEAMCALL fails with VMFailInvalid
x86/tdx: Zero out the missing RSI in TDX_HYPERCALL macro
x86/tdx: Retry partially-completed page conversion hypercalls
|
|
Currently, noinc writes are cached as if they were standard incrementing
writes, overwriting unrelated register values in the cache. Instead, we
want to cache the last value written to the register, as is done in the
accelerated noinc handler (regmap_noinc_readwrite).
Fixes: cdf6b11daa77 ("regmap: Add regmap_noinc_write API")
Signed-off-by: Ben Wolsieffer <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Mark Brown <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc updates from Helge Deller:
"Usual fixes and updates:
- Add up to 12 nops after TLB inserts for PA8x00 CPUs as the
specification requires (Dave Anglin)
- Simplify the parisc smp_prepare_boot_cpu() code (Russell King)
- Use 64-bit little-endian values in SBA IOMMU PDIR table for AGP
Since there is upcoming support for booting a 64-bit kernel on QEMU,
some corner cases were fixed and improvements added:
- Fix 64-bit kernel crash in STI (graphics console) font setup code
which miscalculated the font start address as it gets signed vs
unsigned offsets wrong
- Support building an uncompressed Linux kernel
- Add support for soft power-off in qemu"
* tag 'parisc-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
fbdev: stifb: Make the STI next font pointer a 32-bit signed offset
parisc: Show default CPU PSW.W setting as reported by PDC
parisc/pdc: Add width field to struct pdc_model
parisc: Add nop instructions after TLB inserts
parisc: simplify smp_prepare_boot_cpu()
parisc/agp: Use 64-bit LE values in SBA IOMMU PDIR table
parisc/firmware: Use PDC constants for narrow/wide firmware
parisc: Move parisc_narrow_firmware variable to header file
parisc/power: Trivial whitespace cleanups and license update
parisc/power: Add power soft-off when running on qemu
parisc: Allow building uncompressed Linux kernel
parisc: Add some missing PDC functions and constants
parisc: sba-iommu: Fix comment when calculating IOC number
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
Pull m68k updates from Geert Uytterhoeven:
- misc aesthetical improvements for the floating point emulator
- remove the last user of strlcpy()
- use kernel's generic libgcc functions
- misc fixes for W=1 builds
- misc indentation fixes
- misc fixes and improvements
- defconfig updates
* tag 'm68k-for-v6.7-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: (72 commits)
m68k: lib: Include <linux/libgcc.h> for __muldi3()
m68k: fpsp040: Fix indentation by 5 spaces
m68k: Fix indentation by 2 or 5 spaces in <asm/page_mm.h>
m68k: kernel: Fix indentation by 7 spaces in traps.c
m68k: sun3: Fix indentation by 5 or 7 spaces
m68k: Fix indentation by 7 spaces in <asm/io_mm.h>
m68k: defconfig: Update virt_defconfig for v6.6-rc3
m68k: defconfig: Update defconfigs for v6.6-rc1
m68k: io: Mark mmio read addresses as const
m68k: Replace GPL 2.0+ README.legal boilerplate with SPDX
m68k: sun3: Change led_pattern[] to unsigned char
m68k: Add missing types to asm/irq.h
m68k: sun3/3x: Add and use "sun3.h"
m68k: sun3x: Make dvma_print() static
m68k: sun3x: Make sun3x_halt() static
m68k: sun3x: Do not mark dvma_map_iommu() inline
m68k: sun3x: Fix signature of sun3_leds()
m68k: sun3: Make sun3_platform_init() static
m68k: sun3: Make print_pte() static
m68k: sun3: Annotate prom_printf() with __printf()
...
|
|
Prepare for the coming implementation by GCC and Clang of the __counted_by
attribute. Flexible array members annotated with __counted_by can have
their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS
(for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family
functions).
As found with Coccinelle[1], add __counted_by for struct module_notes_attrs.
[1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci
Cc: Luis Chamberlain <[email protected]>
Cc: [email protected]
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
Delete duplicated word in comment.
Signed-off-by: Zhu Mao <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
The return value of is_valid_name() is true or false,
so change its type to reflect that.
Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
The return value of is_mapping_symbol() is true or false,
so change its type to reflect that.
Suggested-by: Xi Zhang <[email protected]>
Signed-off-by: Tiezhu Yang <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
Use a similar approach as commit a419beac4a07 ("module/decompress: use
vmalloc() for zstd decompression workspace") and replace kmalloc() with
vmalloc() also for the gzip module decompression workspace.
In this case the workspace is represented by struct inflate_workspace
that can be fairly large for kmalloc() and it can potentially lead to
allocation errors on certain systems:
$ pahole inflate_workspace
struct inflate_workspace {
struct inflate_state inflate_state; /* 0 9544 */
/* --- cacheline 149 boundary (9536 bytes) was 8 bytes ago --- */
unsigned char working_window[32768]; /* 9544 32768 */
/* size: 42312, cachelines: 662, members: 2 */
/* last cacheline: 8 bytes */
};
Considering that there is no need to use continuous physical memory,
simply switch to vmalloc() to provide a more reliable in-kernel module
decompression.
Fixes: b1ae6dc41eaa ("module: add in-kernel support for decompressing")
Signed-off-by: Andrea Righi <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
Use glob include/linux/module*.h to capture all module changes.
Suggested-by: Kees Cook <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
Commit 9bbb9e5a3310 ("param: use ops in struct kernel_param, rather than
get and set fns directly") added the comment that module_param_call()
was deprecated, during a large scale refactoring to bring sanity to type
casting back then. In 2017 following more cleanups, it became useful
again as it wraps a common pattern of creating an ops struct for a
given get/set pair:
b2f270e87473 ("module: Prepare to convert all module_param_call() prototypes")
ece1996a21ee ("module: Do not paper over type mismatches in module_param_call()")
static const struct kernel_param_ops __param_ops_##name = \
{ .flags = 0, .set = _set, .get = _get }; \
__module_param_call(MODULE_PARAM_PREFIX, \
name, &__param_ops_##name, arg, perm, -1, 0)
__module_param_call(MODULE_PARAM_PREFIX, name, ops, arg, perm, -1, 0)
Many users of module_param_cb() appear to be almost universally
open-coding the same thing that module_param_call() does now. Don't
discourage[1] people from using module_param_call(): clarify the comment
to show that module_param_cb() is useful if you repeatedly use the same
pair of get/set functions.
[1] https://lore.kernel.org/lkml/202308301546.5C789E5EC@keescook/
Cc: Luis Chamberlain <[email protected]>
Cc: Johan Hovold <[email protected]>
Cc: Jessica Yu <[email protected]>
Cc: Sagi Grimberg <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Cc: Miguel Ojeda <[email protected]>
Cc: Joe Perches <[email protected]>
Cc: [email protected]
Reviewed-by: Miguel Ojeda <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
vmap_area does not exist on no-MMU, therefore the GDB scripts fail to
load:
Traceback (most recent call last):
File "<...>/vmlinux-gdb.py", line 51, in <module>
import linux.vmalloc
File "<...>/scripts/gdb/linux/vmalloc.py", line 14, in <module>
vmap_area_ptr_type = vmap_area_type.get_type().pointer()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<...>/scripts/gdb/linux/utils.py", line 28, in get_type
self._type = gdb.lookup_type(self._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
gdb.error: No struct type named vmap_area.
To fix this, disable the command and add an informative error message if
CONFIG_MMU is not defined, following the example of lx-slabinfo.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 852622bf3616 ("scripts/gdb/vmalloc: add vmallocinfo support")
Signed-off-by: Ben Wolsieffer <[email protected]>
Cc: Jan Kiszka <[email protected]>
Cc: Kieran Bingham <[email protected]>
Cc: Kuan-Ying Lee <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
MOD_TEXT is only defined if CONFIG_MODULES=y which lead to loading failure
of the gdb scripts when kernel is built without CONFIG_MODULES=y:
Reading symbols from vmlinux...
Traceback (most recent call last):
File "/foo/vmlinux-gdb.py", line 25, in <module>
import linux.constants
File "/foo/scripts/gdb/linux/constants.py", line 14, in <module>
LX_MOD_TEXT = gdb.parse_and_eval("MOD_TEXT")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
gdb.error: No symbol "MOD_TEXT" in current context.
Add a conditional check on CONFIG_MODULES to fix this error.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b4aff7513df3 ("scripts/gdb: use mem instead of core_layout to get the module address")
Signed-off-by: Clément Léger <[email protected]>
Tested-by: Daniel Gomez <[email protected]>
Signed-off-by: Daniel Gomez <[email protected]>
Cc: Jan Kiszka <[email protected]>
Cc: Kieran Bingham <[email protected]>
Cc: Luis Chamberlain <[email protected]>
Cc: Pankaj Raghav <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
He's no longer working in Collabora (and his email address there bounces).
Map it to his personal address.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bagas Sanjaya <[email protected]>
Acked-by: Tomeu Vizoso <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Heiko Stuebner <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Konrad Dybcio <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Claudiu Beznea's Microchip email address is no longer valid.
Map it to a valid one.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Claudiu Beznea <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Heiko Stuebner <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Konrad Dybcio <[email protected]>
Cc: Oleksij Rempel <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
On Ubuntu and probably other distros, ptrace permissions are tightend a
bit by default; i.e., /proc/sys/kernel/yama/ptrace_score is set to 1.
This cases memfd_secret's ptrace attach test fails with a permission
error. Set it to 0 piror to running the program.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Itaru Kitayama <[email protected]>
Cc: Shuah Khan <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Map out to his gmail address as he had left SUSE some time ago.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Bagas Sanjaya <[email protected]>
Acked-by: Benjamin Poirier <[email protected]>
Cc: Bjorn Andersson <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Stuebner <[email protected]>
Cc: Jakub Kicinski <[email protected]>
Cc: Konrad Dybcio <[email protected]>
Cc: Oleksij Rempel <[email protected]>
Cc: Stephen Hemminger <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
csr_sscratch CSR holds current task_struct address when hart is in user
space. Trap handler on entry spills csr_sscratch into "tp" (x2) register
and zeroes out csr_sscratch CSR. Trap handler on exit reloads "tp" with
expected user mode value and place current task_struct address again in
csr_sscratch CSR.
This patch assumes "tp" is pointing to task_struct. If value in
csr_sscratch is numerically greater than "tp" then it assumes csr_sscratch
is correct address of current task_struct. This logic holds when
- hart is in user space, "tp" will be less than csr_sscratch.
- hart is in kernel space but not in trap handler, "tp" will be more
than csr_sscratch (csr_sscratch being equal to 0).
- hart is executing trap handler
- "tp" is still pointing to user mode but csr_sscratch contains
ptr to task_struct. Thus numerically higher.
- "tp" is pointing to task_struct but csr_sscratch now contains
either 0 or numerically smaller value (transiently holds
user mode tp)
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Deepak Gupta <[email protected]>
Reviewed-by: Andrew Jones <[email protected]>
Reviewed-by: Palmer Dabbelt <[email protected]>
Acked-by: Palmer Dabbelt <[email protected]>
Tested-by: Hsieh-Tseng Shen <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Glenn Washburn <[email protected]>
Cc: Jan Kiszka <[email protected]>
Cc: Jeff Xie <[email protected]>
Cc: Kieran Bingham <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Paul Walmsley <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Fix a spelling typo in comment.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Kunwu Chan <[email protected]>
Acked-by: Joseph Qi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Check ProtectionKey field in /proc/*/smaps output, if system supports
protection keys feature.
[[email protected]: test support in the beginning of the program, use syscall, not glibc pkey_alloc(3) which may not compile]
Link: https://lkml.kernel.org/r/ac05efa7-d2a0-48ad-b704-ffdd5450582e@p183
Signed-off-by: Swarup Laxman Kotiaklapudi <[email protected]>
Signed-off-by: Alexey Dobriyan <[email protected]>
Reviewed-by: Swarup Laxman Kotikalapudi<[email protected]>
Tested-by: Swarup Laxman Kotikalapudi<[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
* fix embarassing /proc/*/smaps test bug due to a typo in variable name
it tested only the first line of the output if vsyscall is enabled:
ffffffffff600000-ffffffffff601000 r-xp ...
so test passed but tested only VMA location and permissions.
* add "KSM" entry, unnoticed because (1)
* swap "r-xp" and "--xp" vsyscall test strings,
also unnoticed because (1)
Link: https://lkml.kernel.org/r/76f42cce-b1ab-45ec-b6b2-4c64f0dccb90@p183
Signed-off-by: Alexey Dobriyan <[email protected]>
Tested-by: Swarup Laxman Kotikalapudi<[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
./fs/proc/base.c:3829:2-3: Unneeded semicolon
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Li <[email protected]>
Reported-by: Abaci Robot <[email protected]>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7057
Acked-by: Oleg Nesterov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Rather than lock_task_sighand(), sig->stats_lock was specifically designed
for this type of use.
This way the "if (whole)" branch runs lockless in the likely case.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Rather than while_each_thread() which should be avoided when possible.
This makes the code more clear and allows the next change.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Oleg Nesterov <[email protected]>
Cc: "Eric W. Biederman" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The BUG_ON() at ocfs2_num_free_extents() handles the error that
l_tree_deepth of leaf extent block just read form disk is invalid. This
error is mostly caused by file system metadata corruption on the disk.
There is no need to call BUG_ON() to handle such errors. We can return
error code, since the caller can deal with errors from
ocfs2_num_free_extents(). Also, we should make the file system read-only
to avoid the damage from expanding.
Therefore, BUG_ON() is removed and ocfs2_error() is called instead.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Jia Rui <[email protected]>
Reviewed-by: Joseph Qi <[email protected]>
Cc: Mark Fasheh <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Junxiao Bi <[email protected]>
Cc: Changwei Ge <[email protected]>
Cc: Gang He <[email protected]>
Cc: Jun Piao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Use the folio APIs, saving about four calls to compound_head().
Convert back to a page in each of the individual protocol implementations.
Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
RPC client pipefs dentries cleanup is in separated rpc_remove_pipedir()
workqueue,which takes care about pipefs superblock locking.
In some special scenarios, when kernel frees the pipefs sb of the
current client and immediately alloctes a new pipefs sb,
rpc_remove_pipedir function would misjudge the existence of pipefs
sb which is not the one it used to hold. As a result,
the rpc_remove_pipedir would clean the released freed pipefs dentries.
To fix this issue, rpc_remove_pipedir should check whether the
current pipefs sb is consistent with the original pipefs sb.
This error can be catched by KASAN:
=========================================================
[ 250.497700] BUG: KASAN: slab-use-after-free in dget_parent+0x195/0x200
[ 250.498315] Read of size 4 at addr ffff88800a2ab804 by task kworker/0:18/106503
[ 250.500549] Workqueue: events rpc_free_client_work
[ 250.501001] Call Trace:
[ 250.502880] kasan_report+0xb6/0xf0
[ 250.503209] ? dget_parent+0x195/0x200
[ 250.503561] dget_parent+0x195/0x200
[ 250.503897] ? __pfx_rpc_clntdir_depopulate+0x10/0x10
[ 250.504384] rpc_rmdir_depopulate+0x1b/0x90
[ 250.504781] rpc_remove_client_dir+0xf5/0x150
[ 250.505195] rpc_free_client_work+0xe4/0x230
[ 250.505598] process_one_work+0x8ee/0x13b0
...
[ 22.039056] Allocated by task 244:
[ 22.039390] kasan_save_stack+0x22/0x50
[ 22.039758] kasan_set_track+0x25/0x30
[ 22.040109] __kasan_slab_alloc+0x59/0x70
[ 22.040487] kmem_cache_alloc_lru+0xf0/0x240
[ 22.040889] __d_alloc+0x31/0x8e0
[ 22.041207] d_alloc+0x44/0x1f0
[ 22.041514] __rpc_lookup_create_exclusive+0x11c/0x140
[ 22.041987] rpc_mkdir_populate.constprop.0+0x5f/0x110
[ 22.042459] rpc_create_client_dir+0x34/0x150
[ 22.042874] rpc_setup_pipedir_sb+0x102/0x1c0
[ 22.043284] rpc_client_register+0x136/0x4e0
[ 22.043689] rpc_new_client+0x911/0x1020
[ 22.044057] rpc_create_xprt+0xcb/0x370
[ 22.044417] rpc_create+0x36b/0x6c0
...
[ 22.049524] Freed by task 0:
[ 22.049803] kasan_save_stack+0x22/0x50
[ 22.050165] kasan_set_track+0x25/0x30
[ 22.050520] kasan_save_free_info+0x2b/0x50
[ 22.050921] __kasan_slab_free+0x10e/0x1a0
[ 22.051306] kmem_cache_free+0xa5/0x390
[ 22.051667] rcu_core+0x62c/0x1930
[ 22.051995] __do_softirq+0x165/0x52a
[ 22.052347]
[ 22.052503] Last potentially related work creation:
[ 22.052952] kasan_save_stack+0x22/0x50
[ 22.053313] __kasan_record_aux_stack+0x8e/0xa0
[ 22.053739] __call_rcu_common.constprop.0+0x6b/0x8b0
[ 22.054209] dentry_free+0xb2/0x140
[ 22.054540] __dentry_kill+0x3be/0x540
[ 22.054900] shrink_dentry_list+0x199/0x510
[ 22.055293] shrink_dcache_parent+0x190/0x240
[ 22.055703] do_one_tree+0x11/0x40
[ 22.056028] shrink_dcache_for_umount+0x61/0x140
[ 22.056461] generic_shutdown_super+0x70/0x590
[ 22.056879] kill_anon_super+0x3a/0x60
[ 22.057234] rpc_kill_sb+0x121/0x200
Fixes: 0157d021d23a ("SUNRPC: handle RPC client pipefs dentries by network namespace aware routines")
Signed-off-by: felix <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
If the client is doing pnfs IO and Kerberos is configured and EXCHANGEID
successfully negotiated SP4_MACH_CRED and WRITE/COMMIT are on the
list of state protected operations, then we need to make sure to
choose the DS's rpc_client structure instead of the MDS's one.
Fixes: fb91fb0ee7b2 ("NFS: Move call to nfs4_state_protect_write() to nfs4_write_setup()")
Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
This IS_ERR() check was deleted during in a cleanup because, at the time,
the rpcb_call_async() function could not return an error pointer. That
changed in commit 25cf32ad5dba ("SUNRPC: Handle allocation failure in
rpc_new_task()") and now it can return an error pointer. Put the check
back.
A related revert was done in commit 13bd90141804 ("Revert "SUNRPC:
Remove unreachable error condition"").
Fixes: 037e910b52b0 ("SUNRPC: Remove unreachable error condition in rpcb_getport_async()")
Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Currently when client sends an EXCHANGE_ID for a possible trunked
connection, for any error that happened, the trunk will be thrown
out. However, an NFS4ERR_DELAY is a transient error that should be
retried instead.
Fixes: e818bd085baf ("NFSv4.1 remove xprt from xprt_switch if session trunking test fails")
Signed-off-by: Olga Kornievskaia <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
The flexfiles layout driver depends on NFSv3 module as data servers
might be configure to provide nfsv3 only.
Disabling the nfsv3 protocol completely disables the flexfiles layout driver,
however, the data server still might support v4.1 protocol. Thus the strond
couling betwwen flexfiles and nfsv3 modules should be relaxed, as layout driver
will return UNSUPPORTED if not matching protocol is found.
Signed-off-by: Tigran Mkrtchyan <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
When user input is committed online, DAMON sysfs interface is ignoring the
user input for the monitoring target regions. Such request is valid and
useful for fixed monitoring target regions-based monitoring ops like
'paddr' or 'fvaddr'.
Update the region boundaries as user specified, too. Note that the
monitoring results of the regions that overlap between the latest
monitoring target regions and the new target regions are preserved.
Treat empty monitoring target regions user request as a request to just
make no change to the monitoring target regions. Otherwise, users should
set the monitoring target regions same to current one for every online
input commit, and it could be challenging for dynamic monitoring target
regions update DAMON ops like 'vaddr'. If the user really need to remove
all monitoring target regions, they can simply remove the target and then
create the target again with empty target regions.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update")
Signed-off-by: SeongJae Park <[email protected]>
Cc: <[email protected]> [5.19+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
damon_sysfs_set_targets(), which updates the targets of the context for
online commitment, do not remove targets that removed from the
corresponding sysfs files. As a result, more than intended targets of the
context can exist and hence consume memory and monitoring CPU resource
more than expected.
Fix it by removing all targets of the context and fill up again using the
user input. This could cause unnecessary memory dealloc and realloc
operations, but this is not a hot code path. Also, note that damon_target
is stateless, and hence no data is lost.
[[email protected]: fix unnecessary monitoring results removal]
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update")
Signed-off-by: SeongJae Park <[email protected]>
Cc: Brendan Higgins <[email protected]>
Cc: <[email protected]> [5.19.x]
Signed-off-by: Andrew Morton <[email protected]>
|
|
We recently encountered a bug that makes all zswap store attempt fail.
Specifically, after:
"141fdeececb3 mm/zswap: delay the initialization of zswap"
if we build a kernel with zswap disabled by default, then enabled after
the swapfile is set up, the zswap tree will not be initialized. As a
result, all zswap store calls will be short-circuited. We have to perform
another swapon to get zswap working properly again.
Fortunately, this issue has since been fixed by the patch that kills
frontswap:
"42c06a0e8ebe mm: kill frontswap"
which performs zswap_swapon() unconditionally, i.e always initializing
the zswap tree.
This test add a sanity check that ensure zswap storing works as
intended.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nhat Pham <[email protected]>
Acked-by: Rik van Riel <[email protected]>
Cc: Domenico Cerasuolo <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Shuah Khan <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Zefan Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
The "first" is spelled "fist".
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Tom Yang <[email protected]>
Reviewed-by: Liam R. Howlett <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
LKP reported smatch warning as below:
===================
smatch warnings:
mm/vmalloc.c:3689 vread_iter() error: we previously assumed 'vm' could be null (see line 3667)
......
06c8994626d1b7 @3667 size = vm ? get_vm_area_size(vm) : va_size(va);
......
06c8994626d1b7 @3689 else if (!(vm->flags & VM_IOREMAP))
^^^^^^^^^
Unchecked dereference
=====================
This is not a runtime bug because the possible null 'vm' in the
pointed place could only happen when flags == VMAP_BLOCK. However, the
case 'flags == VMAP_BLOCK' should never happen and has been detected
with WARN_ON. Please check vm_map_ram() implementation and the earlier
checking in vread_iter() at below:
~~~~~~~~~~~~~~~~~~~~~~~~~~
/*
* VMAP_BLOCK indicates a sub-type of vm_map_ram area, need
* be set together with VMAP_RAM.
*/
WARN_ON(flags == VMAP_BLOCK);
if (!vm && !flags)
continue;
~~~~~~~~~~~~~~~~~~~~~~~~~~
So add checking on whether 'vm' could be null when dereferencing it in
vread_iter(). This mutes smatch complaint.
Link: https://lkml.kernel.org/r/ZTCURc8ZQE+KrTvS@MiWiFi-R3L-srv
Link: https://lkml.kernel.org/r/ZS/2k6DIMd0tZRgK@MiWiFi-R3L-srv
Signed-off-by: Baoquan He <[email protected]>
Reported-by: kernel test robot <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
Closes: https://lore.kernel.org/r/[email protected]/
Cc: Lorenzo Stoakes <[email protected]>
Cc: Philip Li <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
During a zswap store attempt, the compression algorithm could fail (for
e.g due to the page containing incompressible random data). This is not
tracked in any of existing zswap counters, making it hard to monitor for
and investigate. We have run into this problem several times in our
internal investigations on zswap store failures.
This patch adds a dedicated debugfs counter for compression algorithm
failures.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Nhat Pham <[email protected]>
Reviewed-by: Sergey Senozhatsky <[email protected]>
Cc: Dan Streetman <[email protected]>
Cc: Domenico Cerasuolo <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Roman Gushchin <[email protected]>
Cc: Seth Jennings <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Vitaly Wool <[email protected]>
Cc: Yosry Ahmed <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Drop "the" from the title of the documentation article for UBSAN, as it is
redundant.
Also add SPDX-License-Identifier for ubsan.rst.
Link: https://lkml.kernel.org/r/5fb11a4743eea9d9232a5284dea0716589088fec.1698161845.git.andreyknvl@google.com
Signed-off-by: Andrey Konovalov <[email protected]>
Reviewed-by: Marco Elver <[email protected]>
Cc: Alexander Potapenko <[email protected]>
Cc: Andrey Ryabinin <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"No major architecture features this time around, just some new HWCAP
definitions, support for the Ampere SoC PMUs and a few fixes/cleanups.
The bulk of the changes is reworking of the CPU capability checking
code (cpus_have_cap() etc).
- Major refactoring of the CPU capability detection logic resulting
in the removal of the cpus_have_const_cap() function and migrating
the code to "alternative" branches where possible
- Backtrace/kgdb: use IPIs and pseudo-NMI
- Perf and PMU:
- Add support for Ampere SoC PMUs
- Multi-DTC improvements for larger CMN configurations with
multiple Debug & Trace Controllers
- Rework the Arm CoreSight PMU driver to allow separate
registration of vendor backend modules
- Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
driver; use device_get_match_data() in the xgene driver; fix
NULL pointer dereference in the hisi driver caused by calling
cpuhp_state_remove_instance(); use-after-free in the hisi driver
- HWCAP updates:
- FEAT_SVE_B16B16 (BFloat16)
- FEAT_LRCPC3 (release consistency model)
- FEAT_LSE128 (128-bit atomic instructions)
- SVE: remove a couple of pseudo registers from the cpufeature code.
There is logic in place already to detect mismatched SVE features
- Miscellaneous:
- Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
bouncing is needed. The buffer is still required for small
kmalloc() buffers
- Fix module PLT counting with !RANDOMIZE_BASE
- Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
synchronisation code out of the set_ptes() loop
- More compact cpufeature displaying enabled cores
- Kselftest updates for the new CPU features"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
perf: hisi: Fix use-after-free when register pmu fails
drivers/perf: hisi_pcie: Initialize event->cpu only on success
drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
arm64: cpufeature: Change DBM to display enabled cores
arm64: cpufeature: Display the set of cores with a feature
perf/arm-cmn: Enable per-DTC counter allocation
perf/arm-cmn: Rework DTC counters (again)
perf/arm-cmn: Fix DTC domain detection
drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
arm64: Remove system_uses_lse_atomics()
arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
drivers/perf: xgene: Use device_get_match_data()
perf/amlogic: add missing MODULE_DEVICE_TABLE
arm64/mm: Hoist synchronization out of set_ptes() loop
...
|
|
When the client is required to use TEST_STATEID to discover which
delegation(s) have been revoked, it may continually test delegations at the
head of the list if the server continues to be unsatisfied and send
SEQ4_STATUS_RECALLABLE_STATE_REVOKED. For a large number of delegations
this behavior is prone to live-lock because the client may never be able to
test and free revoked state at the end of the list since the
SEQ4_STATUS_RECALLABLE_STATE_REVOKED will cause us to flag delegations at
the head of the list to be tested. This problem is further exacerbated by
the state manager's willingness to be scheduled out on a busy system while
testing the list of delegations.
Keep a generation counter for each attempt to test all delegations, and
skip delegations that have already been tested in the current pass.
Signed-off-by: Benjamin Coddington <[email protected]>
Tested-by: Torkil Svensgaard <[email protected]>
Tested-by: Ruben Vestergaard <[email protected]>
Signed-off-by: Trond Myklebust <[email protected]>
|
|
Setting softlockup_panic from do_sysctl_args() causes it to take effect
later in boot. The lockup detector is enabled before SMP is brought
online, but do_sysctl_args runs afterwards. If a user wants to set
softlockup_panic on boot and have it trigger should a softlockup occur
during onlining of the non-boot processors, they could do this prior to
commit f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot
parameters to sysctl aliases"). However, after this commit the value
of softlockup_panic is set too late to be of help for this type of
problem. Restore the prior behavior.
Signed-off-by: Krister Johansen <[email protected]>
Cc: [email protected]
Fixes: f117955a2255 ("kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases")
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
The code that checks for unknown boot options is unaware of the sysctl
alias facility, which maps bootparams to sysctl values. If a user sets
an old value that has a valid alias, a message about an invalid
parameter will be printed during boot, and the parameter will get passed
to init. Fix by checking for the existence of aliased parameters in the
unknown boot parameter code. If an alias exists, don't return an error
or pass the value to init.
Signed-off-by: Krister Johansen <[email protected]>
Cc: [email protected]
Fixes: 0a477e1ae21b ("kernel/sysctl: support handling command line aliases")
Signed-off-by: Luis Chamberlain <[email protected]>
|
|
Merge series from [email protected]:
The maximum value that calib can set should be
consistent with the maximum value of re.
An error code should be return when the re is greater
than the maximum value or less than the minimum value
The value of vsense_select should be either 32
or 0 in both cases, so modify the
AW88399_DEV_VDSEL_VSENSE macro to 32.
|
|
Pull drm updates from Dave Airlie:
"Highlights:
- AMD adds some more upcoming HW platforms
- Intel made Meteorlake stable and started adding Lunarlake
- nouveau has a bunch of display rework in prepartion for the NVIDIA
GSP firmware support
- msm adds a7xx support
- habanalabs has finished migration to accel subsystem
Detail summary:
kernel:
- add initial vmemdup-user-array
core:
- fix platform remove() to return void
- drm_file owner updated to reflect owner
- move size calcs to drm buddy allocator
- let GPUVM build as a module
- allow variable number of run-queues in scheduler
edid:
- handle bad h/v sync_end in EDIDs
panfrost:
- add Boris as maintainer
fbdev:
- use fb_ops helpers more
- only allow logo use from fbcon
- rename fb_pgproto to pgprot_framebuffer
- add HPD state to drm_connector_oob_hotplug_event
- convert to fbdev i/o mem helpers
i915:
- Enable meteorlake by default
- Early Xe2 LPD/Lunarlake display enablement
- Rework subplatforms into IP version checks
- GuC based TLB invalidation for Meteorlake
- Display rework for future Xe driver integration
- LNL FBC features
- LNL display feature capability reads
- update recommended fw versions for DG2+
- drop fastboot module parameter
- added deviceid for Arrowlake-S
- drop preproduction workarounds
- don't disable preemption for resets
- cleanup inlines in headers
- PXP firmware loading fix
- Fix sg list lengths
- DSC PPS state readout/verification
- Add more RPL P/U PCI IDs
- Add new DG2-G12 stepping
- DP enhanced framing support to state checker
- Improve shared link bandwidth management
- stop using GEM macros in display code
- refactor related code into display code
- locally enable W=1 warnings
- remove PSR watchdog timers on LNL
amdgpu:
- RAS/FRU EEPROM updatse
- IP discovery updatses
- GC 11.5 support
- DCN 3.5 support
- VPE 6.1 support
- NBIO 7.11 support
- DML2 support
- lots of IP updates
- use flexible arrays for bo list handling
- W=1 fixes
- Enable seamless boot in more cases
- Enable context type property for HDMI
- Rework GPUVM TLB flushing
- VCN IB start/size alignment fixes
amdkfd:
- GC 10/11 fixes
- GC 11.5 support
- use partial migration in GPU faults
radeon:
- W=1 Fixes
- fix some possible buffer overflow/NULL derefs
nouveau:
- update uapi for NO_PREFETCH
- scheduler/fence fixes
- rework suspend/resume for GSP-RM
- rework display in preparation for GSP-RM
habanalabs:
- uapi: expose tsc clock
- uapi: block access to eventfd through control device
- uapi: force dma-buf export to PAGE_SIZE alignments
- complete move to accel subsystem
- move firmware interface include files
- perform hard reset on PCIe AXI drain event
- optimise user interrupt handling
msm:
- DP: use existing helpers for DPCD
- DPU: interrupts reworked
- gpu: a7xx (a730/a740) support
- decouple msm_drv from kms for headless devices
mediatek:
- MT8188 dsi/dp/edp support
- DDP GAMMA - 12 bit LUT support
- connector dynamic selection capability
rockchip:
- rv1126 mipi-dsi/vop support
- add planar formats
ast:
- rename constants
panels:
- Mitsubishi AA084XE01
- JDI LPM102A188A
- LTK050H3148W-CTA6
ivpu:
- power management fixes
qaic:
- add detach slice bo api
komeda:
- add NV12 writeback
tegra:
- support NVSYNC/NHSYNC
- host1x suspend fixes
ili9882t:
- separate into own driver"
* tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm: (1803 commits)
drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
drm/amdgpu: Remove duplicate fdinfo fields
drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
drm/amdgpu: Identify data parity error corrected in replay mode
drm/amdgpu: Fix typo in IP discovery parsing
drm/amd/display: fix S/G display enablement
drm/amdxcp: fix amdxcp unloads incompletely
drm/amd/amdgpu: fix the GPU power print error in pm info
drm/amdgpu: Use pcie domain of xcc acpi objects
drm/amd: check num of link levels when update pcie param
drm/amdgpu: Add a read to GFX v9.4.3 ring test
drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
drm/amdgpu: get RAS poison status from DF v4_6_2
drm/amdgpu: Use discovery table's subrevision
drm/amd/display: 3.2.256
drm/amd/display: add interface to query SubVP status
drm/amd/display: Read before writing Backlight Mode Set Register
drm/amd/display: Disable SYMCLK32_SE RCO on DCN314
...
|
|
Per a discussion last week, let's improve coordination between fs/iomap/
and the rest of the VFS by shifting Christian into the role of git tree
maintainer. I'll stay on as reviewer and main developer, which will
free up some more time to clean up the code base a bit and help
filesystem maintainers port off of bufferheads and onto iomap.
Link: https://lore.kernel.org/linux-fsdevel/20231026-gehofft-vorfreude-a5079bff7373@brauner/
Signed-off-by: Darrick J. Wong <[email protected]>
Link: https://lore.kernel.org/r/20231031234820.GB1205221@frogsfrogsfrogs
Signed-off-by: Christian Brauner <[email protected]>
|
|
Now that trap support is ready to handle misalignment errors in S-mode,
allow the user to control the behavior of misaligned accesses using
prctl(PR_SET_UNALIGN). Add an align_ctl flag in thread_struct which
will be used to determine if we should SIGBUS the process or not on
such fault.
Signed-off-by: Clément Léger <[email protected]>
Reviewed-by: Björn Töpel <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Palmer Dabbelt <[email protected]>
|