Age | Commit message (Collapse) | Author | Files | Lines |
|
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
Pull dmi update from Jean Delvare.
Tiny cleanup.
* 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware: dmi: Use the proper accessor for the version field
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- The psi data structure was changed to be allocated dynamically but
it wasn't being cleared leading to it reporting garbage values and
triggering spurious oom kills.
- A deadlock involving cpuset and cpu hotplug.
- When a controller is moved across cgroup hierarchies,
css->rstat_css_node didn't get RCU drained properly from the previous
list.
* tag 'cgroup-for-6.0-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: Fix race condition at rebind_subsystems()
cgroup: Fix threadgroup_rwsem <-> cpus_read_lock() deadlock
sched/psi: Remove redundant cgroup_psi() when !CONFIG_CGROUPS
sched/psi: Remove unused parameter nbytes of psi_trigger_create()
sched/psi: Zero the memory of struct psi_group
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit
Pull audit fix from Paul Moore:
"A single fix for a potential double-free on a fsnotify error path"
* tag 'audit-pr-20220823' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
audit: fix potential double free on error path from fsnotify_add_inode_mark
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull file_remove_privs() fix from Christian Brauner:
"As part of Stefan's and Jens' work to add async buffered write
support to xfs we refactored file_remove_privs() and added
__file_remove_privs() to avoid calling __remove_privs() when
IOCB_NOWAIT is passed.
While debugging a recent performance regression report I found that
during review we missed that commit faf99b563558 ("fs: add
__remove_file_privs() with flags parameter") accidently changed
behavior when dentry_needs_remove_privs() returns zero.
Before the commit it would still call inode_has_no_xattr() setting
the S_NOSEC bit and thereby avoiding even calling into
dentry_needs_remove_privs() the next time this function is called.
After that commit inode_has_no_xattr() would only be called if
__remove_privs() had to be called.
Restore the old behavior. This is likely the cause of the performance
regression"
* tag 'fs.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
fs: __file_remove_privs(): restore call to inode_has_no_xattr()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux
Pull parisc fixes from Helge Deller:
"Some interesting background to the current patchset:
It turned out that the fldw instruction (which loads a 32-bit word
from memory into one half of a FP register) failed on unaligned
addresses and even trashed some other random FP register instead. It's
a trivial one-liner fix in the exception handler but this failure
dates back to the very beginnings of the parisc-port. It's strange
that it was never noticed before.
Another patch fixes an annoyance noticed by Randy Dunlap. Running
"make ARCH=parisc64 randconfig" always returned a 32-bit config,
although one would expect a 64-bit config. Masahiro Yamada suggested
to mimik sparc Kconfig code, which fixed the issue nicely. This
allowed to drop some compiler build checks too.
Third, it's possible to build an optimized 32-bit kernel for PA8X00
(64-bit) CPUs, which then wouldn't start on 32-bit-only (PA1.x)
machines. I've added a bootup check which prevents that and which
prints a message to the console. This can be tested with qemu, which
currently only supports 32-bit emulation.
The other patches are usual clean-up stuff like added return value
checks and typo fixes in comments.
Summary:
- Fix emulation of fldw instruction on unaligned addresses
- Fix "make ARCH=parisc64 randconfig" to return a 64-bit config
- Prevent boot if trying to boot a 32-bit kernel compiled for PA8X00
CPUs on 32-bit only machines
- ccio-dma: Handle kmalloc failure in ccio_init_resources()"
* tag 'parisc-for-6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines
parisc: ccio-dma: Handle kmalloc failure in ccio_init_resources()
parisc: led: Move from strlcpy with unused retval to strscpy
parisc: ccio-dma: Fix typo in comment
Revert "parisc: Show error if wrong 32/64-bit compiler is being used"
parisc: Make CONFIG_64BIT available for ARCH=parisc64 only
parisc: Fix exception handler for fldw and fstw instructions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Thirteen fixes, almost all for MM.
Seven of these are cc:stable and the remainder fix up the changes
which went into this -rc cycle"
* tag 'mm-hotfixes-stable-2022-08-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
kprobes: don't call disarm_kprobe() for disabled kprobes
mm/shmem: shmem_replace_page() remember NR_SHMEM
mm/shmem: tmpfs fallocate use file_modified()
mm/shmem: fix chattr fsflags support in tmpfs
mm/hugetlb: support write-faults in shared mappings
mm/hugetlb: fix hugetlb not supporting softdirty tracking
mm/uffd: reset write protection when unregister with wp-mode
mm/smaps: don't access young/dirty bit if pte unpresent
mm: add DEVICE_ZONE to FOR_ALL_ZONES
kernel/sys_ni: add compat entry for fadvise64_64
mm/gup: fix FOLL_FORCE COW security issue and remove FOLL_COW
Revert "zram: remove double compression logic"
get_maintainer: add Alan to .get_maintainer.ignore
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull KUnit fixes from Shuah Khan:
"Fix for a mmc test and to load .kunit_test_suites section when
CONFIG_KUNIT=m, and not just when KUnit is built-in"
* tag 'linux-kselftest-kunit-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
module: kunit: Load .kunit_test_suites section when CONFIG_KUNIT=m
mmc: sdhci-of-aspeed: test: Fix dependencies when KUNIT=m
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fixes from Shuah Khan:
"Fixes to vm and sgx test builds"
* tag 'linux-kselftest-fixes-6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/vm: fix inability to build any vm tests
selftests/sgx: Ignore OpenSSL 3.0 deprecated functions warning
|
|
Root cause:
The rebind_subsystems() is no lock held when move css object from A
list to B list,then let B's head be treated as css node at
list_for_each_entry_rcu().
Solution:
Add grace period before invalidating the removed rstat_css_node.
Reported-by: Jing-Ting Wu <[email protected]>
Suggested-by: Michal Koutný <[email protected]>
Signed-off-by: Jing-Ting Wu <[email protected]>
Tested-by: Jing-Ting Wu <[email protected]>
Link: https://lore.kernel.org/linux-arm-kernel/[email protected]/T/
Acked-by: Mukesh Ojha <[email protected]>
Fixes: a7df69b81aac ("cgroup: rstat: support cgroup1")
Cc: [email protected] # v5.13+
Signed-off-by: Tejun Heo <[email protected]>
|
|
Audit_alloc_mark() assign pathname to audit_mark->path, on error path
from fsnotify_add_inode_mark(), fsnotify_put_mark will free memory
of audit_mark->path, but the caller of audit_alloc_mark will free
the pathname again, so there will be double free problem.
Fix this by resetting audit_mark->path to NULL pointer on error path
from fsnotify_add_inode_mark().
Cc: [email protected]
Fixes: 7b1293234084d ("fsnotify: Add group pointer in fsnotify_init_mark()")
Signed-off-by: Gaosheng Cui <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Signed-off-by: Paul Moore <[email protected]>
|
|
Pull NFS client fixes from Trond Myklebust:
"Stable fixes:
- NFS: Fix another fsync() issue after a server reboot
Bugfixes:
- NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT
- NFS: Fix missing unlock in nfs_unlink()
- Add sanity checking of the file type used by __nfs42_ssc_open
- Fix a case where we're failing to set task->tk_rpc_status
Cleanups:
- Remove the NFS_CONTEXT_RESEND_WRITES flag that got obsoleted by the
fsync() fix"
* tag 'nfs-for-5.20-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
SUNRPC: RPC level errors should set task->tk_rpc_status
NFSv4.2 fix problems with __nfs42_ssc_open
NFS: unlink/rmdir shouldn't call d_delete() twice on ENOENT
NFS: Cleanup to remove unused flag NFS_CONTEXT_RESEND_WRITES
NFS: Remove a bogus flag setting in pnfs_write_done_resend_to_mds
NFS: Fix another fsync() issue after a server reboot
NFS: Fix missing unlock in nfs_unlink()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping
Pull idmapping fixes from Christian Brauner:
- Since Seth joined as co-maintainer for idmapped mounts we decided to
use a shared git tree. Konstantin suggested we use vfs/idmapping.git
on kernel.org under the vfs/ namespace. So this updates the tree in
the maintainers file.
- Ensure that POSIX ACLs checking, getting, and setting works correctly
for filesystems mountable with a filesystem idmapping that want to
support idmapped mounts.
Since no filesystems mountable with an fs_idmapping do yet support
idmapped mounts there is no problem. But this could change in the
future, so add a check to refuse to create idmapped mounts when the
mounter is not privileged over the mount's idmapping.
- Check that caller is privileged over the idmapping that will be
attached to a mount.
Currently no FS_USERNS_MOUNT filesystems support idmapped mounts,
thus this is not a problem as only CAP_SYS_ADMIN in init_user_ns is
allowed to set up idmapped mounts. But this could change in the
future, so add a check to refuse to create idmapped mounts when the
mounter is not privileged over the mount's idmapping.
- Fix POSIX ACLs for ntfs3. While looking at our current POSIX ACL
handling in the context of some overlayfs work I went through a range
of other filesystems checking how they handle them currently and
encountered a few bugs in ntfs3.
I've sent this some time ago and the fixes haven't been picked up
even though the pull request for other ntfs3 fixes got sent after.
This should really be fixed as right now POSIX ACLs are broken in
certain circumstances for ntfs3.
* tag 'fs.idmapped.fixes.v6.0-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping:
ntfs: fix acl handling
fs: require CAP_SYS_ADMIN in target namespace for idmapped mounts
MAINTAINERS: update idmapping tree
acl: handle idmapped mounts for idmapped filesystems
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux
Pull file locking fix from Jeff Layton:
"Just a single patch for a bugfix in the flock() codepath, introduced
by a patch that went in recently"
* tag 'filelock-v6.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: Fix dropped call to ->fl_release_private()
|
|
Commit a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO") eradicates
CC_HAS_ASM_GOTO, and in the process also causes the perf tool on x86 to
use asm_volatile_goto when compiling __GEN_RMWcc.
However, asm_volatile_goto is not declared in the perf tool headers,
which causes a compilation error:
In file included from tools/arch/x86/include/asm/atomic.h:7,
from tools/include/asm/atomic.h:6,
from tools/include/linux/atomic.h:5,
from tools/include/linux/refcount.h:41,
from tools/lib/perf/include/internal/cpumap.h:5,
from tools/perf/util/cpumap.h:7,
from tools/perf/util/env.h:7,
from tools/perf/util/header.h:12,
from pmu-events/pmu-events.c:9:
tools/arch/x86/include/asm/atomic.h: In function ‘atomic_dec_and_test’:
tools/arch/x86/include/asm/rmwcc.h:7:2: error: implicit declaration of function ‘asm_volatile_goto’ [-Werror=implicit-function-declaration]
asm_volatile_goto (fullop "; j" cc " %l[cc_label]" \
^~~~~~~~~~~~~~~~~
Define asm_volatile_goto in compiler_types.h if not declared, like the
main kernel header files do.
Fixes: a0a12c3ed057 ("asm goto: eradicate CC_HAS_ASM_GOTO")
Signed-off-by: Yang Jihong <[email protected]>
Tested-by: Arnaldo Carvalho de Melo <[email protected]>
Tested-by: Ingo Molnar <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
While looking at our current POSIX ACL handling in the context of some
overlayfs work I went through a range of other filesystems checking how they
handle them currently and encountered ntfs3.
The posic_acl_{from,to}_xattr() helpers always need to operate on the
filesystem idmapping. Since ntfs3 can only be mounted in the initial user
namespace the relevant idmapping is init_user_ns.
The posix_acl_{from,to}_xattr() helpers are concerned with translating between
the kernel internal struct posix_acl{_entry} and the uapi struct
posix_acl_xattr_{header,entry} and the kernel internal data structure is cached
filesystem wide.
Additional idmappings such as the caller's idmapping or the mount's idmapping
are handled higher up in the VFS. Individual filesystems usually do not need to
concern themselves with these.
The posix_acl_valid() helper is concerned with checking whether the values in
the kernel internal struct posix_acl can be represented in the filesystem's
idmapping. IOW, if they can be written to disk. So this helper too needs to
take the filesystem's idmapping.
Fixes: be71b5cba2e6 ("fs/ntfs3: Add attrib operations")
Cc: Konstantin Komarov <[email protected]>
Cc: [email protected]
Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
|
|
If a 32-bit kernel was compiled for PA2.0 CPUs, it won't be able to run
on machines with PA1.x CPUs. Add a check and bail out early if a PA1.x
machine is detected.
Signed-off-by: Helge Deller <[email protected]>
|
|
As the possible failure of the kmalloc(), it should be better
to fix this error path, check and return '-ENOMEM' error code.
Signed-off-by: Li Qiong <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
Follow the advice of the below link and prefer 'strscpy' in this
subsystem. Conversion is 1:1 because the return value is not used.
Generated by a coccinelle script.
Link: https://lore.kernel.org/r/CAHk-=wgfRnXz0W3D37d01q3JFkr_i_uTL=V6A6G1oUZcprmknw@mail.gmail.com/
Signed-off-by: Wolfram Sang <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
The double `was' is duplicated in the comment, remove one.
Signed-off-by: Jason Wang <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
|
|
This reverts commit b160628e9ebcdc85d0db9d7f423c26b3c7c179d0.
There is no need any longer to have this sanity check, because the
previous commit ("parisc: Make CONFIG_64BIT available for ARCH=parisc64
only") prevents that CONFIG_64BIT is set if ARCH==parisc.
Signed-off-by: Helge Deller <[email protected]>
|
|
With this patch the ARCH= parameter decides if the
CONFIG_64BIT option will be set or not. This means, the
ARCH= parameter will give:
ARCH=parisc -> 32-bit kernel
ARCH=parisc64 -> 64-bit kernel
This simplifies the usage of the other config options like
randconfig, allmodconfig and allyesconfig a lot and produces
the output which is expected for parisc64 (64-bit) vs. parisc (32-bit).
Suggested-by: Masahiro Yamada <[email protected]>
Signed-off-by: Helge Deller <[email protected]>
Tested-by: Randy Dunlap <[email protected]>
Reviewed-by: Randy Dunlap <[email protected]>
Cc: <[email protected]> # 5.15+
|
|
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Ingo Molnar:
"Misc irqchip fixes: LoongArch driver fixes and a Hyper-V IOMMU fix"
* tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/loongson-liointc: Fix an error handling path in liointc_init()
irqchip/loongarch: Fix irq_domain_alloc_fwnode() abuse
irqchip/loongson-pch-pic: Move find_pch_pic() into CONFIG_ACPI
irqchip/loongson-eiointc: Fix a build warning
irqchip/loongson-eiointc: Fix irq affinity setting
iommu/hyper-v: Use helper instead of directly accessing affinity
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 kprobes fix from Ingo Molnar:
"Fix a kprobes bug in JNG/JNLE emulation when a kprobe is installed at
such instructions, possibly resulting in incorrect execution (the
wrong branch taken)"
* tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kprobes: Fix JNG/JNLE emulation
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Various fixes for tracing:
- Fix a return value of traceprobe_parse_event_name()
- Fix NULL pointer dereference from failed ftrace enabling
- Fix NULL pointer dereference when asking for registers from eprobes
- Make eprobes consistent with kprobes/uprobes, filters and
histograms"
* tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Have filter accept "common_cpu" to be consistent
tracing/probes: Have kprobes and uprobes use $COMM too
tracing/eprobes: Have event probes be consistent with kprobes and uprobes
tracing/eprobes: Fix reading of string fields
tracing/eprobes: Do not hardcode $comm as a string
tracing/eprobes: Do not allow eprobes to use $stack, or % for regs
ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
tracing/perf: Fix double put of trace event when init fails
tracing: React to error return from traceprobe_parse_event_name()
|
|
Make filtering consistent with histograms. As "cpu" can be a field of an
event, allow for "common_cpu" to keep it from being confused with the
"cpu" field of the event.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Tom Zanussi <[email protected]>
Fixes: 1e3bac71c5053 ("tracing/histogram: Rename "cpu" to "common_cpu"")
Suggested-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Both $comm and $COMM can be used to get current->comm in eprobes and the
filtering and histogram logic. Make kprobes and uprobes consistent in this
regard and allow both $comm and $COMM as well. Currently kprobes and
uprobes only handle $comm, which is inconsistent with the other utilities,
and can be confusing to users.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lore.kernel.org/all/[email protected]/
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Tom Zanussi <[email protected]>
Fixes: 533059281ee5 ("tracing: probeevent: Introduce new argument fetching code")
Suggested-by: Masami Hiramatsu (Google) <[email protected]>
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Currently, if a symbol "@" is attempted to be used with an event probe
(eprobes), it will cause a NULL pointer dereference crash.
Both kprobes and uprobes can reference data other than the main registers.
Such as immediate address, symbols and the current task name. Have eprobes
do the same thing.
For "comm", if "comm" is used and the event being attached to does not
have the "comm" field, then make it the "$comm" that kprobes has. This is
consistent to the way histograms and filters work.
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Tom Zanussi <[email protected]>
Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
Currently when an event probe (eprobe) hooks to a string field, it does
not display it as a string, but instead as a number. This makes the field
rather useless. Handle the different kinds of strings, dynamic, static,
relational/dynamic etc.
Now when a string field is used, the ":string" type can be used to display
it:
echo "e:sw sched/sched_switch comm=$next_comm:string" > dynamic_events
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Tom Zanussi <[email protected]>
Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The variable $comm is hard coded as a string, which is true for both
kprobes and uprobes, but for event probes (eprobes) it is a field name. In
most cases the "comm" field would be a string, but there's no guarantee of
that fact.
Do not assume that comm is a string. Not to mention, it currently forces
comm fields to fault, as string processing for event probes is currently
broken.
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Tom Zanussi <[email protected]>
Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
While playing with event probes (eprobes), I tried to see what would
happen if I attempted to retrieve the instruction pointer (%rip) knowing
that event probes do not use pt_regs. The result was:
BUG: kernel NULL pointer dereference, address: 0000000000000024
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 1 PID: 1847 Comm: trace-cmd Not tainted 5.19.0-rc5-test+ #309
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01
v03.03 07/14/2016
RIP: 0010:get_event_field.isra.0+0x0/0x50
Code: ff 48 c7 c7 c0 8f 74 a1 e8 3d 8b f5 ff e8 88 09 f6 ff 4c 89 e7 e8
50 6a 13 00 48 89 ef 5b 5d 41 5c 41 5d e9 42 6a 13 00 66 90 <48> 63 47 24
8b 57 2c 48 01 c6 8b 47 28 83 f8 02 74 0e 83 f8 04 74
RSP: 0018:ffff916c394bbaf0 EFLAGS: 00010086
RAX: ffff916c854041d8 RBX: ffff916c8d9fbf50 RCX: ffff916c255d2000
RDX: 0000000000000000 RSI: ffff916c255d2008 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff916c3a2a0c08 R09: ffff916c394bbda8
R10: 0000000000000000 R11: 0000000000000000 R12: ffff916c854041d8
R13: ffff916c854041b0 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff916c9ea40000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000024 CR3: 000000011b60a002 CR4: 00000000001706e0
Call Trace:
<TASK>
get_eprobe_size+0xb4/0x640
? __mod_node_page_state+0x72/0xc0
__eprobe_trace_func+0x59/0x1a0
? __mod_lruvec_page_state+0xaa/0x1b0
? page_remove_file_rmap+0x14/0x230
? page_remove_rmap+0xda/0x170
event_triggers_call+0x52/0xe0
trace_event_buffer_commit+0x18f/0x240
trace_event_raw_event_sched_wakeup_template+0x7a/0xb0
try_to_wake_up+0x260/0x4c0
__wake_up_common+0x80/0x180
__wake_up_common_lock+0x7c/0xc0
do_notify_parent+0x1c9/0x2a0
exit_notify+0x1a9/0x220
do_exit+0x2ba/0x450
do_group_exit+0x2d/0x90
__x64_sys_exit_group+0x14/0x20
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Obviously this is not the desired result.
Move the testing for TPARG_FL_TPOINT which is only used for event probes
to the top of the "$" variable check, as all the other variables are not
used for event probes. Also add a check in the register parsing "%" to
fail if an event probe is used.
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Tzvetomir Stoyanov <[email protected]>
Cc: Tom Zanussi <[email protected]>
Fixes: 7491e2c44278 ("tracing: Add a probe that attaches to trace events")
Acked-by: Masami Hiramatsu (Google) <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
ftrace_startup does not remove ops from ftrace_ops_list when
ftrace_startup_enable fails:
register_ftrace_function
ftrace_startup
__register_ftrace_function
...
add_ftrace_ops(&ftrace_ops_list, ops)
...
...
ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1
...
return 0 // ops is in the ftrace_ops_list.
When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything:
unregister_ftrace_function
ftrace_shutdown
if (unlikely(ftrace_disabled))
return -ENODEV; // return here, __unregister_ftrace_function is not executed,
// as a result, ops is still in the ftrace_ops_list
__unregister_ftrace_function
...
If ops is dynamically allocated, it will be free later, in this case,
is_ftrace_trampoline accesses NULL pointer:
is_ftrace_trampoline
ftrace_ops_trampoline
do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL!
Syzkaller reports as follows:
[ 1203.506103] BUG: kernel NULL pointer dereference, address: 000000000000010b
[ 1203.508039] #PF: supervisor read access in kernel mode
[ 1203.508798] #PF: error_code(0x0000) - not-present page
[ 1203.509558] PGD 800000011660b067 P4D 800000011660b067 PUD 130fb8067 PMD 0
[ 1203.510560] Oops: 0000 [#1] SMP KASAN PTI
[ 1203.511189] CPU: 6 PID: 29532 Comm: syz-executor.2 Tainted: G B W 5.10.0 #8
[ 1203.512324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 1203.513895] RIP: 0010:is_ftrace_trampoline+0x26/0xb0
[ 1203.514644] Code: ff eb d3 90 41 55 41 54 49 89 fc 55 53 e8 f2 00 fd ff 48 8b 1d 3b 35 5d 03 e8 e6 00 fd ff 48 8d bb 90 00 00 00 e8 2a 81 26 00 <48> 8b ab 90 00 00 00 48 85 ed 74 1d e8 c9 00 fd ff 48 8d bb 98 00
[ 1203.518838] RSP: 0018:ffffc900012cf960 EFLAGS: 00010246
[ 1203.520092] RAX: 0000000000000000 RBX: 000000000000007b RCX: ffffffff8a331866
[ 1203.521469] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000010b
[ 1203.522583] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8df18b07
[ 1203.523550] R10: fffffbfff1be3160 R11: 0000000000000001 R12: 0000000000478399
[ 1203.524596] R13: 0000000000000000 R14: ffff888145088000 R15: 0000000000000008
[ 1203.525634] FS: 00007f429f5f4700(0000) GS:ffff8881daf00000(0000) knlGS:0000000000000000
[ 1203.526801] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1203.527626] CR2: 000000000000010b CR3: 0000000170e1e001 CR4: 00000000003706e0
[ 1203.528611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1203.529605] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Therefore, when ftrace_startup_enable fails, we need to rollback registration
process and remove ops from ftrace_ops_list.
Link: https://lkml.kernel.org/r/[email protected]
Suggested-by: Steven Rostedt <[email protected]>
Signed-off-by: Yang Jihong <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
If in perf_trace_event_init(), the perf_trace_event_open() fails, then it
will call perf_trace_event_unreg() which will not only unregister the perf
trace event, but will also call the put() function of the tp_event.
The problem here is that the trace_event_try_get_ref() is called by the
caller of perf_trace_event_init() and if perf_trace_event_init() returns a
failure, it will then call trace_event_put(). But since the
perf_trace_event_unreg() already called the trace_event_put() function, it
triggers a WARN_ON().
WARNING: CPU: 1 PID: 30309 at kernel/trace/trace_dynevent.c:46 trace_event_dyn_put_ref+0x15/0x20
If perf_trace_event_reg() does not call the trace_event_try_get_ref() then
the perf_trace_event_unreg() should not be calling trace_event_put(). This
breaks symmetry and causes bugs like these.
Pull out the trace_event_put() from perf_trace_event_unreg() and call it
in the locations that perf_trace_event_unreg() is called. This not only
fixes this bug, but also brings back the proper symmetry of the reg/unreg
vs get/put logic.
Link: https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Cc: [email protected]
Fixes: 1d18538e6a092 ("tracing: Have dynamic events have a ref counter")
Reported-by: Krister Johansen <[email protected]>
Reviewed-by: Krister Johansen <[email protected]>
Tested-by: Krister Johansen <[email protected]>
Acked-by: Jiri Olsa <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
The function traceprobe_parse_event_name() may set the first two function
arguments to a non-null value and still return -EINVAL to indicate an
unsuccessful completion of the function. Hence, it is not sufficient to
just check the result of the two function arguments for being not null,
but the return value also needs to be checked.
Commit 95c104c378dc ("tracing: Auto generate event name when creating a
group of events") changed the error-return-value checking of the second
traceprobe_parse_event_name() invocation in __trace_eprobe_create() and
removed checking the return value to jump to the error handling case.
Reinstate using the return value in the error-return-value checking.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 95c104c378dc ("tracing: Auto generate event name when creating a group of events")
Acked-by: Linyu Yuan <[email protected]>
Signed-off-by: Lukas Bulwahn <[email protected]>
Signed-off-by: Steven Rostedt (Google) <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
"A revert to fix a regression introduced this merge window and a fix
for proper error handling in the remove path of the iMX driver"
* tag 'i2c-for-6.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: imx: Make sure to unregister adapter on remove()
Revert "i2c: scmi: Replace open coded device_get_match_data()"
|
|
Pull cifs client fixes from Steve French:
- memory leak fix
- two small cleanups
- trivial strlcpy removal
- update missing entry for cifs headers in MAINTAINERS file
* tag '6.0-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: move from strlcpy with unused retval to strscpy
cifs: Fix memory leak on the deferred close
cifs: remove useless parameter 'is_fsctl' from SMB2_ioctl()
cifs: remove unused server parameter from calc_smb_size()
cifs: missing directory in MAINTAINERS file
|
|
GCC has supported asm goto since 4.5, and Clang has since version 9.0.0.
The minimum supported versions of these tools for the build according to
Documentation/process/changes.rst are 5.1 and 11.0.0 respectively.
Remove the feature detection script, Kconfig option, and clean up some
fallback code that is no longer supported.
The removed script was also testing for a GCC specific bug that was
fixed in the 4.7 release.
Also remove workarounds for bpftrace using clang older than 9.0.0, since
other BPF backend fixes are required at this point.
Link: https://lore.kernel.org/lkml/CAK7LNATSr=BXKfkdW8f-H5VT_w=xBpT2ZQcZ7rm6JfkdE+QnmA@mail.gmail.com/
Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48637
Acked-by: Borislav Petkov <[email protected]>
Suggested-by: Masahiro Yamada <[email protected]>
Suggested-by: Alexei Starovoitov <[email protected]>
Signed-off-by: Nick Desaulniers <[email protected]>
Reviewed-by: Ingo Molnar <[email protected]>
Reviewed-by: Nathan Chancellor <[email protected]>
Reviewed-by: Alexandre Belloni <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If for whatever reasons pm_runtime_resume_and_get() fails and .remove() is
exited early, the i2c adapter stays around and the irq still calls its
handler, while the driver data and the register mapping go away. So if
later the i2c adapter is accessed or the irq triggers this results in
havoc accessing freed memory and unmapped registers.
So unregister the software resources even if resume failed, and only skip
the hardware access in that case.
Fixes: 588eb93ea49f ("i2c: imx: add runtime pm support to improve the performance")
Signed-off-by: Uwe Kleine-König <[email protected]>
Acked-by: Oleksij Rempel <[email protected]>
Signed-off-by: Wolfram Sang <[email protected]>
|
|
This reverts commit 9ae551ded5ba55f96a83cd0811f7ef8c2f329d0c. We got a
regression report, so ensure this machine boots again. We will come back
with a better version hopefully.
Reported-by: Josef Johansson <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Wolfram Sang <[email protected]>
|
|
The exception handler is broken for unaligned memory acceses with fldw
and fstw instructions, because it trashes or uses randomly some other
floating point register than the one specified in the instruction word
on loads and stores.
The instruction "fldw 0(addr),%fr22L" (and the other fldw/fstw
instructions) encode the target register (%fr22) in the rightmost 5 bits
of the instruction word. The 7th rightmost bit of the instruction word
defines if the left or right half of %fr22 should be used.
While processing unaligned address accesses, the FR3() define is used to
extract the offset into the local floating-point register set. But the
calculation in FR3() was buggy, so that for example instead of %fr22,
register %fr12 [((22 * 2) & 0x1f) = 12] was used.
This bug has been since forever in the parisc kernel and I wonder why it
wasn't detected earlier. Interestingly I noticed this bug just because
the libime debian package failed to build on *native* hardware, while it
successfully built in qemu.
This patch corrects the bitshift and masking calculation in FR3().
Signed-off-by: Helge Deller <[email protected]>
Cc: <[email protected]>
|
|
The assumption in __disable_kprobe() is wrong, and it could try to disarm
an already disarmed kprobe and fire the WARN_ONCE() below. [0] We can
easily reproduce this issue.
1. Write 0 to /sys/kernel/debug/kprobes/enabled.
# echo 0 > /sys/kernel/debug/kprobes/enabled
2. Run execsnoop. At this time, one kprobe is disabled.
# /usr/share/bcc/tools/execsnoop &
[1] 2460
PCOMM PID PPID RET ARGS
# cat /sys/kernel/debug/kprobes/list
ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE]
ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE]
3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes
kprobes_all_disarmed to false but does not arm the disabled kprobe.
# echo 1 > /sys/kernel/debug/kprobes/enabled
# cat /sys/kernel/debug/kprobes/list
ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE]
ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE]
4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the
disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace().
# fg
/usr/share/bcc/tools/execsnoop
^C
Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses
some cleanups and leaves the aggregated kprobe in the hash table. Then,
__unregister_trace_kprobe() initialises tk->rp.kp.list and creates an
infinite loop like this.
aggregated kprobe.list -> kprobe.list -.
^ |
'.__.'
In this situation, these commands fall into the infinite loop and result
in RCU stall or soft lockup.
cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the
infinite loop with RCU.
/usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex,
and __get_valid_kprobe() is stuck in
the loop.
To avoid the issue, make sure we don't call disarm_kprobe() for disabled
kprobes.
[0]
Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2)
WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
Modules linked in: ena
CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ #28
Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129)
Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94
RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001
RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff
RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff
R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40
R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000
FS: 00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<TASK>
__disable_kprobe (kernel/kprobes.c:1716)
disable_kprobe (kernel/kprobes.c:2392)
__disable_trace_kprobe (kernel/trace/trace_kprobe.c:340)
disable_trace_kprobe (kernel/trace/trace_kprobe.c:429)
perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168)
perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295)
_free_event (kernel/events/core.c:4971)
perf_event_release_kernel (kernel/events/core.c:5176)
perf_release (kernel/events/core.c:5186)
__fput (fs/file_table.c:321)
task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1))
exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201)
syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296)
do_syscall_64 (arch/x86/entry/common.c:87)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120)
RIP: 0033:0x7fe7ff210654
Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc
RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654
RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008
RBP: 0000000000000000 R08: 94ae31d6fda838a4 R0900007fe8001c9d30
R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600
R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560
</TASK>
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 69d54b916d83 ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.")
Signed-off-by: Kuniyuki Iwashima <[email protected]>
Reported-by: Ayushman Dutta <[email protected]>
Cc: "Naveen N. Rao" <[email protected]>
Cc: Anil S Keshavamurthy <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Masami Hiramatsu <[email protected]>
Cc: Wang Nan <[email protected]>
Cc: Kuniyuki Iwashima <[email protected]>
Cc: Kuniyuki Iwashima <[email protected]>
Cc: Ayushman Dutta <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
Elsewhere, NR_SHMEM is updated at the same time as shmem NR_FILE_PAGES;
but shmem_replace_page() was forgetting to do that - so NR_SHMEM stats
could grow too big or too small, in those unusual cases when it's used.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Reviewed-by: Matthew Wilcox (Oracle) <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Radoslaw Burny <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
5.18 fixed the btrfs and ext4 fallocates to use file_modified(), as xfs
was already doing, to drop privileges: and fstests generic/{683,684,688}
expect this. There's no need to argue over keep-size allocation (which
could just update ctime): fix shmem_fallocate() to behave the same way.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hugh Dickins <[email protected]>
Acked-by: Christian Brauner (Microsoft) <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: Radoslaw Burny <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
ext[234] have always allowed unimplemented chattr flags to be set, but
other filesystems have tended to be stricter. Follow the stricter
approach for tmpfs: I don't want to have to explain why csu attributes
don't actually work, and we won't need to update the chattr(1) manpage;
and it's never wrong to start off strict, relaxing later if persuaded.
Allow only a (append only) i (immutable) A (no atime) and d (no dump).
Although lsattr showed 'A' inherited, the NOATIME behavior was not being
inherited: because nothing sync'ed FS_NOATIME_FL to S_NOATIME. Add
shmem_set_inode_flags() to sync the flags, using inode_set_flags() to
avoid that instant of lost immutablility during fileattr_set().
But that change switched generic/079 from passing to failing: because
FS_IMMUTABLE_FL and FS_APPEND_FL had been unconventionally included in the
INHERITED fsflags: remove them and generic/079 is back to passing.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: e408e695f5f1 ("mm/shmem: support FS_IOC_[SG]ETFLAGS in tmpfs")
Signed-off-by: Hugh Dickins <[email protected]>
Cc: "Theodore Ts'o" <[email protected]>
Cc: Radoslaw Burny <[email protected]>
Cc: "Darrick J. Wong" <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
If we ever get a write-fault on a write-protected page in a shared
mapping, we'd be in trouble (again). Instead, we can simply map the page
writable.
And in fact, there is even a way right now to trigger that code via
uffd-wp ever since we stared to support it for shmem in 5.19:
--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>
#define HUGETLB_SIZE (2 * 1024 * 1024u)
static char *map;
int uffd;
static int temp_setup_uffd(void)
{
struct uffdio_api uffdio_api;
struct uffdio_register uffdio_register;
struct uffdio_writeprotect uffd_writeprotect;
struct uffdio_range uffd_range;
uffd = syscall(__NR_userfaultfd,
O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY);
if (uffd < 0) {
fprintf(stderr, "syscall() failed: %d\n", errno);
return -errno;
}
uffdio_api.api = UFFD_API;
uffdio_api.features = UFFD_FEATURE_PAGEFAULT_FLAG_WP;
if (ioctl(uffd, UFFDIO_API, &uffdio_api) < 0) {
fprintf(stderr, "UFFDIO_API failed: %d\n", errno);
return -errno;
}
if (!(uffdio_api.features & UFFD_FEATURE_PAGEFAULT_FLAG_WP)) {
fprintf(stderr, "UFFD_FEATURE_WRITEPROTECT missing\n");
return -ENOSYS;
}
/* Register UFFD-WP */
uffdio_register.range.start = (unsigned long) map;
uffdio_register.range.len = HUGETLB_SIZE;
uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) < 0) {
fprintf(stderr, "UFFDIO_REGISTER failed: %d\n", errno);
return -errno;
}
/* Writeprotect a single page. */
uffd_writeprotect.range.start = (unsigned long) map;
uffd_writeprotect.range.len = HUGETLB_SIZE;
uffd_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP;
if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffd_writeprotect)) {
fprintf(stderr, "UFFDIO_WRITEPROTECT failed: %d\n", errno);
return -errno;
}
/* Unregister UFFD-WP without prior writeunprotection. */
uffd_range.start = (unsigned long) map;
uffd_range.len = HUGETLB_SIZE;
if (ioctl(uffd, UFFDIO_UNREGISTER, &uffd_range)) {
fprintf(stderr, "UFFDIO_UNREGISTER failed: %d\n", errno);
return -errno;
}
return 0;
}
int main(int argc, char **argv)
{
int fd;
fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
if (!fd) {
fprintf(stderr, "open() failed\n");
return -errno;
}
if (ftruncate(fd, HUGETLB_SIZE)) {
fprintf(stderr, "ftruncate() failed\n");
return -errno;
}
map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (map == MAP_FAILED) {
fprintf(stderr, "mmap() failed\n");
return -errno;
}
*map = 0;
if (temp_setup_uffd())
return 1;
*map = 0;
return 0;
}
--------------------------------------------------------------------------
Above test fails with SIGBUS when there is only a single free hugetlb page.
# echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
Bus error (core dumped)
And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
# echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
# cat /proc/meminfo | grep HugePages_
HugePages_Total: 2
HugePages_Free: 1
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0
Reason is that uffd-wp doesn't clear the uffd-wp PTE bit when
unregistering and consequently keeps the PTE writeprotected. Reason for
this is to avoid the additional overhead when unregistering. Note that
this is the case also for !hugetlb and that we will end up with writable
PTEs that still have the uffd-wp PTE bit set once we return from
hugetlb_wp(). I'm not touching the uffd-wp PTE bit for now, because it
seems to be a generic thing -- wp_page_reuse() also doesn't clear it.
VM_MAYSHARE handling in hugetlb_fault() for FAULT_FLAG_WRITE indicates
that MAP_SHARED handling was at least envisioned, but could never have
worked as expected.
While at it, make sure that we never end up in hugetlb_wp() on write
faults without VM_WRITE, because we don't support maybe_mkwrite()
semantics as commonly used in the !hugetlb case -- for example, in
wp_page_reuse().
Note that there is no need to do any kind of reservation in
hugetlb_fault() in this case ... because we already have a hugetlb page
mapped R/O that we will simply map writable and we are not dealing with
COW/unsharing.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Jamie Liu <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Peter Feiner <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]> [5.19]
Signed-off-by: Andrew Morton <[email protected]>
|
|
Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.
I observed that hugetlb does not support/expect write-faults in shared
mappings that would have to map the R/O-mapped page writable -- and I
found two case where we could currently get such faults and would
erroneously map an anon page into a shared mapping.
Reproducers part of the patches.
I propose to backport both fixes to stable trees. The first fix needs a
small adjustment.
This patch (of 2):
Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping. In fact, there is none, and so far we thought we could get away
with that because e.g., mprotect() should always do the right thing and
map all pages directly writable.
Looks like we were wrong:
--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <errno.h>
#include <sys/mman.h>
#define HUGETLB_SIZE (2 * 1024 * 1024u)
static void clear_softdirty(void)
{
int fd = open("/proc/self/clear_refs", O_WRONLY);
const char *ctrl = "4";
int ret;
if (fd < 0) {
fprintf(stderr, "open(clear_refs) failed\n");
exit(1);
}
ret = write(fd, ctrl, strlen(ctrl));
if (ret != strlen(ctrl)) {
fprintf(stderr, "write(clear_refs) failed\n");
exit(1);
}
close(fd);
}
int main(int argc, char **argv)
{
char *map;
int fd;
fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
if (!fd) {
fprintf(stderr, "open() failed\n");
return -errno;
}
if (ftruncate(fd, HUGETLB_SIZE)) {
fprintf(stderr, "ftruncate() failed\n");
return -errno;
}
map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (map == MAP_FAILED) {
fprintf(stderr, "mmap() failed\n");
return -errno;
}
*map = 0;
if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
fprintf(stderr, "mmprotect() failed\n");
return -errno;
}
clear_softdirty();
if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
fprintf(stderr, "mmprotect() failed\n");
return -errno;
}
*map = 0;
return 0;
}
--------------------------------------------------------------------------
Above test fails with SIGBUS when there is only a single free hugetlb page.
# echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
Bus error (core dumped)
And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
# echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
# ./test
# cat /proc/meminfo | grep HugePages_
HugePages_Total: 2
HugePages_Free: 1
HugePages_Rsvd: 18446744073709551615
HugePages_Surp: 0
Reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support softdirty tracking.
Link: https://lkml.kernel.org/r/[email protected]
Link: https://lkml.kernel.org/r/[email protected]
Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Mike Kravetz <[email protected]>
Cc: Peter Feiner <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Pavel Emelyanov <[email protected]>
Cc: Jamie Liu <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Muchun Song <[email protected]>
Cc: Peter Xu <[email protected]>
Cc: <[email protected]> [3.18+]
Signed-off-by: Andrew Morton <[email protected]>
|
|
The motivation of this patch comes from a recent report and patchfix from
David Hildenbrand on hugetlb shared handling of wr-protected page [1].
With the reproducer provided in commit message of [1], one can leverage
the uffd-wp lazy-reset of ptes to trigger a hugetlb issue which can affect
not only the attacker process, but also the whole system.
The lazy-reset mechanism of uffd-wp was used to make unregister faster,
meanwhile it has an assumption that any leftover pgtable entries should
only affect the process on its own, so not only the user should be aware
of anything it does, but also it should not affect outside of the process.
But it seems that this is not true, and it can also be utilized to make
some exploit easier.
So far there's no clue showing that the lazy-reset is important to any
userfaultfd users because normally the unregister will only happen once
for a specific range of memory of the lifecycle of the process.
Considering all above, what this patch proposes is to do explicit pte
resets when unregister an uffd region with wr-protect mode enabled.
It should be the same as calling ioctl(UFFDIO_WRITEPROTECT, wp=false)
right before ioctl(UFFDIO_UNREGISTER) for the user. So potentially it'll
make the unregister slower. From that pov it's a very slight abi change,
but hopefully nothing should break with this change either.
Regarding to the change itself - core of uffd write [un]protect operation
is moved into a separate function (uffd_wp_range()) and it is reused in
the unregister code path.
Note that the new function will not check for anything, e.g. ranges or
memory types, because they should have been checked during the previous
UFFDIO_REGISTER or it should have failed already. It also doesn't check
mmap_changing because we're with mmap write lock held anyway.
I added a Fixes upon introducing of uffd-wp shmem+hugetlbfs because that's
the only issue reported so far and that's the commit David's reproducer
will start working (v5.19+). But the whole idea actually applies to not
only file memories but also anonymous. It's just that we don't need to
fix anonymous prior to v5.19- because there's no known way to exploit.
IOW, this patch can also fix the issue reported in [1] as the patch 2 does.
[1] https://lore.kernel.org/all/[email protected]/
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Signed-off-by: Peter Xu <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Mike Kravetz <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Nadav Amit <[email protected]>
Cc: Axel Rasmussen <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
These bits should only be valid when the ptes are present. Introducing
two booleans for it and set it to false when !pte_present() for both pte
and pmd accountings.
The bug is found during code reading and no real world issue reported, but
logically such an error can cause incorrect readings for either smaps or
smaps_rollup output on quite a few fields.
For example, it could cause over-estimate on values like Shared_Dirty,
Private_Dirty, Referenced. Or it could also cause under-estimate on
values like LazyFree, Shared_Clean, Private_Clean.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: b1d4d9e0cbd0 ("proc/smaps: carefully handle migration entries")
Fixes: c94b6923fa0a ("/proc/PID/smaps: Add PMD migration entry parsing")
Signed-off-by: Peter Xu <[email protected]>
Reviewed-by: Vlastimil Babka <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Yang Shi <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Huang Ying <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
FOR_ALL_ZONES should be consistent with enum zone_type. Otherwise,
__count_zid_vm_events have the potential to add count to wrong item when
zid is ZONE_DEVICE.
Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Hao Lee <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Johannes Weiner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|
|
When CONFIG_ADVISE_SYSCALLS is not set/enabled and CONFIG_COMPAT is
set/enabled, the riscv compat_syscall_table references
'compat_sys_fadvise64_64', which is not defined:
riscv64-linux-ld: arch/riscv/kernel/compat_syscall_table.o:(.rodata+0x6f8):
undefined reference to `compat_sys_fadvise64_64'
Add 'fadvise64_64' to kernel/sys_ni.c as a conditional COMPAT function so
that when CONFIG_ADVISE_SYSCALLS is not set, there is a fallback function
available.
Link: https://lkml.kernel.org/r/[email protected]
Fixes: d3ac21cacc24 ("mm: Support compiling out madvise and fadvise")
Signed-off-by: Randy Dunlap <[email protected]>
Suggested-by: Arnd Bergmann <[email protected]>
Reviewed-by: Arnd Bergmann <[email protected]>
Cc: Josh Triplett <[email protected]>
Cc: Paul Walmsley <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
|