Age | Commit message (Collapse) | Author | Files | Lines |
|
For now, the io_uring mediation is limited to sqpoll and
override_creds.
Signed-off-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Unprivileged user namespace creation is often used as a first step
in privilege escalation attacks. Instead of disabling it at the
sysrq level, which blocks its legitimate use as for setting up a sandbox,
allow control on a per domain basis.
This allows an admin to quickly lock down a system while also still
allowing legitimate use.
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
unprivileged unconfined can use change_profile to alter the confinement
set by the mac admin.
Allow restricting unprivileged unconfined by still allowing change_profile
but stacking the change against unconfined. This allows unconfined to
still apply system policy but allows the task to enter the new confinement.
If unprivileged unconfined is required a sysctl is provided to switch
to the previous behavior.
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
While disconnected.path has been available for a while it was never
properly advertised as a feature. Fix this so that userspace doesn't
need special casing to handle it.
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
With the move to permission tables the dfa is no longer a stand
alone entity when used, needing a minimum of a permission table.
However it still could be shared among different pdbs each using
a different permission table.
Instead of duping the permission table when sharing a pdb, add a
refcount to the pdb so it can be easily shared.
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Improve policy load failure messages by identifying which dfa the
verification check failed in.
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
The cred is needed to properly audit some messages, and will be needed
in the future for uid conditional mediation. So pass it through to
where the apparmor_audit_data struct gets defined.
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
rename audit_data's label field to subj_label to better reflect its
use. Also at the same time drop unneeded assignments to ->subj_label
as the later call to aa_check_perms will do the assignment if needed.
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Everywhere where common_audit_data is used apparmor audit_data is also
used. We can simplify the code and drop the use of the aad macro
everywhere by combining the two structures.
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
In preparation for LSM stacking rework the macro to an inline fn
Reviewed-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
When running will-it-scale[1] open2_process testcase, in a system with a
large number of cores, a bottleneck in retrieving the current task
secid was detected:
27.73% ima_file_check;do_open (inlined);path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_x64 (inlined);do_syscall_64;entry_SYSCALL_64_after_hwframe (inlined);__libc_open64 (inlined)
27.72% 0.01% [kernel.vmlinux] [k] security_current_getsecid_subj - -
27.71% security_current_getsecid_subj;ima_file_check;do_open (inlined);path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_x64 (inlined);do_syscall_64;entry_SYSCALL_64_after_hwframe (inlined);__libc_open64 (inlined)
27.71% 27.68% [kernel.vmlinux] [k] apparmor_current_getsecid_subj - -
19.94% __refcount_add (inlined);__refcount_inc (inlined);refcount_inc (inlined);kref_get (inlined);aa_get_label (inlined);aa_get_label (inlined);aa_get_current_label (inlined);apparmor_current_getsecid_subj;security_current_getsecid_subj;ima_file_check;do_open (inlined);path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_x64 (inlined);do_syscall_64;entry_SYSCALL_64_after_hwframe (inlined);__libc_open64 (inlined)
7.72% __refcount_sub_and_test (inlined);__refcount_dec_and_test (inlined);refcount_dec_and_test (inlined);kref_put (inlined);aa_put_label (inlined);aa_put_label (inlined);apparmor_current_getsecid_subj;security_current_getsecid_subj;ima_file_check;do_open (inlined);path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_x64 (inlined);do_syscall_64;entry_SYSCALL_64_after_hwframe (inlined);__libc_open64 (inlined)
A large amount of time was spent in the refcount.
The most common case is that the current task label is available, and
no need to take references for that one. That is exactly what the
critical section helpers do, make use of them.
New perf output:
39.12% vfs_open;path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_64;entry_SYSCALL_64_after_hwframe;__libc_open64 (inlined)
39.07% 0.13% [kernel.vmlinux] [k] do_dentry_open - -
39.05% do_dentry_open;vfs_open;path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_64;entry_SYSCALL_64_after_hwframe;__libc_open64 (inlined)
38.71% 0.01% [kernel.vmlinux] [k] security_file_open - -
38.70% security_file_open;do_dentry_open;vfs_open;path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_64;entry_SYSCALL_64_after_hwframe;__libc_open64 (inlined)
38.65% 38.60% [kernel.vmlinux] [k] apparmor_file_open - -
38.65% apparmor_file_open;security_file_open;do_dentry_open;vfs_open;path_openat;do_filp_open;do_sys_openat2;__x64_sys_openat;do_syscall_64;entry_SYSCALL_64_after_hwframe;__libc_open64 (inlined)
The result is a throughput improvement of around 20% across the board
on the open2 testcase. On more realistic workloads the impact should
be much less.
[1] https://github.com/antonblanchard/will-it-scale
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
These functions are not used now, remove them.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
The whole function is guarded by CONFIG_SECURITY_APPARMOR_EXPORT_BINARY,
so the #ifdef here is redundant, remove it.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
profile->disconnected was storing an invalid reference to the
disconnected path. Fix it by duplicating the string using
aa_unpack_strdup and freeing accordingly.
Fixes: 72c8a768641d ("apparmor: allow profiles to provide info to disconnected paths")
Signed-off-by: Georgia Garcia <georgia.garcia@canonical.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
The last usage of PROF_{ADD,REPLACE} were removed by commit 18e99f191a8e
("apparmor: provide finer control over policy management"). So remove
these two unused macros.
Signed-off-by: GONG, Ruiqi <gongruiqi1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
After changes in commit 33bf60cabcc7 ("LSM: Infrastructure management of
the file security"), aa_alloc_file_ctx() and aa_free_file_ctx() are no
longer used, so remove them, and also remove aa_get_file_label() because
it seems that it's never been used before.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
The implementions of these declarations do not exist, remove them all.
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
[PATCH -next 05/11] apparmor: Fix kernel-doc warnings in apparmor/label.c
missed updating the Returns comment for the new parameter names
[PATCH -next 05/11] apparmor: Fix kernel-doc warnings in apparmor/label.c
Added the @size parameter comment without mentioning it is a return
value.
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/policy.c:294: warning: Function parameter or
member 'proxy' not described in 'aa_alloc_profile'
security/apparmor/policy.c:785: warning: Function parameter or
member 'label' not described in 'aa_policy_view_capable'
security/apparmor/policy.c:785: warning: Function parameter or
member 'ns' not described in 'aa_policy_view_capable'
security/apparmor/policy.c:847: warning: Function parameter or
member 'ns' not described in 'aa_may_manage_policy'
security/apparmor/policy.c:964: warning: Function parameter or
member 'hname' not described in '__lookup_replace'
security/apparmor/policy.c:964: warning: Function parameter or
member 'info' not described in '__lookup_replace'
security/apparmor/policy.c:964: warning: Function parameter or
member 'noreplace' not described in '__lookup_replace'
security/apparmor/policy.c:964: warning: Function parameter or
member 'ns' not described in '__lookup_replace'
security/apparmor/policy.c:964: warning: Function parameter or
member 'p' not described in '__lookup_replace'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/policy_compat.c:151: warning: Function parameter
or member 'size' not described in 'compute_fperms'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/policy_unpack.c:1173: warning: Function parameter
or member 'table_size' not described in 'verify_dfa_accept_index'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/resource.c:111: warning: Function parameter or
member 'label' not described in 'aa_task_setrlimit'
security/apparmor/resource.c:111: warning: Function parameter or
member 'new_rlim' not described in 'aa_task_setrlimit'
security/apparmor/resource.c:111: warning: Function parameter or
member 'resource' not described in 'aa_task_setrlimit'
security/apparmor/resource.c:111: warning: Function parameter or
member 'task' not described in 'aa_task_setrlimit'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/match.c:148: warning: Function parameter or member
'tables' not described in 'verify_table_headers'
security/apparmor/match.c:289: warning: Excess function parameter
'kr' description in 'aa_dfa_free_kref'
security/apparmor/match.c:289: warning: Function parameter or member
'kref' not described in 'aa_dfa_free_kref'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/lib.c:33: warning: Excess function parameter
'str' description in 'aa_free_str_table'
security/apparmor/lib.c:33: warning: Function parameter or member
't' not described in 'aa_free_str_table'
security/apparmor/lib.c:94: warning: Function parameter or
member 'n' not described in 'skipn_spaces'
security/apparmor/lib.c:390: warning: Excess function parameter
'deny' description in 'aa_check_perms'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/label.c:166: warning: Excess function parameter
'n' description in 'vec_cmp'
security/apparmor/label.c:166: warning: Excess function parameter
'vec' description in 'vec_cmp'
security/apparmor/label.c:166: warning: Function parameter or member
'an' not described in 'vec_cmp'
security/apparmor/label.c:166: warning: Function parameter or member
'bn' not described in 'vec_cmp'
security/apparmor/label.c:166: warning: Function parameter or member
'b' not described in 'vec_cmp'
security/apparmor/label.c:2051: warning: Function parameter or member
'label' not described in '__label_update'
security/apparmor/label.c:266: warning: Function parameter or member
'flags' not described in 'aa_vec_unique'
security/apparmor/label.c:594: warning: Excess function parameter
'l' description in '__label_remove'
security/apparmor/label.c:594: warning: Function parameter or member
'label' not described in '__label_remove'
security/apparmor/label.c:929: warning: Function parameter or member
'label' not described in 'aa_label_insert'
security/apparmor/label.c:929: warning: Function parameter or member
'ls' not described in 'aa_label_insert'
security/apparmor/label.c:1221: warning: Excess function parameter
'ls' description in 'aa_label_merge'
security/apparmor/label.c:1302: warning: Excess function parameter
'start' description in 'label_compound_match'
security/apparmor/label.c:1302: warning: Function parameter or member
'rules' not described in 'label_compound_match'
security/apparmor/label.c:1302: warning: Function parameter or member
'state' not described in 'label_compound_match'
security/apparmor/label.c:2051: warning: Function parameter or member
'label' not described in '__label_update'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/file.c:177: warning: Excess function parameter
'dfa' description in 'aa_lookup_fperms'
security/apparmor/file.c:177: warning: Function parameter or member
'file_rules' not described in 'aa_lookup_fperms'
security/apparmor/file.c:202: warning: Excess function parameter
'dfa' description in 'aa_str_perms'
security/apparmor/file.c:202: warning: Excess function parameter
'state' description in 'aa_str_perms'
security/apparmor/file.c:202: warning: Function parameter or member
'file_rules' not described in 'aa_str_perms'
security/apparmor/file.c:202: warning: Function parameter or member
'start' not described in 'aa_str_perms'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/domain.c:279: warning: Function parameter or
member 'perms' not described in 'change_profile_perms'
security/apparmor/domain.c:380: warning: Function parameter or
member 'bprm' not described in 'find_attach'
security/apparmor/domain.c:380: warning: Function parameter or
member 'head' not described in 'find_attach'
security/apparmor/domain.c:380: warning: Function parameter or
member 'info' not described in 'find_attach'
security/apparmor/domain.c:380: warning: Function parameter or
member 'name' not described in 'find_attach'
security/apparmor/domain.c:558: warning: Function parameter or
member 'info' not described in 'x_to_label'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/capability.c:45: warning: Function parameter
or member 'ab' not described in 'audit_cb'
security/apparmor/capability.c:45: warning: Function parameter
or member 'va' not described in 'audit_cb'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Fix kernel-doc warnings:
security/apparmor/audit.c:150: warning: Function parameter or
member 'type' not described in 'aa_audit_msg'
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
These allocations should use the gfp flags from the caller instead of
GFP_KERNEL. But from what I can see, all the callers pass in GFP_KERNEL
so this does not affect runtime.
Fixes: e31dd6e412f7 ("apparmor: fix: kzalloc perms tables for shared dfas")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Userspace won't load policy using extended perms unless it knows the
kernel can handle them. Advertise that extended perms are supported in
the feature set.
Signed-off-by: John Johansen <john.johansen@canonical.com>
Reviewed-by: Jon Tourville <jontourville@me.com>
|
|
SOCK_ctx() doesn't seem to be used anywhere in the code, so remove it.
Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
Change the return type to void since it always return 0, and no need
to do the checking in aa_set_current_onexec.
Signed-off-by: Quanfa Fu <quanfafu@gmail.com>
Reviewed-by: "Tyler Hicks (Microsoft)" <code@tyhicks.com>
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
|
|
We just sorted the entries and fields last release, so just out of a
perverse sense of curiosity, I decided to see if we can keep things
ordered for even just one release.
The answer is "No. No we cannot".
I suggest that all kernel developers will need weekly training sessions,
involving a lot of Big Bird and Sesame Street. And at the yearly
maintainer summit, we will all sing the alphabet song together.
I doubt I will keep doing this. At some point "perverse sense of
curiosity" turns into just a cold dark place filled with sadness and
despair.
Repeats: 80e62bc8487b ("MAINTAINERS: re-sort all entries and fields")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.infradead.org/users/hch/dma-mapping
Pull dma-mapping fixes from Christoph Hellwig:
- swiotlb area sizing fixes (Petr Tesarik)
* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: reduce the number of areas to match actual memory pool size
swiotlb: always set the number of areas before allocating the pool
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq update from Borislav Petkov:
- Optimize IRQ domain's name assignment
* tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdomain: Use return value of strreplace()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu fix from Borislav Petkov:
- Do FPU AP initialization on Xen PV too which got missed by the recent
boot reordering work
* tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Fix secondary processors' FPU initialization
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Thomas Gleixner:
"A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with
an INIT IPI. But the same code path is also used by the crash utility.
If the CPU which panics is not the boot CPU then it sends an INIT IPI
to the boot CPU which resets the machine.
Prevent this by validating that the CPU which runs the stop mechanism
is the boot CPU. If not, leave the other CPUs in HLT"
* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Don't send INIT to boot CPU
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Thomas Bogendoerfer:
- fixes for KVM
- fix for loongson build and cpu probing
- DT fixes
* tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
MIPS: dts: add missing space before {
MIPS: Loongson: Fix build error when make modules_install
MIPS: KVM: Fix NULL pointer dereference
MIPS: Loongson: Fix cpu_probe_loongson() again
|
|
Pull xfs fix from Darrick Wong:
"Nothing exciting here, just getting rid of a gcc warning that I got
tired of seeing when I turn on gcov"
* tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix uninit warning in xfs_growfs_data
|
|
git://git.samba.org/sfrench/cifs-2.6
Pull more smb client updates from Steve French:
- fix potential use after free in unmount
- minor cleanup
- add worker to cleanup stale directory leases
* tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add a laundromat thread for cached directories
smb: client: remove redundant pointer 'server'
cifs: fix session state transition to avoid use-after-free issue
|
|
Pull NTB updates from Jon Mason:
"Fixes for pci_clean_master, error handling in driver inits, and
various other issues/bugs"
* tag 'ntb-6.5' of https://github.com/jonmason/ntb:
ntb: hw: amd: Fix debugfs_create_dir error checking
ntb.rst: Fix copy and paste error
ntb_netdev: Fix module_init problem
ntb: intel: Remove redundant pci_clear_master
ntb: epf: Remove redundant pci_clear_master
ntb_hw_amd: Remove redundant pci_clear_master
ntb: idt: drop redundant pci_enable_pcie_error_reporting()
MAINTAINERS: git://github -> https://github.com for jonmason
NTB: EPF: fix possible memory leak in pci_vntb_probe()
NTB: ntb_tool: Add check for devm_kcalloc
NTB: ntb_transport: fix possible memory leak while device_register() fails
ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
ntb: idt: Fix error handling in idt_pci_driver_init()
|
|
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull hotfixes from Andrew Morton:
"16 hotfixes. Six are cc:stable and the remainder address post-6.4
issues"
The merge undoes the disabling of the CONFIG_PER_VMA_LOCK feature, since
it was all hopefully fixed in mainline.
* tag 'mm-hotfixes-stable-2023-07-08-10-43' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
lib: dhry: fix sleeping allocations inside non-preemptable section
kasan, slub: fix HW_TAGS zeroing with slub_debug
kasan: fix type cast in memory_is_poisoned_n
mailmap: add entries for Heiko Stuebner
mailmap: update manpage link
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
MAINTAINERS: add linux-next info
mailmap: add Markus Schneider-Pargmann
writeback: account the number of pages written back
mm: call arch_swap_restore() from do_swap_page()
squashfs: fix cache race with migration
mm/hugetlb.c: fix a bug within a BUG(): inconsistent pte comparison
docs: update ocfs2-devel mailing list address
MAINTAINERS: update ocfs2-devel mailing list address
mm: disable CONFIG_PER_VMA_LOCK until its fixed
fork: lock VMAs of the parent process when forking
|
|
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().
We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable. For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.
A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.
Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults. But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.
This fix can potentially regress some fork-heavy workloads. Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
mmap_region adds a newly created VMA into VMA tree and might modify it
afterwards before dropping the mmap_lock. This poses a problem for page
faults handled under per-VMA locks because they don't take the mmap_lock
and can stumble on this VMA while it's still being modified. Currently
this does not pose a problem since post-addition modifications are done
only for file-backed VMAs, which are not handled under per-VMA lock.
However, once support for handling file-backed page faults with per-VMA
locks is added, this will become a race.
Fix this by write-locking the VMA before inserting it into the VMA tree.
Other places where a new VMA is added into VMA tree do not modify it
after the insertion, so do not need the same locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With recent changes necessitating mmap_lock to be held for write while
expanding a stack, per-VMA locks should follow the same rules and be
write-locked to prevent page faults into the VMA being expanded. Add
the necessary locking.
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Pull more SCSI updates from James Bottomley:
"A few late arriving patches that missed the initial pull request. It's
mostly bug fixes (the dt-bindings is a fix for the initial pull)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Remove unused function declaration
scsi: target: docs: Remove tcm_mod_builder.py
scsi: target: iblock: Quiet bool conversion warning with pr_preempt use
scsi: dt-bindings: ufs: qcom: Fix ICE phandle
scsi: core: Simplify scsi_cdl_check_cmd()
scsi: isci: Fix comment typo
scsi: smartpqi: Replace one-element arrays with flexible-array members
scsi: target: tcmu: Replace strlcpy() with strscpy()
scsi: ncr53c8xx: Replace strlcpy() with strscpy()
scsi: lpfc: Fix lpfc_name struct packing
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull more i2c updates from Wolfram Sang:
- xiic patch should have been in the original pull but slipped through
- mpc patch fixes a build regression
- nomadik cleanup
* tag 'i2c-for-6.5-rc1-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mpc: Drop unused variable
i2c: nomadik: Remove a useless call in the remove function
i2c: xiic: Don't try to handle more interrupt events after error
|