Age | Commit message (Collapse) | Author | Files | Lines |
|
No internal process rule is enforced by cgroup_migrate_prepare_dst()
during process migration. It tests whether the target cgroup's
->child_subsys_mask is zero which is different from "subtree_control"
write path which tests ->subtree_control. This hasn't mattered
because up until now, both ->child_subsys_mask and ->subtree_control
are zero or non-zero at the same time. However, with the planned
addition of implicit controllers, this will no longer be true.
This patch prepares for the change by making
cgorup_migrate_prepare_dst() test ->subtree_control instead.
Signed-off-by: Tejun Heo <[email protected]>
|
|
The function currently returns -EBADF for a directory on the default
hierarchy. Make it also recognize cgroup2_fs_type. This will be used
for perf_event cgroup2 support.
Signed-off-by: Tejun Heo <[email protected]>
|
|
Signed-off-by: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
|
|
These var names are unnecessarily unwiedly and another similar
variable will be added. Let's shorten them.
Signed-off-by: Tejun Heo <[email protected]>
|
|
Conflicts:
drivers/net/phy/bcm7xxx.c
drivers/net/phy/marvell.c
drivers/net/vxlan.c
All three conflicts were cases of simple overlapping changes.
Signed-off-by: David S. Miller <[email protected]>
|
|
After the recent do_each_subsys_mask() conversion, there's no reason
to use ulong for subsystem masks. We'll be adding more subsystem
masks to persistent data structures, let's reduce its size to u16
which should be enough for now and the foreseeable future.
This doesn't create any noticeable behavior differences.
v2: Johannes spotted that the initial patch missed cgroup_no_v1_mask.
Converted.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
|
|
There are several places in cgroup_subtree_control_write() which can
use do_each_subsys_mask() instead of manual mask testing. Use it.
No functional changes.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
|
|
for_each_subsys_which() allows iterating subsystems specified in a
subsystem bitmask; unfortunately, it requires the mask to be an
unsigned long l-value which can be inconvenient and makes it awkward
to use a smaller type for subsystem masks.
This patch converts for_each_subsy_which() to do-while style which
allows it to drop the l-value requirement. The new iterator is named
do_each_subsys_mask() / while_each_subsys_mask().
Signed-off-by: Tejun Heo <[email protected]>
Cc: Aleksa Sarai <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
|
|
For consistency with cgroup->subtree_control.
* cgroup->child_subsys_mask -> cgroup->subtree_ss_mask
* cgroup_calc_child_subsys_mask() -> cgroup_calc_subtree_ss_mask()
* cgroup_refresh_child_subsys_mask() -> cgroup_refresh_subtree_ss_mask()
No functional changes.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
|
|
This reverts commit 56c807ba4e91f0980567b6a69de239677879b17f.
cgroup_subsys->css_e_css_changed() was supposed to be used by cgroup
writeback support; however, the change to per-inode cgroup association
made it unnecessary and the callback doesn't have any user. Remove
it.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
|
|
cgroup_addrm_files() incorrectly returned 0 after add failure. Fix
it.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"Two more small fixes.
One is by Yang Shi who added a READ_ONCE_NOCHECK() to the scan of the
stack made by the stack tracer. As the stack tracer scans the entire
kernel stack, KASAN triggers seeing it as a "stack out of bounds"
error. As the scan is looking at the contents of the stack from
parent functions. The NOCHECK() tells KASAN that this is done on
purpose, and is not some kind of stack overflow.
The second fix is to the ftrace selftests, to retrieve the PID of
executed commands from the shell with '$!' and not by parsing 'jobs'"
* tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing, kasan: Silence Kasan warning in check_stack of stack_tracer
ftracetest: Fix instance test to use proper shell command for pids
|
|
mappings
It may be useful to debug writes to the readonly sections of memory,
so provide a cmdline "rodata=off" to allow for this. This can be
expanded in the future to support "log" and "write" modes, but that
will need to be architecture-specific.
This also makes KDB software breakpoints more usable, as read-only
mappings can now be disabled on any kernel.
Suggested-by: H. Peter Anvin <[email protected]>
Signed-off-by: Kees Cook <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: David Brown <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: Emese Revfy <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mathias Krause <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: PaX Team <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: [email protected]
Cc: linux-arch <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Currently, when we pass a buffer from the eBPF stack into a helper
function, the function proto indicates argument types as ARG_PTR_TO_STACK
and ARG_CONST_STACK_SIZE pair. If R<X> contains the former, then R<X+1>
must be of the latter type. Then, verifier checks whether the buffer
points into eBPF stack, is initialized, etc. The verifier also guarantees
that the constant value passed in R<X+1> is greater than 0, so helper
functions don't need to test for it and can always assume a non-NULL
initialized buffer as well as non-0 buffer size.
This patch adds a new argument types ARG_CONST_STACK_SIZE_OR_ZERO that
allows to also pass NULL as R<X> and 0 as R<X+1> into the helper function.
Such helper functions, of course, need to be able to handle these cases
internally then. Verifier guarantees that either R<X> == NULL && R<X+1> == 0
or R<X> != NULL && R<X+1> != 0 (like the case of ARG_CONST_STACK_SIZE), any
other combinations are not possible to load.
I went through various options of extending the verifier, and introducing
the type ARG_CONST_STACK_SIZE_OR_ZERO seems to have most minimal changes
needed to the verifier.
Signed-off-by: Daniel Borkmann <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Replace copy_file_from_fd() with kernel_read_file_from_fd().
Two new identifiers named READING_KEXEC_IMAGE and READING_KEXEC_INITRAMFS
are defined for measuring, appraising or auditing the kexec image and
initramfs.
Changelog v3:
- return -EBADF, not -ENOEXEC
- identifier change
- split patch, moving copy_file_from_fd() to a separate patch
- split patch, moving IMA changes to a separate patch
v0:
- use kstat file size type loff_t, not size_t
- Calculate the file hash from the in memory buffer - Dave Young
Signed-off-by: Mimi Zohar <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Luis R. Rodriguez <[email protected]>
Cc: Eric Biederman <[email protected]>
Acked-by: Dave Young <[email protected]>
|
|
Replace copy_module_from_fd() with kernel_read_file_from_fd().
Although none of the upstreamed LSMs define a kernel_module_from_file
hook, IMA is called, based on policy, to prevent unsigned kernel modules
from being loaded by the original kernel module syscall and to
measure/appraise signed kernel modules.
The security function security_kernel_module_from_file() was called prior
to reading a kernel module. Preventing unsigned kernel modules from being
loaded by the original kernel module syscall remains on the pre-read
kernel_read_file() security hook. Instead of reading the kernel module
twice, once for measuring/appraising and again for loading the kernel
module, the signature validation is moved to the kernel_post_read_file()
security hook.
This patch removes the security_kernel_module_from_file() hook and security
call.
Signed-off-by: Mimi Zohar <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Luis R. Rodriguez <[email protected]>
Cc: Rusty Russell <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"A handful of CPU hotplug related fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Plug potential memory leak in CPU_UP_PREPARE
perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state
perf/core: Remove bogus UP_CANCELED hotplug state
perf/x86/amd/uncore: Plug reference leak
|
|
In __request_region, if a conflict with a BUSY and MUXED resource is
detected, then the caller goes to sleep and waits for the resource to be
released. A pointer on the conflicting resource is kept. At wake-up
this pointer is used as a parent to retry to request the region.
A first problem is that this pointer might well be invalid (if for
example the conflicting resource have already been freed). Another
problem is that the next call to __request_region() fails to detect a
remaining conflict. The previously conflicting resource is passed as a
parameter and __request_region() will look for a conflict among the
children of this resource and not at the resource itself. It is likely
to succeed anyway, even if there is still a conflict.
Instead, the parent of the conflicting resource should be passed to
__request_region().
As a fix, this patch doesn't update the parent resource pointer in the
case we have to wait for a muxed region right after.
Reported-and-tested-by: Vincent Pelletier <[email protected]>
Signed-off-by: Simon Guinot <[email protected]>
Tested-by: Vincent Donnefort <[email protected]>
Cc: [email protected]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
add new map type to store stack traces and corresponding helper
bpf_get_stackid(ctx, map, flags) - walk user or kernel stack and return id
@ctx: struct pt_regs*
@map: pointer to stack_trace map
@flags: bits 0-7 - numer of stack frames to skip
bit 8 - collect user stack instead of kernel
bit 9 - compare stacks by hash only
bit 10 - if two different stacks hash into the same stackid
discard old
other bits - reserved
Return: >= 0 stackid on success or negative error
stackid is a 32-bit integer handle that can be further combined with
other data (including other stackid) and used as a key into maps.
Userspace will access stackmap using standard lookup/delete syscall commands to
retrieve full stack trace for given stackid.
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
. avoid walking the stack when there is no room left in the buffer
. generalize get_perf_callchain() to be called from bpf helper
Signed-off-by: Alexei Starovoitov <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Merge fixes from Andrew Morton:
"10 fixes"
* emailed patches from Andrew Morton <[email protected]>:
mm: slab: free kmem_cache_node after destroy sysfs file
ipc/shm: handle removed segments gracefully in shm_mmap()
MAINTAINERS: update Kselftest Framework mailing list
devm_memremap_release(): fix memremap'd addr handling
mm/hugetlb.c: fix incorrect proc nr_hugepages value
mm, x86: fix pte_page() crash in gup_pte_range()
fsnotify: turn fsnotify reaper thread into a workqueue job
Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread"
mm: fix regression in remap_file_pages() emulation
thp, dax: do not try to withdraw pgtable from non-anon VMA
|
|
bpf_percpu_hash_update() expects rcu lock to be held and warns if it's not,
which pointed out a missing rcu read lock.
Fixes: 15a07b338 ("bpf: add lookup/update support for per-cpu hash and array maps")
Signed-off-by: Sasha Levin <[email protected]>
Acked-by: Alexei Starovoitov <[email protected]>
Acked-by: Daniel Borkmann <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
When enabling stack trace via "echo 1 > /proc/sys/kernel/stack_tracer_enabled",
the below KASAN warning is triggered:
BUG: KASAN: stack-out-of-bounds in check_stack+0x344/0x848 at addr ffffffc0689ebab8
Read of size 8 by task ksoftirqd/4/29
page:ffffffbdc3a27ac0 count:0 mapcount:0 mapping: (null) index:0x0
flags: 0x0()
page dumped because: kasan: bad access detected
CPU: 4 PID: 29 Comm: ksoftirqd/4 Not tainted 4.5.0-rc1 #129
Hardware name: Freescale Layerscape 2085a RDB Board (DT)
Call trace:
[<ffffffc000091300>] dump_backtrace+0x0/0x3a0
[<ffffffc0000916c4>] show_stack+0x24/0x30
[<ffffffc0009bbd78>] dump_stack+0xd8/0x168
[<ffffffc000420bb0>] kasan_report_error+0x6a0/0x920
[<ffffffc000421688>] kasan_report+0x70/0xb8
[<ffffffc00041f7f0>] __asan_load8+0x60/0x78
[<ffffffc0002e05c4>] check_stack+0x344/0x848
[<ffffffc0002e0c8c>] stack_trace_call+0x1c4/0x370
[<ffffffc0002af558>] ftrace_ops_no_ops+0x2c0/0x590
[<ffffffc00009f25c>] ftrace_graph_call+0x0/0x14
[<ffffffc0000881bc>] fpsimd_thread_switch+0x24/0x1e8
[<ffffffc000089864>] __switch_to+0x34/0x218
[<ffffffc0011e089c>] __schedule+0x3ac/0x15b8
[<ffffffc0011e1f6c>] schedule+0x5c/0x178
[<ffffffc0001632a8>] smpboot_thread_fn+0x350/0x960
[<ffffffc00015b518>] kthread+0x1d8/0x2b0
[<ffffffc0000874d0>] ret_from_fork+0x10/0x40
Memory state around the buggy address:
ffffffc0689eb980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4
ffffffc0689eba00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
>ffffffc0689eba80: 00 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00
^
ffffffc0689ebb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffc0689ebb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
The stacker tracer traverses the whole kernel stack when saving the max stack
trace. It may touch the stack red zones to cause the warning. So, just disable
the instrumentation to silence the warning.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Shi <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching
Pull livepatching fixes from Jiri Kosina:
- regression (from 4.4) fix for ordering issue, introduced by an
earlier ftrace change, that broke live patching of modules.
The fix replaces the ftrace module notifier by direct call in order
to make the ordering guaranteed and well-defined. The patch, from
Jessica Yu, has been acked both by Steven and Rusty
- error message fix from Miroslav Benes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
ftrace/module: remove ftrace module notifier
livepatch: change the error message in asm/livepatch.h header files
|
|
The pmem driver calls devm_memremap() to map a persistent memory range.
When the pmem driver is unloaded, this memremap'd range is not released
so the kernel will leak a vma.
Fix devm_memremap_release() to handle a given memremap'd address
properly.
Signed-off-by: Toshi Kani <[email protected]>
Acked-by: Dan Williams <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Ross Zwisler <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
alloc_cgroup_ns() returns an ERR_PTR value on error but
copy_cgroup_ns() was checking for NULL for error. Fix it.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Dan Carpenter <[email protected]>
|
|
A protection key fault is very similar to any other access error.
There must be a VMA, etc... We even want to take the same action
(SIGSEGV) that we do with a normal access fault.
However, we do need to let userspace know that something is
different. We do this the same way what we did with SEGV_BNDERR
with Memory Protection eXtensions (MPX): define a new SEGV code:
SEGV_PKUERR.
We add a siginfo field: si_pkey that reveals to userspace which
protection key was set on the PTE that we faulted on. There is
no other easy way for userspace to figure this out. They could
parse smaps but that would be a bit cruel.
We share space with in siginfo with _addr_bnd. #BR faults from
MPX are completely separate from page faults (#PF) that trigger
from protection key violations, so we never need both at the same
time.
Note that _pkey is a 64-bit value. The current hardware only
supports 4-bit protection keys. We do this because there is
_plenty_ of space in _sigfault and it is possible that future
processors would support more than 4 bits of protection keys.
The x86 code to actually fill in the siginfo is in the next
patch.
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Amanieu d'Antras <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Richard Weinberger <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Vegard Nossum <[email protected]>
Cc: Vladimir Davydov <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The init_name property of the device struct is sort of a hack and should
only be used for statically allocated devices. Since the device is
dynamically allocated here it is safe to use the proper way to set a
devices name by calling dev_set_name().
Signed-off-by: Lars-Peter Clausen <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Remove the ftrace module notifier in favor of directly calling
ftrace_module_enable() and ftrace_release_mod() in the module loader.
Hard-coding the function calls directly in the module loader removes
dependence on the module notifier call chain and provides better
visibility and control over what gets called when, which is important
to kernel utilities such as livepatch.
This fixes a notifier ordering issue in which the ftrace module notifier
(and hence ftrace_module_enable()) for coming modules was being called
after klp_module_notify(), which caused livepatch modules to initialize
incorrectly. This patch removes dependence on the module notifier call
chain in favor of hard coding the corresponding function calls in the
module loader. This ensures that ftrace and livepatch code get called in
the correct order on patch module load and unload.
Fixes: 5156dca34a3e ("ftrace: Fix the race between ftrace and insmod")
Signed-off-by: Jessica Yu <[email protected]>
Reviewed-by: Steven Rostedt <[email protected]>
Reviewed-by: Petr Mladek <[email protected]>
Acked-by: Rusty Russell <[email protected]>
Reviewed-by: Josh Poimboeuf <[email protected]>
Reviewed-by: Miroslav Benes <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
When dealing with key handling for shared futexes, we can drastically reduce
the usage/need of the page lock. 1) For anonymous pages, the associated futex
object is the mm_struct which does not require the page lock. 2) For inode
based, keys, we can check under RCU read lock if the page mapping is still
valid and take reference to the inode. This just leaves one rare race that
requires the page lock in the slow path when examining the swapcache.
Additionally realtime users currently have a problem with the page lock being
contended for unbounded periods of time during futex operations.
Task A
get_futex_key()
lock_page()
---> preempted
Now any other task trying to lock that page will have to wait until
task A gets scheduled back in, which is an unbound time.
With this patch, we pretty much have a lockless futex_get_key().
Experiments show that this patch can boost/speedup the hashing of shared
futexes with the perf futex benchmarks (which is good for measuring such
change) by up to 45% when there are high (> 100) thread counts on a 60 core
Westmere. Lower counts are pretty much in the noise range or less than 10%,
but mid range can be seen at over 30% overall throughput (hash ops/sec).
This makes anon-mem shared futexes much closer to its private counterpart.
Signed-off-by: Mel Gorman <[email protected]>
[ Ported on top of thp refcount rework, changelog, comments, fixes. ]
Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Ingo suggested we rename how we reference barriers A and B
regarding futex ordering guarantees. This patch replaces,
for both barriers, MB (A) with smp_mb(); (A), such that:
- We explicitly state that the barriers are SMP, and
- We standardize how we reference these across futex.c
helping readers follow what barrier does what and where.
Suggested-by: Ingo Molnar <[email protected]>
Signed-off-by: Davidlohr Bueso <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Chris Mason <[email protected]>
Cc: Darren Hart <[email protected]>
Cc: Hugh Dickins <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
No functional change, just less confusing to read.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If CPU_UP_PREPARE is called it is not guaranteed, that a previously allocated
and assigned hash has been freed already, but perf_event_init_cpu()
unconditionally allocates and assignes a new hash if the swhash is referenced.
By overwriting the pointer the existing hash is not longer accessible.
Verify that there is no hash assigned on this cpu before allocating and
assigning a new one.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If CPU_DOWN_PREPARE fails the perf hotplug notifier is called for
CPU_DOWN_FAILED and calls perf_event_init_cpu(), which checks whether the
swhash is referenced. If yes it allocates a new hash and stores the pointer in
the per cpu data structure.
But at this point the cpu is still online, so there must be a valid hash
already. By overwriting the pointer the existing hash is not longer
accessible.
Remove the CPU_DOWN_FAILED state, as there is nothing to (re)allocate.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
If CPU_UP_PREPARE fails the perf hotplug code calls perf_event_exit_cpu(),
which is a pointless exercise. The cpu is not online, so the smp function
calls return -ENXIO. So the result is a list walk to call noops.
Remove it.
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Stephane Eranian <[email protected]>
Cc: Vince Weaver <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
It's "too much" not "to much".
Signed-off-by: Steven Rostedt <[email protected]>
Acked-by: Juri Lelli <[email protected]>
Cc: Jiri Kosina <[email protected]>
Cc: Juri Lelli <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Remove an unnecessary assignment of variable not used any more.
( This has no runtime effects as GCC is smart enough to optimize
this out. )
Signed-off-by: Byungchul Park <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Edited the changelog. ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
allowing root in a non-init user namespace to mount it. This should
now be safe, because
1. non-init-root cannot mount a previously unbound subsystem
2. the task doing the mount must be privileged with respect to the
user namespace owning the cgroup namespace
3. the mounted subsystem will have its current cgroup as the root dentry.
the permissions will be unchanged, so tasks will receive no new
privilege over the cgroups which they did not have on the original
mounts.
Signed-off-by: Serge Hallyn <[email protected]>
|
|
This patch enables cgroup mounting inside userns when a process
as appropriate privileges. The cgroup filesystem mounted is
rooted at the cgroupns-root. Thus, in a container-setup, only
the hierarchy under the cgroupns-root is exposed inside the container.
This allows container management tools to run inside the containers
without depending on any global state.
Signed-off-by: Serge Hallyn <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
setns on a cgroup namespace is allowed only if
task has CAP_SYS_ADMIN in its current user-namespace and
over the user-namespace associated with target cgroupns.
No implicit cgroup changes happen with attaching to another
cgroupns. It is expected that the somone moves the attaching
process under the target cgroupns-root.
Signed-off-by: Aditya Kali <[email protected]>
Signed-off-by: Serge E. Hallyn <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
Introduce the ability to create new cgroup namespace. The newly created
cgroup namespace remembers the cgroup of the process at the point
of creation of the cgroup namespace (referred as cgroupns-root).
The main purpose of cgroup namespace is to virtualize the contents
of /proc/self/cgroup file. Processes inside a cgroup namespace
are only able to see paths relative to their namespace root
(unless they are moved outside of their cgroupns-root, at which point
they will see a relative path from their cgroupns-root).
For a correctly setup container this enables container-tools
(like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
containers without leaking system level cgroup hierarchy to the task.
This patch only implements the 'unshare' part of the cgroupns.
Signed-off-by: Aditya Kali <[email protected]>
Signed-off-by: Serge Hallyn <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
|
|
For protection keys, we need to understand whether protections
should be enforced in software or not. In general, we enforce
protections when working on our own task, but not when on others.
We call these "current" and "remote" operations.
This patch introduces a new get_user_pages() variant:
get_user_pages_remote()
Which is a replacement for when get_user_pages() is called on
non-current tsk/mm.
We also introduce a new gup flag: FOLL_REMOTE which can be used
for the "__" gup variants to get this new behavior.
The uprobes is_trap_at_addr() location holds mmap_sem and
calls get_user_pages(current->mm) on an instruction address. This
makes it a pretty unique gup caller. Being an instruction access
and also really originating from the kernel (vs. the app), I opted
to consider this a 'remote' access where protection keys will not
be enforced.
Without protection keys, this patch should not change any behavior.
Signed-off-by: Dave Hansen <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Denys Vlasenko <[email protected]>
Cc: H. Peter Anvin <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Naoya Horiguchi <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Srikar Dronamraju <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This patch fix spelling typos found in printk and Kconfig.
Signed-off-by: Masanari Iida <[email protected]>
Acked-by: Randy Dunlap <[email protected]>
Signed-off-by: Jiri Kosina <[email protected]>
|
|
The irq code browses the list of actions differently to inspect the element
one by one. Even if it is not a problem, for the sake of consistent code,
provide a macro similar to for_each_irq_desc in order to have the same loop to
go through the actions list and use it in the code.
[ tglx: Renamed the macro ]
Signed-off-by: Daniel Lezcano <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
We want the fixes in here, and this resolves a merge error in tty_io.c
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull lockdep fix from Thomas Gleixner:
"A single fix for the stack trace caching logic in lockdep, where the
duplicate avoidance managed to store no back trace at all"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/lockdep: Fix stack trace caching logic
|
|
It simplifies it and allows wide kick to be performed, even when IRQs
are disabled, without an asynchronous level in the middle.
This comes at a cost of some more overhead on features like perf and
posix cpu timers slow-paths, which is probably not much important
for nohz full users.
Requested-by: Peter Zijlstra <[email protected]>
Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Export fetch_or() that's implemented and used internally by the
scheduler. We are going to use it for NO_HZ so make it generally
available.
Reviewed-by: Chris Metcalf <[email protected]>
Cc: Christoph Lameter <[email protected]>
Cc: Chris Metcalf <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Luiz Capitulino <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rik van Riel <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Viresh Kumar <[email protected]>
Signed-off-by: Frederic Weisbecker <[email protected]>
|
|
Conflicts:
kernel/locking/lockdep.c
Signed-off-by: Ingo Molnar <[email protected]>
|