aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2016-02-23cgroup: use ->subtree_control when testing no internal process ruleTejun Heo1-2/+2
No internal process rule is enforced by cgroup_migrate_prepare_dst() during process migration. It tests whether the target cgroup's ->child_subsys_mask is zero which is different from "subtree_control" write path which tests ->subtree_control. This hasn't mattered because up until now, both ->child_subsys_mask and ->subtree_control are zero or non-zero at the same time. However, with the planned addition of implicit controllers, this will no longer be true. This patch prepares for the change by making cgorup_migrate_prepare_dst() test ->subtree_control instead. Signed-off-by: Tejun Heo <[email protected]>
2016-02-23cgroup: make css_tryget_online_from_dir() also recognize cgroup2 fsTejun Heo1-2/+3
The function currently returns -EBADF for a directory on the default hierarchy. Make it also recognize cgroup2_fs_type. This will be used for perf_event cgroup2 support. Signed-off-by: Tejun Heo <[email protected]>
2016-02-23cgroup: convert cgroup_subsys flag fields to bool bitfieldsTejun Heo3-3/+3
Signed-off-by: Tejun Heo <[email protected]> Cc: Li Zefan <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]>
2016-02-23cgroup: s/cgrp_dfl_root_/cgrp_dfl_/Tejun Heo1-9/+9
These var names are unnecessarily unwiedly and another similar variable will be added. Let's shorten them. Signed-off-by: Tejun Heo <[email protected]>
2016-02-23Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller12-156/+266
Conflicts: drivers/net/phy/bcm7xxx.c drivers/net/phy/marvell.c drivers/net/vxlan.c All three conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <[email protected]>
2016-02-22cgroup: make cgroup subsystem masks u16Tejun Heo1-26/+24
After the recent do_each_subsys_mask() conversion, there's no reason to use ulong for subsystem masks. We'll be adding more subsystem masks to persistent data structures, let's reduce its size to u16 which should be enough for now and the foreseeable future. This doesn't create any noticeable behavior differences. v2: Johannes spotted that the initial patch missed cgroup_no_v1_mask. Converted. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Johannes Weiner <[email protected]>
2016-02-22cgroup: use do_each_subsys_mask() where applicableTejun Heo1-23/+12
There are several places in cgroup_subtree_control_write() which can use do_each_subsys_mask() instead of manual mask testing. Use it. No functional changes. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Johannes Weiner <[email protected]>
2016-02-22cgroup: convert for_each_subsys_which() to do-while styleTejun Heo1-32/+40
for_each_subsys_which() allows iterating subsystems specified in a subsystem bitmask; unfortunately, it requires the mask to be an unsigned long l-value which can be inconvenient and makes it awkward to use a smaller type for subsystem masks. This patch converts for_each_subsy_which() to do-while style which allows it to drop the l-value requirement. The new iterator is named do_each_subsys_mask() / while_each_subsys_mask(). Signed-off-by: Tejun Heo <[email protected]> Cc: Aleksa Sarai <[email protected]> Acked-by: Johannes Weiner <[email protected]>
2016-02-22cgroup: s/child_subsys_mask/subtree_ss_mask/Tejun Heo1-24/+24
For consistency with cgroup->subtree_control. * cgroup->child_subsys_mask -> cgroup->subtree_ss_mask * cgroup_calc_child_subsys_mask() -> cgroup_calc_subtree_ss_mask() * cgroup_refresh_child_subsys_mask() -> cgroup_refresh_subtree_ss_mask() No functional changes. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Johannes Weiner <[email protected]>
2016-02-22Revert "cgroup: add cgroup_subsys->css_e_css_changed()"Tejun Heo1-18/+0
This reverts commit 56c807ba4e91f0980567b6a69de239677879b17f. cgroup_subsys->css_e_css_changed() was supposed to be used by cgroup writeback support; however, the change to per-inode cgroup association made it unnecessary and the callback doesn't have any user. Remove it. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Johannes Weiner <[email protected]>
2016-02-22cgroup: fix error return value of cgroup_addrm_files()Tejun Heo1-2/+2
cgroup_addrm_files() incorrectly returned 0 after add failure. Fix it. Signed-off-by: Tejun Heo <[email protected]> Acked-by: Johannes Weiner <[email protected]>
2016-02-22Merge tag 'trace-fixes-v4.5-rc5' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two more small fixes. One is by Yang Shi who added a READ_ONCE_NOCHECK() to the scan of the stack made by the stack tracer. As the stack tracer scans the entire kernel stack, KASAN triggers seeing it as a "stack out of bounds" error. As the scan is looking at the contents of the stack from parent functions. The NOCHECK() tells KASAN that this is done on purpose, and is not some kind of stack overflow. The second fix is to the ftrace selftests, to retrieve the PID of executed commands from the shell with '$!' and not by parsing 'jobs'" * tag 'trace-fixes-v4.5-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing, kasan: Silence Kasan warning in check_stack of stack_tracer ftracetest: Fix instance test to use proper shell command for pids
2016-02-22mm/init: Add 'rodata=off' boot cmdline parameter to disable read-only kernel ↵Kees Cook1-3/+1
mappings It may be useful to debug writes to the readonly sections of memory, so provide a cmdline "rodata=off" to allow for this. This can be expanded in the future to support "log" and "write" modes, but that will need to be architecture-specific. This also makes KDB software breakpoints more usable, as read-only mappings can now be disabled on any kernel. Suggested-by: H. Peter Anvin <[email protected]> Signed-off-by: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: David Brown <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: Emese Revfy <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mathias Krause <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: PaX Team <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: linux-arch <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-21bpf: add new arg_type that allows for 0 sized stack bufferDaniel Borkmann1-10/+32
Currently, when we pass a buffer from the eBPF stack into a helper function, the function proto indicates argument types as ARG_PTR_TO_STACK and ARG_CONST_STACK_SIZE pair. If R<X> contains the former, then R<X+1> must be of the latter type. Then, verifier checks whether the buffer points into eBPF stack, is initialized, etc. The verifier also guarantees that the constant value passed in R<X+1> is greater than 0, so helper functions don't need to test for it and can always assume a non-NULL initialized buffer as well as non-0 buffer size. This patch adds a new argument types ARG_CONST_STACK_SIZE_OR_ZERO that allows to also pass NULL as R<X> and 0 as R<X+1> into the helper function. Such helper functions, of course, need to be able to handle these cases internally then. Verifier guarantees that either R<X> == NULL && R<X+1> == 0 or R<X> != NULL && R<X+1> != 0 (like the case of ARG_CONST_STACK_SIZE), any other combinations are not possible to load. I went through various options of extending the verifier, and introducing the type ARG_CONST_STACK_SIZE_OR_ZERO seems to have most minimal changes needed to the verifier. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-21kexec: replace call to copy_file_from_fd() with kernel versionMimi Zohar1-64/+9
Replace copy_file_from_fd() with kernel_read_file_from_fd(). Two new identifiers named READING_KEXEC_IMAGE and READING_KEXEC_INITRAMFS are defined for measuring, appraising or auditing the kexec image and initramfs. Changelog v3: - return -EBADF, not -ENOEXEC - identifier change - split patch, moving copy_file_from_fd() to a separate patch - split patch, moving IMA changes to a separate patch v0: - use kstat file size type loff_t, not size_t - Calculate the file hash from the in memory buffer - Dave Young Signed-off-by: Mimi Zohar <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Luis R. Rodriguez <[email protected]> Cc: Eric Biederman <[email protected]> Acked-by: Dave Young <[email protected]>
2016-02-21module: replace copy_module_from_fd with kernel versionMimi Zohar1-60/+8
Replace copy_module_from_fd() with kernel_read_file_from_fd(). Although none of the upstreamed LSMs define a kernel_module_from_file hook, IMA is called, based on policy, to prevent unsigned kernel modules from being loaded by the original kernel module syscall and to measure/appraise signed kernel modules. The security function security_kernel_module_from_file() was called prior to reading a kernel module. Preventing unsigned kernel modules from being loaded by the original kernel module syscall remains on the pre-read kernel_read_file() security hook. Instead of reading the kernel module twice, once for measuring/appraising and again for loading the kernel module, the signature validation is moved to the kernel_post_read_file() security hook. This patch removes the security_kernel_module_from_file() hook and security call. Signed-off-by: Mimi Zohar <[email protected]> Acked-by: Kees Cook <[email protected]> Acked-by: Luis R. Rodriguez <[email protected]> Cc: Rusty Russell <[email protected]>
2016-02-20Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A handful of CPU hotplug related fixes" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Plug potential memory leak in CPU_UP_PREPARE perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state perf/core: Remove bogus UP_CANCELED hotplug state perf/x86/amd/uncore: Plug reference leak
2016-02-20kernel/resource.c: fix muxed resource handling in __request_region()Simon Guinot1-2/+3
In __request_region, if a conflict with a BUSY and MUXED resource is detected, then the caller goes to sleep and waits for the resource to be released. A pointer on the conflicting resource is kept. At wake-up this pointer is used as a parent to retry to request the region. A first problem is that this pointer might well be invalid (if for example the conflicting resource have already been freed). Another problem is that the next call to __request_region() fails to detect a remaining conflict. The previously conflicting resource is passed as a parameter and __request_region() will look for a conflict among the children of this resource and not at the resource itself. It is likely to succeed anyway, even if there is still a conflict. Instead, the parent of the conflicting resource should be passed to __request_region(). As a fix, this patch doesn't update the parent resource pointer in the case we have to wait for a muxed region right after. Reported-and-tested-by: Vincent Pelletier <[email protected]> Signed-off-by: Simon Guinot <[email protected]> Tested-by: Vincent Donnefort <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2016-02-20bpf: introduce BPF_MAP_TYPE_STACK_TRACEAlexei Starovoitov4-1/+247
add new map type to store stack traces and corresponding helper bpf_get_stackid(ctx, map, flags) - walk user or kernel stack and return id @ctx: struct pt_regs* @map: pointer to stack_trace map @flags: bits 0-7 - numer of stack frames to skip bit 8 - collect user stack instead of kernel bit 9 - compare stacks by hash only bit 10 - if two different stacks hash into the same stackid discard old other bits - reserved Return: >= 0 stackid on success or negative error stackid is a 32-bit integer handle that can be further combined with other data (including other stackid) and used as a key into maps. Userspace will access stackmap using standard lookup/delete syscall commands to retrieve full stack trace for given stackid. Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-20perf: generalize perf_callchainAlexei Starovoitov2-14/+20
. avoid walking the stack when there is no room left in the buffer . generalize get_perf_callchain() to be called from bpf helper Signed-off-by: Alexei Starovoitov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-19Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge fixes from Andrew Morton: "10 fixes" * emailed patches from Andrew Morton <[email protected]>: mm: slab: free kmem_cache_node after destroy sysfs file ipc/shm: handle removed segments gracefully in shm_mmap() MAINTAINERS: update Kselftest Framework mailing list devm_memremap_release(): fix memremap'd addr handling mm/hugetlb.c: fix incorrect proc nr_hugepages value mm, x86: fix pte_page() crash in gup_pte_range() fsnotify: turn fsnotify reaper thread into a workqueue job Revert "fsnotify: destroy marks with call_srcu instead of dedicated thread" mm: fix regression in remap_file_pages() emulation thp, dax: do not try to withdraw pgtable from non-anon VMA
2016-02-19bpf: grab rcu read lock for bpf_percpu_hash_updateSasha Levin1-1/+7
bpf_percpu_hash_update() expects rcu lock to be held and warns if it's not, which pointed out a missing rcu read lock. Fixes: 15a07b338 ("bpf: add lookup/update support for per-cpu hash and array maps") Signed-off-by: Sasha Levin <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-02-19tracing, kasan: Silence Kasan warning in check_stack of stack_tracerYang Shi1-1/+5
When enabling stack trace via "echo 1 > /proc/sys/kernel/stack_tracer_enabled", the below KASAN warning is triggered: BUG: KASAN: stack-out-of-bounds in check_stack+0x344/0x848 at addr ffffffc0689ebab8 Read of size 8 by task ksoftirqd/4/29 page:ffffffbdc3a27ac0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() page dumped because: kasan: bad access detected CPU: 4 PID: 29 Comm: ksoftirqd/4 Not tainted 4.5.0-rc1 #129 Hardware name: Freescale Layerscape 2085a RDB Board (DT) Call trace: [<ffffffc000091300>] dump_backtrace+0x0/0x3a0 [<ffffffc0000916c4>] show_stack+0x24/0x30 [<ffffffc0009bbd78>] dump_stack+0xd8/0x168 [<ffffffc000420bb0>] kasan_report_error+0x6a0/0x920 [<ffffffc000421688>] kasan_report+0x70/0xb8 [<ffffffc00041f7f0>] __asan_load8+0x60/0x78 [<ffffffc0002e05c4>] check_stack+0x344/0x848 [<ffffffc0002e0c8c>] stack_trace_call+0x1c4/0x370 [<ffffffc0002af558>] ftrace_ops_no_ops+0x2c0/0x590 [<ffffffc00009f25c>] ftrace_graph_call+0x0/0x14 [<ffffffc0000881bc>] fpsimd_thread_switch+0x24/0x1e8 [<ffffffc000089864>] __switch_to+0x34/0x218 [<ffffffc0011e089c>] __schedule+0x3ac/0x15b8 [<ffffffc0011e1f6c>] schedule+0x5c/0x178 [<ffffffc0001632a8>] smpboot_thread_fn+0x350/0x960 [<ffffffc00015b518>] kthread+0x1d8/0x2b0 [<ffffffc0000874d0>] ret_from_fork+0x10/0x40 Memory state around the buggy address: ffffffc0689eb980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4 ffffffc0689eba00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffc0689eba80: 00 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 ^ ffffffc0689ebb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0689ebb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 The stacker tracer traverses the whole kernel stack when saving the max stack trace. It may touch the stack red zones to cause the warning. So, just disable the instrumentation to silence the warning. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-02-18Merge branch 'for-linus' of ↵Linus Torvalds2-35/+5
git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching Pull livepatching fixes from Jiri Kosina: - regression (from 4.4) fix for ordering issue, introduced by an earlier ftrace change, that broke live patching of modules. The fix replaces the ftrace module notifier by direct call in order to make the ordering guaranteed and well-defined. The patch, from Jessica Yu, has been acked both by Steven and Rusty - error message fix from Miroslav Benes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching: ftrace/module: remove ftrace module notifier livepatch: change the error message in asm/livepatch.h header files
2016-02-18devm_memremap_release(): fix memremap'd addr handlingToshi Kani1-1/+1
The pmem driver calls devm_memremap() to map a persistent memory range. When the pmem driver is unloaded, this memremap'd range is not released so the kernel will leak a vma. Fix devm_memremap_release() to handle a given memremap'd address properly. Signed-off-by: Toshi Kani <[email protected]> Acked-by: Dan Williams <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-02-18cgroup: fix alloc_cgroup_ns() error handling in copy_cgroup_ns()Tejun Heo1-2/+3
alloc_cgroup_ns() returns an ERR_PTR value on error but copy_cgroup_ns() was checking for NULL for error. Fix it. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Dan Carpenter <[email protected]>
2016-02-18signals, pkeys: Notify userspace about protection key faultsDave Hansen1-0/+4
A protection key fault is very similar to any other access error. There must be a VMA, etc... We even want to take the same action (SIGSEGV) that we do with a normal access fault. However, we do need to let userspace know that something is different. We do this the same way what we did with SEGV_BNDERR with Memory Protection eXtensions (MPX): define a new SEGV code: SEGV_PKUERR. We add a siginfo field: si_pkey that reveals to userspace which protection key was set on the PTE that we faulted on. There is no other easy way for userspace to figure this out. They could parse smaps but that would be a bit cruel. We share space with in siginfo with _addr_bnd. #BR faults from MPX are completely separate from page faults (#PF) that trigger from protection key violations, so we never need both at the same time. Note that _pkey is a 64-bit value. The current hardware only supports 4-bit protection keys. We do this because there is _plenty_ of space in _sigfault and it is possible that future processors would support more than 4 bits of protection keys. The x86 code to actually fill in the siginfo is in the next patch. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Al Viro <[email protected]> Cc: Amanieu d'Antras <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Sasha Levin <[email protected]> Cc: Vegard Nossum <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17workqueue: Replace usage of init_name with dev_set_name()Lars-Peter Clausen1-1/+1
The init_name property of the device struct is sort of a hack and should only be used for statically allocated devices. Since the device is dynamically allocated here it is safe to use the proper way to set a devices name by calling dev_set_name(). Signed-off-by: Lars-Peter Clausen <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2016-02-17ftrace/module: remove ftrace module notifierJessica Yu2-35/+5
Remove the ftrace module notifier in favor of directly calling ftrace_module_enable() and ftrace_release_mod() in the module loader. Hard-coding the function calls directly in the module loader removes dependence on the module notifier call chain and provides better visibility and control over what gets called when, which is important to kernel utilities such as livepatch. This fixes a notifier ordering issue in which the ftrace module notifier (and hence ftrace_module_enable()) for coming modules was being called after klp_module_notify(), which caused livepatch modules to initialize incorrectly. This patch removes dependence on the module notifier call chain in favor of hard coding the corresponding function calls in the module loader. This ensures that ftrace and livepatch code get called in the correct order on patch module load and unload. Fixes: 5156dca34a3e ("ftrace: Fix the race between ftrace and insmod") Signed-off-by: Jessica Yu <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> Reviewed-by: Petr Mladek <[email protected]> Acked-by: Rusty Russell <[email protected]> Reviewed-by: Josh Poimboeuf <[email protected]> Reviewed-by: Miroslav Benes <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2016-02-17futex: Remove requirement for lock_page() in get_futex_key()Mel Gorman1-8/+91
When dealing with key handling for shared futexes, we can drastically reduce the usage/need of the page lock. 1) For anonymous pages, the associated futex object is the mm_struct which does not require the page lock. 2) For inode based, keys, we can check under RCU read lock if the page mapping is still valid and take reference to the inode. This just leaves one rare race that requires the page lock in the slow path when examining the swapcache. Additionally realtime users currently have a problem with the page lock being contended for unbounded periods of time during futex operations. Task A get_futex_key() lock_page() ---> preempted Now any other task trying to lock that page will have to wait until task A gets scheduled back in, which is an unbound time. With this patch, we pretty much have a lockless futex_get_key(). Experiments show that this patch can boost/speedup the hashing of shared futexes with the perf futex benchmarks (which is good for measuring such change) by up to 45% when there are high (> 100) thread counts on a 60 core Westmere. Lower counts are pretty much in the noise range or less than 10%, but mid range can be seen at over 30% overall throughput (hash ops/sec). This makes anon-mem shared futexes much closer to its private counterpart. Signed-off-by: Mel Gorman <[email protected]> [ Ported on top of thp refcount rework, changelog, comments, fixes. ] Signed-off-by: Davidlohr Bueso <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Chris Mason <[email protected]> Cc: Darren Hart <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17futex: Rename barrier references in ordering guaranteesDavidlohr Bueso1-17/+17
Ingo suggested we rename how we reference barriers A and B regarding futex ordering guarantees. This patch replaces, for both barriers, MB (A) with smp_mb(); (A), such that: - We explicitly state that the barriers are SMP, and - We standardize how we reference these across futex.c helping readers follow what barrier does what and where. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Davidlohr Bueso <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Chris Mason <[email protected]> Cc: Darren Hart <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17perf/core: Remove unused arguments from a bunch of functionsThomas Gleixner1-11/+10
No functional change, just less confusing to read. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17Merge branch 'perf/urgent' into perf/core, to queue up dependent patchIngo Molnar9-118/+250
Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17perf/core: Plug potential memory leak in CPU_UP_PREPAREThomas Gleixner1-1/+1
If CPU_UP_PREPARE is called it is not guaranteed, that a previously allocated and assigned hash has been freed already, but perf_event_init_cpu() unconditionally allocates and assignes a new hash if the swhash is referenced. By overwriting the pointer the existing hash is not longer accessible. Verify that there is no hash assigned on this cpu before allocating and assigning a new one. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug stateThomas Gleixner1-1/+0
If CPU_DOWN_PREPARE fails the perf hotplug notifier is called for CPU_DOWN_FAILED and calls perf_event_init_cpu(), which checks whether the swhash is referenced. If yes it allocates a new hash and stores the pointer in the per cpu data structure. But at this point the cpu is still online, so there must be a valid hash already. By overwriting the pointer the existing hash is not longer accessible. Remove the CPU_DOWN_FAILED state, as there is nothing to (re)allocate. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17perf/core: Remove bogus UP_CANCELED hotplug stateThomas Gleixner1-1/+0
If CPU_UP_PREPARE fails the perf hotplug code calls perf_event_exit_cpu(), which is a pointless exercise. The cpu is not online, so the smp function calls return -ENXIO. So the result is a list walk to call noops. Remove it. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17sched/deadline: Fix trivial typo in printk() messageSteven Rostedt1-1/+1
It's "too much" not "to much". Signed-off-by: Steven Rostedt <[email protected]> Acked-by: Juri Lelli <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-17sched/core: Remove dead statement in __schedule()Byungchul Park1-1/+0
Remove an unnecessary assignment of variable not used any more. ( This has no runtime effects as GCC is smart enough to optimize this out. ) Signed-off-by: Byungchul Park <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Edited the changelog. ] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-16Add FS_USERNS_FLAG to cgroup fsSerge Hallyn1-0/+2
allowing root in a non-init user namespace to mount it. This should now be safe, because 1. non-init-root cannot mount a previously unbound subsystem 2. the task doing the mount must be privileged with respect to the user namespace owning the cgroup namespace 3. the mounted subsystem will have its current cgroup as the root dentry. the permissions will be unchanged, so tasks will receive no new privilege over the cgroups which they did not have on the original mounts. Signed-off-by: Serge Hallyn <[email protected]>
2016-02-16cgroup: mount cgroupns-root when inside non-init cgroupnsSerge Hallyn1-1/+47
This patch enables cgroup mounting inside userns when a process as appropriate privileges. The cgroup filesystem mounted is rooted at the cgroupns-root. Thus, in a container-setup, only the hierarchy under the cgroupns-root is exposed inside the container. This allows container management tools to run inside the containers without depending on any global state. Signed-off-by: Serge Hallyn <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2016-02-16cgroup: cgroup namespace setns supportAditya Kali1-3/+16
setns on a cgroup namespace is allowed only if task has CAP_SYS_ADMIN in its current user-namespace and over the user-namespace associated with target cgroupns. No implicit cgroup changes happen with attaching to another cgroupns. It is expected that the somone moves the attaching process under the target cgroupns-root. Signed-off-by: Aditya Kali <[email protected]> Signed-off-by: Serge E. Hallyn <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2016-02-16cgroup: introduce cgroup namespacesAditya Kali4-10/+192
Introduce the ability to create new cgroup namespace. The newly created cgroup namespace remembers the cgroup of the process at the point of creation of the cgroup namespace (referred as cgroupns-root). The main purpose of cgroup namespace is to virtualize the contents of /proc/self/cgroup file. Processes inside a cgroup namespace are only able to see paths relative to their namespace root (unless they are moved outside of their cgroupns-root, at which point they will see a relative path from their cgroupns-root). For a correctly setup container this enables container-tools (like libcontainer, lxc, lmctfy, etc.) to create completely virtualized containers without leaking system level cgroup hierarchy to the task. This patch only implements the 'unshare' part of the cgroupns. Signed-off-by: Aditya Kali <[email protected]> Signed-off-by: Serge Hallyn <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2016-02-16mm/gup: Introduce get_user_pages_remote()Dave Hansen1-2/+8
For protection keys, we need to understand whether protections should be enforced in software or not. In general, we enforce protections when working on our own task, but not when on others. We call these "current" and "remote" operations. This patch introduces a new get_user_pages() variant: get_user_pages_remote() Which is a replacement for when get_user_pages() is called on non-current tsk/mm. We also introduce a new gup flag: FOLL_REMOTE which can be used for the "__" gup variants to get this new behavior. The uprobes is_trap_at_addr() location holds mmap_sem and calls get_user_pages(current->mm) on an instruction address. This makes it a pretty unique gup caller. Being an instruction access and also really originating from the kernel (vs. the app), I opted to consider this a 'remote' access where protection keys will not be enforced. Without protection keys, this patch should not change any behavior. Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Denys Vlasenko <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-02-15treewide: Fix typo in printkMasanari Iida1-1/+1
This patch fix spelling typos found in printk and Kconfig. Signed-off-by: Masanari Iida <[email protected]> Acked-by: Randy Dunlap <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2016-02-15genirq: Use a common macro to go through the actions listDaniel Lezcano5-13/+10
The irq code browses the list of actions differently to inspect the element one by one. Even if it is not a problem, for the sake of consistent code, provide a macro similar to for_each_irq_desc in order to have the same loop to go through the actions list and use it in the code. [ tglx: Renamed the macro ] Signed-off-by: Daniel Lezcano <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-02-14Merge 4.5-rc4 into tty-nextGreg Kroah-Hartman9-122/+267
We want the fixes in here, and this resolves a merge error in tty_io.c Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-02-14Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-6/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull lockdep fix from Thomas Gleixner: "A single fix for the stack trace caching logic in lockdep, where the duplicate avoidance managed to store no back trace at all" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/lockdep: Fix stack trace caching logic
2016-02-13nohz: Implement wide kick on top of irq workFrederic Weisbecker1-8/+4
It simplifies it and allows wide kick to be performed, even when IRQs are disabled, without an asynchronous level in the middle. This comes at a cost of some more overhead on features like perf and posix cpu timers slow-paths, which is probably not much important for nohz full users. Requested-by: Peter Zijlstra <[email protected]> Reviewed-by: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Viresh Kumar <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2016-02-13atomic: Export fetch_or()Frederic Weisbecker1-14/+0
Export fetch_or() that's implemented and used internally by the scheduler. We are going to use it for NO_HZ so make it generally available. Reviewed-by: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Luiz Capitulino <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Viresh Kumar <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2016-02-13Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar6-84/+216
Conflicts: kernel/locking/lockdep.c Signed-off-by: Ingo Molnar <[email protected]>