aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2011-03-23crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, ↵Olaf Hering2-0/+35
setup_elfcorehdr and saved_max_pfn The Xen PV drivers in a crashed HVM guest can not connect to the dom0 backend drivers because both frontend and backend drivers are still in connected state. To run the connection reset function only in case of a crashdump, the is_kdump_kernel() function needs to be available for the PV driver modules. Consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn into kernel/crash_dump.c Also export elfcorehdr_addr to make is_kdump_kernel() usable for modules. Leave 'elfcorehdr' as early_param(). This changes powerpc from __setup() to early_param(). It adds an address range check from x86 also on ia64 and powerpc. [[email protected]: additional #includes] [[email protected]: remove elfcorehdr_addr export] [[email protected]: fix for Tejun's mm/nobootmem.c changes] Signed-off-by: Olaf Hering <[email protected]> Cc: Russell King <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mundt <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23taskstats: use appropriate printk priority levelMandeep Singh Baines1-1/+1
printk()s without a priority level default to KERN_WARNING. To reduce noise at KERN_WARNING, this patch set the priority level appriopriately for unleveled printks()s. This should be useful to folks that look at dmesg warnings closely. Signed-off-by: Mandeep Singh Baines <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23userns: user namespaces: convert several capable() callsSerge E. Hallyn6-13/+29
CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(), because the resource comes from current's own ipc namespace. setuid/setgid are to uids in own namespace, so again checks can be against current_user_ns(). Changelog: Jan 11: Use task_ns_capable() in place of sched_capable(). Jan 11: Use nsown_capable() as suggested by Bastian Blank. Jan 11: Clarify (hopefully) some logic in futex and sched.c Feb 15: use ns_capable for ipc, not nsown_capable Feb 23: let copy_ipcs handle setting ipc_ns->user_ns Feb 23: pass ns down rather than taking it from current [[email protected]: coding-style fixes] Signed-off-by: Serge E. Hallyn <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Acked-by: David Howells <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23userns: add a user namespace owner of ipc nsSerge E. Hallyn1-0/+5
Changelog: Feb 15: Don't set new ipc->user_ns if we didn't create a new ipc_ns. Feb 23: Move extern declaration to ipc_namespace.h, and group fwd declarations at top. Signed-off-by: Serge E. Hallyn <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Acked-by: David Howells <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23userns: user namespaces: convert all capable checks in kernel/sys.cSerge E. Hallyn1-26/+49
This allows setuid/setgid in containers. It also fixes some corner cases where kernel logic foregoes capability checks when uids are equivalent. The latter will need to be done throughout the whole kernel. Changelog: Jan 11: Use nsown_capable() as suggested by Bastian Blank. Jan 11: Fix logic errors in uid checks pointed out by Bastian. Feb 15: allow prlimit to current (was regression in previous version) Feb 23: remove debugging printks, uninline set_one_prio_perm and make it bool, and document its return value. Signed-off-by: Serge E. Hallyn <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Acked-by: David Howells <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23userns: make has_capability* into real functionsSerge E. Hallyn1-0/+54
So we can let type safety keep things sane, and as a bonus we can remove the declaration of init_user_ns in capability.h. Signed-off-by: Serge E. Hallyn <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Daniel Lezcano <[email protected]> Cc: David Howells <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23userns: allow ptrace from non-init user namespacesSerge E. Hallyn1-12/+15
ptrace is allowed to tasks in the same user namespace according to the usual rules (i.e. the same rules as for two tasks in the init user namespace). ptrace is also allowed to a user namespace to which the current task the has CAP_SYS_PTRACE capability. Changelog: Dec 31: Address feedback by Eric: . Correct ptrace uid check . Rename may_ptrace_ns to ptrace_capable . Also fix the cap_ptrace checks. Jan 1: Use const cred struct Jan 11: use task_ns_capable() in place of ptrace_capable(). Feb 23: same_or_ancestore_user_ns() was not an appropriate check to constrain cap_issubset. Rather, cap_issubset() only is meaningful when both capsets are in the same user_ns. Signed-off-by: Serge E. Hallyn <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Acked-by: David Howells <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23userns: allow killing tasks in your own or child usernsSerge E. Hallyn1-8/+22
Changelog: Dec 8: Fixed bug in my check_kill_permission pointed out by Eric Biederman. Dec 13: Apply Eric's suggestion to pass target task into kill_ok_by_cred() for clarity Dec 31: address comment by Eric Biederman: don't need cred/tcred in check_kill_permission. Jan 1: use const cred struct. Jan 11: Per Bastian Blank's advice, clean up kill_ok_by_cred(). Feb 16: kill_ok_by_cred: fix bad parentheses Feb 23: per akpm, let compiler inline kill_ok_by_cred Signed-off-by: Serge E. Hallyn <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Acked-by: David Howells <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23userns: allow sethostname in a containerSerge E. Hallyn3-12/+9
Changelog: Feb 23: let clone_uts_ns() handle setting uts->user_ns To do so we need to pass in the task_struct who'll get the utsname, so we can get its user_ns. Feb 23: As per Oleg's coment, just pass in tsk, instead of two of its members. Signed-off-by: Serge E. Hallyn <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Acked-by: David Howells <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23userns: security: make capabilities relative to the user namespaceSerge E. Hallyn2-5/+43
- Introduce ns_capable to test for a capability in a non-default user namespace. - Teach cap_capable to handle capabilities in a non-default user namespace. The motivation is to get to the unprivileged creation of new namespaces. It looks like this gets us 90% of the way there, with only potential uid confusion issues left. I still need to handle getting all caps after creation but otherwise I think I have a good starter patch that achieves all of your goals. Changelog: 11/05/2010: [serge] add apparmor 12/14/2010: [serge] fix capabilities to created user namespaces Without this, if user serge creates a user_ns, he won't have capabilities to the user_ns he created. THis is because we were first checking whether his effective caps had the caps he needed and returning -EPERM if not, and THEN checking whether he was the creator. Reverse those checks. 12/16/2010: [serge] security_real_capable needs ns argument in !security case 01/11/2011: [serge] add task_ns_capable helper 01/11/2011: [serge] add nsown_capable() helper per Bastian Blank suggestion 02/16/2011: [serge] fix a logic bug: the root user is always creator of init_user_ns, but should not always have capabilities to it! Fix the check in cap_capable(). 02/21/2011: Add the required user_ns parameter to security_capable, fixing a compile failure. 02/23/2011: Convert some macros to functions as per akpm comments. Some couldn't be converted because we can't easily forward-declare them (they are inline if !SECURITY, extern if SECURITY). Add a current_user_ns function so we can use it in capability.h without #including cred.h. Move all forward declarations together to the top of the #ifdef __KERNEL__ section, and use kernel-doc format. 02/23/2011: Per dhowells, clean up comment in cap_capable(). 02/23/2011: Per akpm, remove unreachable 'return -EPERM' in cap_capable. (Original written and signed off by Eric; latest, modified version acked by him) [[email protected]: fix build] [[email protected]: export current_user_ns() for ecryptfs] [[email protected]: remove unneeded extra argument in selinux's task_has_capability] Signed-off-by: Eric W. Biederman <[email protected]> Signed-off-by: Serge E. Hallyn <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Acked-by: David Howells <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Serge E. Hallyn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23userns: add a user_namespace as creator/owner of uts_namespaceSerge E. Hallyn3-2/+15
The expected course of development for user namespaces targeted capabilities is laid out at https://wiki.ubuntu.com/UserNamespace. Goals: - Make it safe for an unprivileged user to unshare namespaces. They will be privileged with respect to the new namespace, but this should only include resources which the unprivileged user already owns. - Provide separate limits and accounting for userids in different namespaces. Status: Currently (as of 2.6.38) you can clone with the CLONE_NEWUSER flag to get a new user namespace if you have the CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID capabilities. What this gets you is a whole new set of userids, meaning that user 500 will have a different 'struct user' in your namespace than in other namespaces. So any accounting information stored in struct user will be unique to your namespace. However, throughout the kernel there are checks which - simply check for a capability. Since root in a child namespace has all capabilities, this means that a child namespace is not constrained. - simply compare uid1 == uid2. Since these are the integer uids, uid 500 in namespace 1 will be said to be equal to uid 500 in namespace 2. As a result, the lxc implementation at lxc.sf.net does not use user namespaces. This is actually helpful because it leaves us free to develop user namespaces in such a way that, for some time, user namespaces may be unuseful. Bugs aside, this patchset is supposed to not at all affect systems which are not actively using user namespaces, and only restrict what tasks in child user namespace can do. They begin to limit privilege to a user namespace, so that root in a container cannot kill or ptrace tasks in the parent user namespace, and can only get world access rights to files. Since all files currently belong to the initila user namespace, that means that child user namespaces can only get world access rights to *all* files. While this temporarily makes user namespaces bad for system containers, it starts to get useful for some sandboxing. I've run the 'runltplite.sh' with and without this patchset and found no difference. This patch: copy_process() handles CLONE_NEWUSER before the rest of the namespaces. So in the case of clone(CLONE_NEWUSER|CLONE_NEWUTS) the new uts namespace will have the new user namespace as its owner. That is what we want, since we want root in that new userns to be able to have privilege over it. Changelog: Feb 15: don't set uts_ns->user_ns if we didn't create a new uts_ns. Feb 23: Move extern init_user_ns declaration from init/version.c to utsname.h. Signed-off-by: Serge E. Hallyn <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Acked-by: Daniel Lezcano <[email protected]> Acked-by: David Howells <[email protected]> Cc: James Morris <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23pidns: call pid_ns_prepare_proc() from create_pid_namespace()Eric W. Biederman2-8/+9
Reorganize proc_get_sb() so it can be called before the struct pid of the first process is allocated. Signed-off-by: Eric W. Biederman <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Alexey Dobriyan <[email protected]> Acked-by: Serge E. Hallyn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23pid: remove the child_reaper special case in init/main.cEric W. Biederman1-1/+1
This patchset is a cleanup and a preparation to unshare the pid namespace. These prerequisites prepare for Eric's patchset to give a file descriptor to a namespace and join an existing namespace. This patch: It turns out that the existing assignment in copy_process of the child_reaper can handle the initial assignment of child_reaper we just need to generalize the test in kernel/fork.c Signed-off-by: Eric W. Biederman <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Alexey Dobriyan <[email protected]> Acked-by: Serge E. Hallyn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23sysctl: restrict write access to dmesg_restrictRichard Weinberger1-1/+17
When dmesg_restrict is set to 1 CAP_SYS_ADMIN is needed to read the kernel ring buffer. But a root user without CAP_SYS_ADMIN is able to reset dmesg_restrict to 0. This is an issue when e.g. LXC (Linux Containers) are used and complete user space is running without CAP_SYS_ADMIN. A unprivileged and jailed root user can bypass the dmesg_restrict protection. With this patch writing to dmesg_restrict is only allowed when root has CAP_SYS_ADMIN. Signed-off-by: Richard Weinberger <[email protected]> Acked-by: Dan Rosenberg <[email protected]> Acked-by: Serge E. Hallyn <[email protected]> Cc: Eric Paris <[email protected]> Cc: Kees Cook <[email protected]> Cc: James Morris <[email protected]> Cc: Eugene Teo <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23sysctl: add some missing input constraint checksPetr Holasek1-4/+13
Add boundaries of allowed input ranges for: dirty_expire_centisecs, drop_caches, overcommit_memory, page-cluster and panic_on_oom. Signed-off-by: Petr Holasek <[email protected]> Acked-by: Dave Young <[email protected]> Cc: David Rientjes <[email protected]> Cc: Wu Fengguang <[email protected]> Cc: Alexey Dobriyan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23sysctl_check: drop dead codeDenis Kirjanov1-4/+0
Drop dead code. Signed-off-by: Denis Kirjanov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23sysctl_check: drop table->procname checksDenis Kirjanov1-4/+2
Since the for loop checks for the table->procname drop useless table->procname checks inside the loop body Signed-off-by: Denis Kirjanov <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23cpuset: hold callback_mutex in cpuset_post_clone()Li Zefan1-0/+2
Chaning cpuset->mems/cpuset->cpus should be protected under callback_mutex. cpuset_clone() doesn't follow this rule. It's ok because it's called when creating and initializing a cgroup, but we'd better hold the lock to avoid subtil break in the future. Signed-off-by: Li Zefan <[email protected]> Acked-by: Paul Menage <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Miao Xie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23cpuset: fix unchecked calls to NODEMASK_ALLOC()Li Zefan1-35/+16
Those functions that use NODEMASK_ALLOC() can't propagate errno to users, but will fail silently. Fix it by using a static nodemask_t variable for each function, and those variables are protected by cgroup_mutex; [[email protected]: fix comment spelling, strengthen cgroup_lock comment] Signed-off-by: Li Zefan <[email protected]> Cc: Paul Menage <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Miao Xie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23cpuset: remove unneeded NODEMASK_ALLOC() in cpuset_attach()Li Zefan1-5/+2
oldcs->mems_allowed is not modified during cpuset_attach(), so we don't have to copy it to a buffer allocated by NODEMASK_ALLOC(). Just pass it to cpuset_migrate_mm(). Signed-off-by: Li Zefan <[email protected]> Cc: Paul Menage <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Miao Xie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23cpuset: remove unneeded NODEMASK_ALLOC() in cpuset_sprintf_memlist()Li Zefan1-16/+8
It's not necessary to copy cpuset->mems_allowed to a buffer allocated by NODEMASK_ALLOC(). Just pass it to nodelist_scnprintf(). As spotted by Paul, a side effect is we fix a bug that the function can return -ENOMEM but the caller doesn't expect negative return value. Therefore change the return value of cpuset_sprintf_cpulist() and cpuset_sprintf_memlist() from int to size_t. Signed-off-by: Li Zefan <[email protected]> Acked-by: Paul Menage <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Miao Xie <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23memcg: remove direct page_cgroup-to-page pointerJohannes Weiner1-0/+2
In struct page_cgroup, we have a full word for flags but only a few are reserved. Use the remaining upper bits to encode, depending on configuration, the node or the section, to enable page_cgroup-to-page lookups without a direct pointer. This saves a full word for every page in a system with memory cgroups enabled. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Daisuke Nishimura <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23memcg: res_counter_read_u64(): fix potential races on 32-bit machinesKAMEZAWA Hiroyuki1-0/+14
res_counter_read_u64 reads u64 value without lock. It's dangerous in a 32bit environment. Add locking. Signed-off-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-23timekeeping: Use syscore_ops instead of sysdev class and sysdevRafael J. Wysocki1-19/+8
The timekeeping subsystem uses a sysdev class and a sysdev for executing timekeeping_suspend() after interrupts have been turned off on the boot CPU (during system suspend) and for executing timekeeping_resume() before turning on interrupts on the boot CPU (during system resume). However, since both of these functions ignore their arguments, the entire mechanism may be replaced with a struct syscore_ops object which is simpler. Signed-off-by: Rafael J. Wysocki <[email protected]> Reviewed-by: Thomas Gleixner <[email protected]>
2011-03-23mm: arch: rename in_gate_area_no_task to in_gate_area_no_mmStephen Wilson1-2/+2
Now that gate vma's are referenced with respect to a particular mm and not a particular task it only makes sense to propagate the change to this predicate as well. Signed-off-by: Stephen Wilson <[email protected]> Reviewed-by: Michel Lespinasse <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Al Viro <[email protected]>
2011-03-23perf: Better fit max unprivileged mlock pages for tools needsFrederic Weisbecker1-1/+2
The maximum kilobytes of locked memory that an unprivileged user can reserve is of 512 kB = 128 pages by default, scaled to the number of onlined CPUs, which fits well with the tools that use 128 data pages by default. However tools actually use 129 pages, because they need one more for the user control page. Thus the default mlock threshold is not sufficient for the default tools needs and we always end up to evaluate the constant mlock rlimit policy, which doesn't have this scaling with the number of online CPUs. Hence, on systems that have more than 16 CPUs, we overlap the rlimit threshold and fail to mmap: $ perf record ls Error: failed to mmap with 1 (Operation not permitted) Just increase the max unprivileged mlock threshold by one page so that it supports well perf tools even after 16 CPUs. Reported-by: Han Pingtian <[email protected]> Reported-by: Peter Zijlstra <[email protected]> Reported-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Acked-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Stable <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-23genirq; Remove the last leftovers of the old sparse irq codeThomas Gleixner1-14/+0
All users converted. Get rid of it. Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-23perf_events: Fix stale ->cgrp pointer in update_cgrp_time_from_cpuctx()Stephane Eranian1-1/+11
This patch solves a stale pointer problem in update_cgrp_time_from_cpuctx(). The cpuctx->cgrp was not cleared on all possible event exit paths, including: close() perf_release() perf_release_kernel() list_del_event() This patch fixes list_del_event() to clear cpuctx->cgrp when there are no cgroup events left in the context. [ This second version makes the code compile when CONFIG_CGROUP_PERF is not enabled. We unconditionally define perf_cpu_context->cgrp. ] Signed-off-by: Stephane Eranian <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <20110323150306.GA1580@quad> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-23sched, doc: Update sched-design-CFS.txtBorislav Petkov2-4/+0
Correct ->dequeue_tree() thinko into sched_class->dequeue_task and drop all references to ->task_new() since it is obviously gone. Signed-off-by: Borislav Petkov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Mike Galbraith <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-23lockdep: Remove unused 'factor' variable from lockdep_stats_show()Sergey Senozhatsky1-8/+1
Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-23sched: Remove unused 'rq' variable and cpu_rq() call from ↵Sergey Senozhatsky1-3/+0
alloc_fair_sched_group() Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-23Merge branch 'tip/perf/urgent' of ↵Ingo Molnar1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent
2011-03-22printk: allow setting DEFAULT_MESSAGE_LEVEL via KconfigMandeep Singh Baines1-1/+1
We've been burned by regressions/bugs which we later realized could have been triaged quicker if only we'd paid closer attention to dmesg. To make it easier to audit dmesg, we'd like to make DEFAULT_MESSAGE_LEVEL Kconfig-settable. That way we can set it to KERN_NOTICE and audit any messages <= KERN_WARNING. Signed-off-by: Mandeep Singh Baines <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Joe Perches <[email protected]> Cc: Olof Johansson <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22printk: use %pK for /proc/kallsyms and /proc/modulesKees Cook2-8/+6
In an effort to reduce kernel address leaks that might be used to help target kernel privilege escalation exploits, this patch uses %pK when displaying addresses in /proc/kallsyms, /proc/modules, and /sys/module/*/sections/*. Note that this changes %x to %p, so some legitimately 0 values in /proc/kallsyms would have changed from 00000000 to "(null)". To avoid this, "(null)" is not used when using the "K" format. Anything that was already successfully parsing "(null)" in addition to full hex digits should have no problem with this change. (Thanks to Joe Perches for the suggestion.) Due to the %x to %p, "void *" casts are needed since these addresses are already "unsigned long" everywhere internally, due to their starting life as ELF section offsets. Signed-off-by: Kees Cook <[email protected]> Cc: Eugene Teo <[email protected]> Cc: Dan Rosenberg <[email protected]> Cc: Rusty Russell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22console: prevent registered consoles from dumping old kernel message over againFeng Tang1-0/+18
For a platform with many consoles like: "console=tty1 console=ttyMFD2 console=ttyS0 earlyprintk=mrst" Each time when the non "selected_console" (tty1 and ttyMFD2 here) get registered, the existing kernel message will be printed out on registered consoles again, the "mrst" early console will get some same message for 3 times, and "tty1" will get some for twice. As suggested by Andrew Morton, every time a new console is registered, it will be set as the "exclusive" console which will dump the already existing kernel messages. Signed-off-by: Feng Tang <[email protected]> Cc: Greg KH <[email protected]> Cc: Alan Cox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22console: allow to retain boot console via boot option keep_bootconFabio M. Di Nitto1-1/+15
On some architectures, the boot process involves de-registering the boot console (early boot), initialize drivers and then re-register the console. This mechanism introduces a window in which no printk can happen on the console and messages are buffered and then printed once the new console is available. If a kernel crashes during this window, all it's left on the boot console is "console [foo] enabled, bootconsole disabled" making debug of the crash rather 'interesting'. By adding "keep_bootcon" option, do not unregister the boot console, that will allow to printk everything that is happening up to the crash. The option is clearly meant only for debugging purposes as it introduces lots of duplicated info printed on console, but will make bug report from users easier as it doesn't require a kernel build just to figure out where we crash. Signed-off-by: Fabio M. Di Nitto <[email protected]> Acked-by: David S. Miller <[email protected]> Cc: Alan Cox <[email protected]> Cc: Greg KH <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22kernel/watchdog.c: always return NOTIFY_OK during cpu up/down eventsDon Zickus1-6/+16
This patch addresses a couple of problems. One was the case when the hardlockup failed to start, it also failed to start the softlockup. There were valid cases when the hardlockup shouldn't start and that shouldn't block the softlockup (no lapic, bios controls perf counters). The second problem was when the hardlockup failed to start on boxes (from a no lapic or bios controlled perf counter case), it reported failure to the cpu notifier chain. This blocked the notifier from continuing to start other more critical pieces of cpu bring-up (in our case based on a 2.6.32 fork, it was the mce). As a result, during soft cpu online/offline testing, the system would panic when a cpu was offlined because the cpu notifier would succeed in processing a watchdog disable cpu event and would panic in the mce case as a result of un-initialized variables from a never executed cpu up event. I realized the hardlockup/softlockup cases are really just debugging aids and should never impede the progress of a cpu up/down event. Therefore I modified the code to always return NOTIFY_OK and instead rely on printks to inform the user of problems. Signed-off-by: Don Zickus <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Reviewed-by: WANG Cong <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22kernel/watchdog.c: allow hardlockup to panic by defaultDon Zickus1-1/+4
When a cpu is considered stuck, instead of limping along and just printing a warning, it is sometimes preferred to just panic, let kdump capture the vmcore and reboot. This gets the machine back into a stable state quickly while saving the info that got it into a stuck state to begin with. Add a Kconfig option to allow users to set the hardlockup to panic by default. Also add in a 'nmi_watchdog=nopanic' to override this. [[email protected]: fix strncmp length] Signed-off-by: Don Zickus <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Reviewed-by: WANG Cong <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22sys_unshare: remove the dead CLONE_THREAD/SIGHAND/VM codeOleg Nesterov1-98/+25
Cleanup: kill the dead code which does nothing but complicates the code and confuses the reader. sys_unshare(CLONE_THREAD/SIGHAND/VM) is not really implemented, and I doubt very much it will ever work. At least, nobody even tried since the original 99d1419d96d7df9cfa56 ("unshare system call -v5: system call handler function") was applied more than 4 years ago. And the code is not consistent. unshare_thread() always fails unconditionally, while unshare_sighand() and unshare_vm() pretend to work if there is nothing to unshare. Remove unshare_thread(), unshare_sighand(), unshare_vm() helpers and related variables and add a simple CLONE_THREAD | CLONE_SIGHAND| CLONE_VM check into check_unshare_flags(). Also, move the "CLONE_NEWNS needs CLONE_FS" check from check_unshare_flags() to sys_unshare(). This looks more consistent and matches the similar do_sysvsem check in sys_unshare(). Note: with or without this patch "atomic_read(mm->mm_users) > 1" can give a false positive due to get_task_mm(). Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Roland McGrath <[email protected]> Cc: Janak Desai <[email protected]> Cc: Daniel Lezcano <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Acked-by: Serge Hallyn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22kernel/cpu.c: fix many errors related to style.Michael Rodriguez1-6/+5
Change the printk() calls to have the KERN_INFO/KERN_ERROR stuff, and fixes other coding style errors. Not _all_ of them are gone, though. [[email protected]: revert the bits I disagree with] Signed-off-by: Michael Rodriguez <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22smp: move smp setup functions to kernel/smp.cAmerigo Wang1-0/+81
Move setup_nr_cpu_ids(), smp_init() and some other SMP boot parameter setup functions from init/main.c to kenrel/smp.c, saves some #ifdef CONFIG_SMP. Signed-off-by: WANG Cong <[email protected]> Cc: Rakib Mullick <[email protected]> Cc: David Howells <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Akinobu Mita <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22move x86 specific oops=panic to generic codeOlaf Hering1-0/+10
The oops=panic cmdline option is not x86 specific, move it to generic code. Update documentation. Signed-off-by: Olaf Hering <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22kthread: use kthread_create_on_node()Eric Dumazet3-5/+12
ksoftirqd, kworker, migration, and pktgend kthreads can be created with kthread_create_on_node(), to get proper NUMA affinities for their stack and task_struct. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: David S. Miller <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Acked-by: Rusty Russell <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: David Howells <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22kthread: NUMA aware kthread_create_on_node()Eric Dumazet2-7/+27
All kthreads being created from a single helper task, they all use memory from a single node for their kernel stack and task struct. This patch suite creates kthread_create_on_node(), adding a 'cpu' parameter to parameters already used by kthread_create(). This parameter serves in allocating memory for the new kthread on its memory node if possible. Signed-off-by: Eric Dumazet <[email protected]> Acked-by: David S. Miller <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Acked-by: Rusty Russell <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: David Howells <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: NUMA aware alloc_thread_info_node()Eric Dumazet1-3/+6
Add a node parameter to alloc_thread_info(), and change its name to alloc_thread_info_node() This change is needed to allow NUMA aware kthread_create_on_cpu() Signed-off-by: Eric Dumazet <[email protected]> Acked-by: David S. Miller <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Acked-by: Rusty Russell <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: David Howells <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22mm: NUMA aware alloc_task_struct_node()Eric Dumazet1-4/+6
All kthreads being created from a single helper task, they all use memory from a single node for their kernel stack and task struct. This patch suite creates kthread_create_on_cpu(), adding a 'cpu' parameter to parameters already used by kthread_create(). This parameter serves in allocating memory for the new kthread on its memory node if available. Users of this new function are : ksoftirqd, kworker, migration, pktgend... This patch: Add a node parameter to alloc_task_struct(), and change its name to alloc_task_struct_node() This change is needed to allow NUMA aware kthread_create_on_cpu() Signed-off-by: Eric Dumazet <[email protected]> Acked-by: David S. Miller <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Acked-by: Rusty Russell <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: David Howells <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22cgroups: if you list_empty() a head then don't list_del() itPhil Carmody1-8/+6
list_del() leaves poison in the prev and next pointers. The next list_empty() will compare those poisons, and say the list isn't empty. Any list operations that assume the node is on a list because of such a check will be fooled into dereferencing poison. One needs to INIT the node after the del, and fortunately there's already a wrapper for that - list_del_init(). Some of the dels are followed by deallocations, so can be ignored, and one can be merged with an add to make a move. Apart from that, I erred on the side of caution in making nodes list_empty()-queriable. Signed-off-by: Phil Carmody <[email protected]> Reviewed-by: Paul Menage <[email protected]> Cc: Li Zefan <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-22tracing: Fix set_ftrace_filter probe function displayJiri Olsa1-2/+1
If one or more function probes (like traceon) are enabled, and there's no other function filter, the first probe func is skipped (which one depends on the position in the hash). $ echo sys_open:traceon sys_close:traceon > ./set_ftrace_filter $ cat set_ftrace_filter #### all functions enabled #### sys_close:traceon:unlimited $ The reason was, that in the case of no other function filter, the func_pos was not properly updated before calling t_hash_start. Signed-off-by: Jiri Olsa <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-03-21Prevent rt_sigqueueinfo and rt_tgsigqueueinfo from spoofing the signal codeJulien Tinnes1-4/+12
Userland should be able to trust the pid and uid of the sender of a signal if the si_code is SI_TKILL. Unfortunately, the kernel has historically allowed sigqueueinfo() to send any si_code at all (as long as it was negative - to distinguish it from kernel-generated signals like SIGILL etc), so it could spoof a SI_TKILL with incorrect siginfo values. Happily, it looks like glibc has always set si_code to the appropriate SI_QUEUE, so there are probably no actual user code that ever uses anything but the appropriate SI_QUEUE flag. So just tighten the check for si_code (we used to allow any negative value), and add a (one-time) warning in case there are binaries out there that might depend on using other si_code values. Signed-off-by: Julien Tinnes <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-20Merge branch 'trivial' of ↵Linus Torvalds2-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6 * 'trivial' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild-2.6: (25 commits) video: change to new flag variable scsi: change to new flag variable rtc: change to new flag variable rapidio: change to new flag variable pps: change to new flag variable net: change to new flag variable misc: change to new flag variable message: change to new flag variable memstick: change to new flag variable isdn: change to new flag variable ieee802154: change to new flag variable ide: change to new flag variable hwmon: change to new flag variable dma: change to new flag variable char: change to new flag variable fs: change to new flag variable xtensa: change to new flag variable um: change to new flag variables s390: change to new flag variable mips: change to new flag variable ... Fix up trivial conflict in drivers/hwmon/Makefile