The first one is equal to signal_pt_regs(); the second is never used
(and is always NULL, while we are at it).
Signed-off-by: Al Viro <[email protected]>
... and get rid of the idiotic struct pt_regs * in the
asm-generic/syscalls.h prototypes of the same, while we are at it.
Eventually we want those in linux/syscalls.h, of course, but that'll
have to wait a bit.
Note that there are *three* variants of the sys_clone() argument
order. Braindamage galore...
Signed-off-by: Al Viro <[email protected]>
Use list_del_init() rather than list_del() to remove events from
cgrp->event_list. No functional change. This is just defensive
coding.
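For illustration, a minimal sketch of the difference (not the actual
diff):

    /* before: list_del() poisons event->list; a later list_del() or
     * list_empty() on the entry is undefined */
    list_del(&event->list);

    /* after: the entry is reinitialized to an empty list node, so
     * repeated removal or emptiness tests stay well defined */
    list_del_init(&event->list);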
Signed-off-by: Greg Thelen <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
The cgroup_event_wake() function is called with the wait queue head
locked and it takes cgrp->event_list_lock. However, in cgroup_rmdir()
remove_wait_queue() was being called after taking
cgrp->event_list_lock. Correct the lock ordering by using a temporary
list to obtain the event list to remove from the wait queue.
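A sketch of the fix pattern, assuming the event's fields are named
wqh, wait and list as in cgroup's event code (illustrative, not the
exact diff):

    struct cgroup_event *event, *tmp;
    LIST_HEAD(pending);

    /* detach the whole list while holding event_list_lock ... */
    spin_lock(&cgrp->event_list_lock);
    list_splice_init(&cgrp->event_list, &pending);
    spin_unlock(&cgrp->event_list_lock);

    /* ... then take the wait queue locks without it held */
    list_for_each_entry_safe(event, tmp, &pending, list) {
        list_del_init(&event->list);
        remove_wait_queue(event->wqh, &event->wait);
        /* per-event teardown happens here */
    }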
Signed-off-by: Greg Thelen <[email protected]>
Signed-off-by: Aaron Durbin <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Remove conditional code based on CONFIG_HOTPLUG being false. It's
always on now, in preparation for it going away as an option.
Signed-off-by: Bill Pemberton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Remove conditional code based on CONFIG_HOTPLUG being false. It's
always on now, in preparation for it going away as an option.
Signed-off-by: Bill Pemberton <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
The reason for the scaling and monotonicity correction performed
by cputime_adjust() may not be immediately clear to the reviewer.
Add some comments to explain what happens there.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Gortmaker <[email protected]>
task_cputime_adjusted() and thread_group_cputime_adjusted()
essentially share the same code. They just don't use the same
source:
* The first function uses the cputime in the task struct and the
previous adjusted snapshot that ensures monotonicity.
* The second adds the cputime of all tasks in the group and the
previous adjusted snapshot of the whole group from the signal
structure.
Just consolidate the common code that does the adjustment; the two
functions then only need to fetch the values from the appropriate
source, as sketched below.
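A sketch of the consolidated shape, closely following the description
above (assuming the shared helper is named cputime_adjust(); details
such as locking are elided):

    static void cputime_adjust(struct task_cputime *curr,
                               struct cputime *prev,
                               cputime_t *ut, cputime_t *st);

    void task_cputime_adjusted(struct task_struct *p,
                               cputime_t *ut, cputime_t *st)
    {
        /* per-task source, adjusted against the task snapshot */
        struct task_cputime cputime = {
            .utime = p->utime,
            .stime = p->stime,
            .sum_exec_runtime = p->se.sum_exec_runtime,
        };

        cputime_adjust(&cputime, &p->prev_cputime, ut, st);
    }

    void thread_group_cputime_adjusted(struct task_struct *p,
                                       cputime_t *ut, cputime_t *st)
    {
        /* whole-group source, adjusted against the signal snapshot */
        struct task_cputime cputime;

        thread_group_cputime(p, &cputime);
        cputime_adjust(&cputime, &p->signal->prev_cputime, ut, st);
    }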
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Gortmaker <[email protected]>
We have thread_group_cputime() and thread_group_times(). The naming
doesn't provide enough information about the difference between
these two APIs.
To lower the confusion, rename thread_group_times() to
thread_group_cputime_adjusted(). This name better suggests that
it's a version of thread_group_cputime() that does some stabilization
on the raw cputime values, i.e. scaling on top of the CFS runtime
stats and bounding the value below for monotonicity.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Gortmaker <[email protected]>
thread_group_cputime() is a general cputime API that is not used only
by the posix cpu timers. Move this helper into the scheduler code.
Signed-off-by: Frederic Weisbecker <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Steven Rostedt <[email protected]>
Cc: Paul Gortmaker <[email protected]>
2243076ad1 ("cgroup: initialize cgrp->allcg_node in
init_cgroup_housekeeping()") initializes cgrp->allcg_node in
init_cgroup_housekeeping(). init_cgroup_root() should therefore call
init_cgroup_housekeeping() before adding the cgroup to
&root->allcg_list; otherwise, we are initializing an entry that is
already on a list.
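A sketch of the corrected ordering (other housekeeping elided):

    static void init_cgroup_root(struct cgroupfs_root *root)
    {
        struct cgroup *cgrp = &root->top_cgroup;

        /* ... other root setup ... */
        cgrp->root = root;
        init_cgroup_housekeeping(cgrp);  /* initializes allcg_node */
        list_add_tail(&cgrp->allcg_node, &root->allcg_list);
    }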
Signed-off-by: Li Zhong <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
As suggested by John, export time data similarly to how it's done
done by vsyscall support. This allows KVM to retrieve necessary
information to implement vsyscall support in KVM guests.
Acked-by: John Stultz <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Originally from Jeremy Fitzhardinge.
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Marcelo Tosatti <[email protected]>
Dave Jones reported a bug with futex_lock_pi() that his trinity test
exposed. Sometime between queue_me() and taking the q.lock_ptr, the
lock_ptr became NULL, resulting in a crash.
While futex_wake() is careful to not call wake_futex() on futex_q's with
a pi_state or an rt_waiter (which are either waiting for a
futex_unlock_pi() or a PI futex_requeue()), futex_wake_op() and
futex_requeue() do not perform the same test.
Update futex_wake_op() and futex_requeue() to test for q.pi_state and
q.rt_waiter and abort with -EINVAL if detected. To ensure any future
breakage is caught, add a WARN() to wake_futex() if the same condition
is true.
This fix has seen three hours of testing with "trinity -c futex" on
an x86_64 VM with 4 CPUs.
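The added checks, sketched from the description above (surrounding
context elided):

    /* futex_wake_op()/futex_requeue(): refuse PI waiters */
    if (this->pi_state || this->rt_waiter) {
        ret = -EINVAL;
        goto out_unlock;
    }

    /* wake_futex(): catch any future breakage loudly */
    if (WARN(q->pi_state || q->rt_waiter,
             "refusing to wake a PI futex waiter\n"))
        return;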
[[email protected]: tidy up the WARN()]
Signed-off-by: Darren Hart <[email protected]>
Reported-by: Dave Jones <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: John Kacur <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
In get_sample_period(), unsigned long is not enough for:
    watchdog_thresh * 2 * (NSEC_PER_SEC / 5)
Case 1: watchdog_thresh is 10 by default; the sample value is
0xEE6B2800.
Case 2: with watchdog_thresh set to 20, the sample value is
0x1DCD65000.
Case 2 needs a u64 to express the sample period. Otherwise, changing
the threshold through proc often fails.
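A sketch of the fixed helper, assuming get_softlockup_thresh()
returns watchdog_thresh * 2 as in the existing code:

    static u64 get_sample_period(void)
    {
        /*
         * Force 64-bit math: with watchdog_thresh == 20 the product
         * is 0x1DCD65000, which overflows a 32-bit unsigned long.
         */
        return get_softlockup_thresh() * ((u64)NSEC_PER_SEC / 5);
    }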
Signed-off-by: liu chuansheng <[email protected]>
Acked-by: Don Zickus <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
Merge into timers/core.
Fix trivial conflicts in kernel/time/tick-sched.c.
Signed-off-by: Thomas Gleixner <[email protected]>
'guarantee' is already removed from cgroup_task_migrate(), so remove
the corresponding comments. Some other typos in cgroup are also
fixed.
Cc: Tejun Heo <[email protected]>
Cc: Li Zefan <[email protected]>
Signed-off-by: Tao Ma <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.
A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.
This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.
We still don't have per-superblock inode numbers for proc, which
appear necessary for application-unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors),
but that is now allowed by the design if it becomes important.
I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.
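With stable per-namespace inodes, userspace can compare two
processes' namespaces by stat()ing their /proc/<pid>/ns/ files. A
hedged userspace sketch (helper name is illustrative):

    #include <stdio.h>
    #include <sys/stat.h>
    #include <sys/types.h>

    /* Returns 1 if both pids share the given namespace (e.g. "uts",
     * "ipc", "net"), 0 if not, -1 on error. */
    static int same_ns(pid_t a, pid_t b, const char *ns)
    {
        char pa[64], pb[64];
        struct stat sa, sb;

        snprintf(pa, sizeof(pa), "/proc/%d/ns/%s", (int)a, ns);
        snprintf(pb, sizeof(pb), "/proc/%d/ns/%s", (int)b, ns);
        if (stat(pa, &sa) || stat(pb, &sb))
            return -1;
        return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
    }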
Signed-off-by: Eric W. Biederman <[email protected]>
To keep things sane in the context of file descriptor passing, derive
the user namespace that uids are mapped into from the opener of the
file instead of from current.
When writing to the maps file, the lower user namespace must always
be the parent user namespace, or setting the mapping simply does not
make sense. Enforce that the opener of the file was in the parent
user namespace or the user namespace whose mapping is being set.
Acked-by: Serge E. Hallyn <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
- Add CLONE_THREAD to the unshare flags if CLONE_NEWUSER is selected,
  as changing user namespaces is only valid when there is a single
  thread.
- Restore the code to add CLONE_VM if CLONE_THREAD is selected, and
  the code to add CLONE_SIGHAND if CLONE_VM is selected, making the
  constraints in the code clear; see the sketch below.
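A minimal sketch of the implication chain (assumed shape inside
sys_unshare(), not the exact diff):

    /* unsharing the user namespace requires being single-threaded */
    if (unshare_flags & CLONE_NEWUSER)
        unshare_flags |= CLONE_THREAD;
    /* threads share the mm ... */
    if (unshare_flags & CLONE_THREAD)
        unshare_flags |= CLONE_VM;
    /* ... and a shared mm implies shared signal handlers */
    if (unshare_flags & CLONE_VM)
        unshare_flags |= CLONE_SIGHAND;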
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
This allows entering a user namespace and storing a reference to a
user namespace with a bind mount.
The addition of the missing userns_ns_put in userns_install is from
Gao feng <[email protected]>.
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
The task_user_ns function hides the fact that it is getting the user
namespace from struct cred on the task. struct cred may go away as
soon as the rcu lock is released. This leads to a race where we
can dereference a stale user namespace pointer.
To make it obvious that a struct cred is involved, kill task_user_ns.
To kill the race, modify the users of task_user_ns to only reference
the user namespace while the rcu lock is held, as sketched below.
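The safe pattern, sketched (context illustrative; get_user_ns() and
put_user_ns() pin the namespace across the rcu unlock):

    const struct cred *cred;
    struct user_namespace *ns;

    rcu_read_lock();
    cred = __task_cred(task);
    ns = get_user_ns(cred->user_ns);  /* pin before dropping rcu */
    rcu_read_unlock();

    /* ... use ns ... */
    put_user_ns(ns);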
Cc: Kees Cook <[email protected]>
Cc: James Morris <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
Modify create_new_namespaces to explicitly take a user namespace
parameter, instead of implicitly through the task_struct.
This allows an implementation of unshare(CLONE_NEWUSER) where
the new user namespace is not stored onto the current task_struct
until after all of the namespaces are created.
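The resulting signature, sketched (argument order assumed):

    struct nsproxy *create_new_namespaces(unsigned long flags,
                                          struct task_struct *tsk,
                                          struct user_namespace *user_ns,
                                          struct fs_struct *new_fs);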
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
- Push the permission check from the core setns syscall into
the setns install methods where the user namespace of the
target namespace can be determined and used in an ns_capable()
call.
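A sketch of the per-namespace pattern, using the net namespace
install method as an example (field names assumed):

    static int netns_install(struct nsproxy *nsproxy, void *ns)
    {
        struct net *net = ns;

        /* check against the owner of the *target* namespace */
        if (!ns_capable(net->user_ns, CAP_SYS_ADMIN))
            return -EPERM;

        put_net(nsproxy->net_ns);
        nsproxy->net_ns = get_net(net);
        return 0;
    }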
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
If an unprivileged user has the appropriate capabilities in their
current user namespace, allow the creation of new namespaces.
Acked-by: Serge Hallyn <[email protected]>
Signed-off-by: "Eric W. Biederman" <[email protected]>
WARN shouldn't be used as a means of communicating failure to a userspace programmer.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Dave Jones <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
It seems that the 'ftrace_enabled' flag should not be used inside
tracer functions. The ftrace core uses this flag for internal
purposes, and it was never meant for tracers' runtime checks.
The stack tracer is the only tracer abusing the flag, so stop it from
serving as a bad example.
Also, the stack tracer has a local 'stack_trace_disabled' flag that
is never updated; it can be removed as well.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Anton Vorontsov <[email protected]>
Signed-off-by: Steven Rostedt <[email protected]>
With the introduction of generic cgroup hierarchy iterators, css_id
is being phased out. It was unnecessarily complex, id'ing the wrong
thing (cgroups need IDs, not CSSes), and had other oddities like not
being available at ->css_alloc().
This patch adds cgroup->id, which is a simple per-hierarchy
ida-allocated ID which is assigned before ->css_alloc() and released
after ->css_free().
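A sketch of the allocation pattern (assuming the ida lives on the
hierarchy root):

    /* in cgroup_create(), before ->css_alloc() */
    cgrp->id = ida_simple_get(&root->cgroup_ida, 1, 0, GFP_KERNEL);
    if (cgrp->id < 0)
        return -ENOMEM;

    /* in the free path, after ->css_free() */
    ida_simple_remove(&root->cgroup_ida, cgrp->id);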
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Acked-by: Neil Horman <[email protected]>
Currently CGRP_CPUSET_CLONE_CHILDREN triggers ->post_clone(). Now
that clone_children is cpuset specific, there's no reason to have
this rather odd option-activation mechanism in cgroup core. cpuset
can check the flag from its ->css_alloc() and take the necessary
action.
Move cpuset_post_clone() logic to the end of cpuset_css_alloc() and
remove cgroup_subsys->post_clone().
Loosely based on Glauber's "generalize post_clone into post_create"
patch.
Signed-off-by: Tejun Heo <[email protected]>
Original-patch-by: Glauber Costa <[email protected]>
Original-patch: <[email protected]>
Acked-by: Serge E. Hallyn <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Glauber Costa <[email protected]>
clone_children is only meaningful for cpuset and will stay that way.
Rename the flag to reflect that and update documentation. Also, drop
the clone_children() wrapper in cgroup.c. The thin wrapper is used
only a few times, and one of them will go away soon.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Serge E. Hallyn <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Glauber Costa <[email protected]>
Rename the cgroup_subsys css lifetime related callbacks to
->css_alloc/online/offline/free() to better describe what their roles
are. Also, update documentation.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
There could be cases where controllers want to do, from
->post_create(), initialization operations that may fail. This patch
makes ->post_create() return -errno to indicate failure and makes
online_css() relay such failures.
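A sketch of the relaying helper (shape assumed from the description;
error handling by callers elided):

    static int online_css(struct cgroup_subsys *ss, struct cgroup *cgrp)
    {
        struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];
        int ret = 0;

        if (ss->post_create)
            ret = ss->post_create(cgrp);
        if (!ret)
            css->flags |= CSS_ONLINE;
        return ret;
    }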
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Glauber Costa <[email protected]>
cgroup_create() was ignoring failures in cgroupfs file creation.
Update it such that, if file creation fails, it rolls back by calling
cgroup_destroy_locked() and returns the failure.
Note that the error-out goto labels are renamed. The labels are a bit
confusing but will become better with later cgroup operation renames.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
All cgroup directory i_mutexes nest outside cgroup_mutex; however, new
directory creation is a special case. A new cgroup directory is
created while holding cgroup_mutex. Populating the new directory
requires both the new directory's i_mutex and cgroup_mutex. Because
all directory i_mutexes nest outside cgroup_mutex, grabbing both
requires releasing cgroup_mutex first, which isn't a good idea as the
new cgroup isn't yet ready to be manipulated by other cgroup
operations.
This is worked around by grabbing the new directory's i_mutex while
holding cgroup_mutex before making it visible. As there's no other
user at that point, grabbing the i_mutex under cgroup_mutex can't lead
to deadlock.
cgroup_create_file() was using I_MUTEX_CHILD to tell lockdep not to
worry about the reverse locking order; however, this creates pseudo
locking dependency cgroup_mutex -> I_MUTEX_CHILD, which isn't true -
all directory i_mutexes are still nested outside cgroup_mutex. This
pseudo locking dependency can lead to spurious lockdep warnings.
Use mutex_trylock() instead. This will always succeed and lockdep
doesn't create any locking dependency for it.
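The workaround, sketched (the new directory is not yet visible, so
the trylock cannot fail):

    /* locking an invisible directory's i_mutex under cgroup_mutex:
     * always succeeds and leaves no lockdep dependency behind */
    WARN_ON_ONCE(!mutex_trylock(&inode->i_mutex));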
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Now that cgroup_unload_subsys() can tell whether the root css is
online or not, we can safely call cgroup_unload_subsys() after idr
init failure in cgroup_load_subsys().
Replace the manual unrolling and invoke cgroup_unload_subsys() on
failure. This drops cgroup_mutex in between but should be safe as the
subsystem will fail try_module_get() and thus can't be mounted in
between. As this means that cgroup_unload_subsys() can be called
before css_sets are rehashed, remove BUG_ON() on %NULL
css_set->subsys[] from cgroup_unload_subsys().
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
New helpers on/offline_css() respectively wrap ->post_create() and
->pre_destroy() invocations. online_css() sets CSS_ONLINE after
->post_create() is complete and offline_css() invokes ->pre_destroy()
iff CSS_ONLINE is set and clears it while also handling the temporary
dropping of cgroup_mutex.
This patch doesn't introduce any behavior change at the moment but
will be used to improve cgroup_create() failure path and allow
->post_create() to fail.
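The offline side, sketched to match the description (the
cgroup_mutex drop handling is reduced to a comment):

    static void offline_css(struct cgroup_subsys *ss, struct cgroup *cgrp)
    {
        struct cgroup_subsys_state *css = cgrp->subsys[ss->subsys_id];

        if (!(css->flags & CSS_ONLINE))
            return;

        /* ->pre_destroy() may temporarily drop cgroup_mutex */
        if (ss->pre_destroy)
            ss->pre_destroy(cgrp);

        css->flags &= ~CSS_ONLINE;
    }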
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Separate out cgroup_destroy_locked() from cgroup_destroy(). This will
later be used in the cgroup_create() failure path.
While at it, add lockdep asserts on i_mutex and cgroup_mutex, and move
@d and @parent assignments to their declarations.
This patch doesn't introduce any functional difference.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Fix harmless bugs in the cgroup_load_subsys() failure path and in
cgroup_unload_subsys():
* If idr init fails, cgroup_load_subsys() cleared dummytop->subsys[]
  before calling ->destroy(), making the CSS inaccessible to the
  callback, and didn't unlink ss->sibling. As no modular controller
  uses ->use_id, this doesn't cause any actual problems.
* cgroup_unload_subsys() was forgetting to free the idr, call
  ->pre_destroy() and clear ->active. As there currently is no
  modular controller which uses ->use_id, ->pre_destroy() or
  ->active, this doesn't cause any actual problems.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>
Make cgroup_init_subsys() grab cgroup_mutex while initializing a
subsystem so that all helpers and callbacks are called under the
context they expect. This isn't strictly necessary as
cgroup_init_subsys() doesn't race with anybody but will allow adding
lockdep assertions.
Signed-off-by: Tejun Heo <[email protected]>
Acked-by: Li Zefan <[email protected]>