Age | Commit message (Collapse) | Author | Files | Lines |
|
In embedded systems, sometimes the same program (busybox) is the cause of
multiple warnings. Outputting the pid with the program name in the
warning printk helps distinguish which instances of a program are using
the stack most.
This is a small patch, but useful.
Signed-off-by: Tim Bird <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 8f92054e7ca1 ("CRED: Fix __task_cred()'s lockdep check and banner
comment"):
add the following validation condition:
task->exit_state >= 0
to permit the access if the target task is dead and therefore
unable to change its own credentials.
OK, but afaics currently this can only help wait_task_zombie() which calls
__task_cred() without rcu lock.
Remove this validation and change wait_task_zombie() to use task_uid()
instead. This means we do rcu_read_lock() only to shut up the lockdep,
but we already do the same in, say, wait_task_stopped().
task_is_dead() should die, task->exit_state != 0 means that this task has
passed exit_notify(), only do_wait-like code paths should use this.
Unfortunately, we can't kill task_is_dead() right now, it has already
acquired buggy users in drivers/staging. The fix already exists.
Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: "Eric W. Biederman" <[email protected]>
Acked-by: David Howells <[email protected]>
Cc: Jiri Olsa <[email protected]>
Cc: Paul E. McKenney <[email protected]>
Cc: James Morris <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Warning(kernel/kmod.c:419): No description found for parameter 'depth'
Signed-off-by: Randy Dunlap <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
If we move call_usermodehelper_fns() to kmod.c file and EXPORT_SYMBOL it
we can avoid exporting all it's helper functions:
call_usermodehelper_setup
call_usermodehelper_setfns
call_usermodehelper_exec
And make all of them static to kmod.c
Since the optimizer will see all these as a single call site it will
inline them inside call_usermodehelper_fns(). So we loose the call to
_fns but gain 3 calls to the helpers. (Not that it matters)
Signed-off-by: Boaz Harrosh <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Both kernel/sys.c && security/keys/request_key.c where inlining the exact
same code as call_usermodehelper_fns(); So simply convert these sites to
directly use call_usermodehelper_fns().
Signed-off-by: Boaz Harrosh <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
call_usermodehelper_freeinfo() is not used outside of kmod.c. So unexport
it, and make it static to kmod.c
Signed-off-by: Boaz Harrosh <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Tetsuo Handa <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Signed-off-by: Nicolas Pitre <[email protected]>
Acked-by: Colin Cross <[email protected]>
Acked-by: Santosh Shilimkar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Use the module-wide pr_fmt() mechanism rather than open-coding "genirq: "
everywhere.
Cc: Thomas Gleixner <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
uts_kern_table
sethostname() and setdomainname() notify userspace on failure (without
modifying uts_kern_table). Change things so that we only notify userspace
on success, when uts_kern_table was actually modified.
Signed-off-by: Sasikantha babu <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: WANG Cong <[email protected]>
Reviewed-by: Cyrill Gorcunov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In the comment of allocate_resource(), the explanation of parameter max
and min is not correct.
Actually, these two parameters are used to specify the range of the
resource that will be allocated, not the min/max size that will be
allocated.
Signed-off-by: Wei Yang <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The @func callback was invoked twice for group leader when
perf_event_for_each() called. It seems the commit 75f937f24bd9
("perf_counter: Fix ctx->mutex vs counter ->mutex inversion") made the
mistake during the change.
Signed-off-by: Namhyung Kim <[email protected]>
Acked-by: Peter Zijlstra <[email protected]>
Cc: Namhyung Kim <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf updates from Ingo Molnar.
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
perf ui browser: Stop using 'self'
perf annotate browser: Read perf config file for settings
perf config: Allow '_' in config file variable names
perf annotate browser: Make feature toggles global
perf annotate browser: The idx_asm field should be used in asm only view
perf tools: Convert critical messages to ui__error()
perf ui: Make --stdio default when TUI is not supported
tools lib traceevent: Silence compiler warning on 32bit build
perf record: Fix branch_stack type in perf_record_opts
perf tools: Reconstruct event with modifiers from perf_event_attr
perf top: Fix counter name fixup when fallbacking to cpu-clock
perf tools: fix thread_map__new_by_pid_str() memory leak in error path
perf tools: Do not use _FORTIFY_SOURCE when DEBUG=1 is specified
tools lib traceevent: Fix signature of create_arg_item()
tools lib traceevent: Use proper function parameter type
tools lib traceevent: Fix freeing arg on process_dynamic_array()
tools lib traceevent: Fix a possibly wrong memory dereference
tools lib traceevent: Fix a possible memory leak
tools lib traceevent: Allow expressions in __print_symbolic() fields
perf evlist: Explicititely initialize input_name
...
|
|
Merge block/IO core bits from Jens Axboe:
"This is a bit bigger on the core side than usual, but that is purely
because we decided to hold off on parts of Tejun's submission on 3.4
to give it a bit more time to simmer. As a consequence, it's seen a
long cycle in for-next.
It contains:
- Bug fix from Dan, wrong locking type.
- Relax splice gifting restriction from Eric.
- A ton of updates from Tejun, primarily for blkcg. This improves
the code a lot, making the API nicer and cleaner, and also includes
fixes for how we handle and tie policies and re-activate on
switches. The changes also include generic bug fixes.
- A simple fix from Vivek, along with a fix for doing proper delayed
allocation of the blkcg stats."
Fix up annoying conflict just due to different merge resolution in
Documentation/feature-removal-schedule.txt
* 'for-3.5/core' of git://git.kernel.dk/linux-block: (92 commits)
blkcg: tg_stats_alloc_lock is an irq lock
vmsplice: relax alignement requirements for SPLICE_F_GIFT
blkcg: use radix tree to index blkgs from blkcg
blkcg: fix blkcg->css ref leak in __blkg_lookup_create()
block: fix elvpriv allocation failure handling
block: collapse blk_alloc_request() into get_request()
blkcg: collapse blkcg_policy_ops into blkcg_policy
blkcg: embed struct blkg_policy_data in policy specific data
blkcg: mass rename of blkcg API
blkcg: style cleanups for blk-cgroup.h
blkcg: remove blkio_group->path[]
blkcg: blkg_rwstat_read() was missing inline
blkcg: shoot down blkgs if all policies are deactivated
blkcg: drop stuff unused after per-queue policy activation update
blkcg: implement per-queue policy activation
blkcg: add request_queue->root_blkg
blkcg: make request_queue bypassing on allocation
blkcg: make sure blkg_lookup() returns %NULL if @q is bypassing
blkcg: make blkg_conf_prep() take @pol and return with queue lock held
blkcg: remove static policy ID enums
...
|
|
Remove explicit NULL assignment of static pointer
dattr_cur from init_sched_domains().
Signed-off-by: Kamalesh Babulal <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
No need to have the last NULL entry.
Signed-off-by: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
The strings sched_feat_names are never changed.
Signed-off-by: Hiroshi Shimamoto <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
task_tick_rt() has an optimization to only reschedule SCHED_RR tasks
if they were the only element on their rq. However, with cgroups
a SCHED_RR task could be the only element on its per-cgroup rq but
still be competing with other SCHED_RR tasks in its parent's
cgroup. In this case, the SCHED_RR task in the child cgroup would
never yield at the end of its timeslice. If the child cgroup
rt_runtime_us was the same as the parent cgroup rt_runtime_us,
the task in the parent cgroup would starve completely.
Modify task_tick_rt() to check that the task is the only task on its
rq, and that the each of the scheduling entities of its ancestors
is also the only entity on its rq.
Signed-off-by: Colin Cross <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Since nr_cpus_allowed is used outside of sched/rt.c and wants to be
used outside of there more, move it to a more natural site.
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
We could re-read rq->rt_avg after we validated it was smaller than
total, invalidating the check and resulting in an unintended negative.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: David Rientjes <[email protected]>
Link: http://lkml.kernel.org/r/1337688268.9698.29.camel@twins
Signed-off-by: Ingo Molnar <[email protected]>
|
|
SD_OVERLAP exists to allow overlapping groups, overlapping groups
appear in NUMA topologies that aren't fully connected.
The typical result of not fully connected NUMA is that each cpu (or
rather node) will have different spans for a particular distance.
However due to how sched domains are traversed -- only the first cpu
in the mask goes one level up -- the next level only cares about the
spans of the cpus that went up.
Due to this two things were observed to be broken:
- build_overlap_sched_groups() -- since its possible the cpu we're
building the groups for exists in multiple (or all) groups, the
selection criteria of the first group didn't ensure there was a cpu
for which is was true that cpumask_first(span) == cpu. Thus load-
balancing would terminate.
- update_group_power() -- assumed that the cpu span of the first
group of the domain was covered by all groups of the child domain.
The above explains why this isn't true, so deal with it.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: David Rientjes <[email protected]>
Link: http://lkml.kernel.org/r/1337788843.9783.14.camel@laptop
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Allocators don't appreciate it when you try and allocate memory from
offline nodes.
Reported-and-tested-by: Tony Luck <[email protected]>
Reported-and-tested-by: Anton Blanchard <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Follow up on commit 556061b00 ("sched/nohz: Fix rq->cpu_load[]
calculations") since while that fixed the busy case it regressed the
mostly idle case.
Add a callback from the nohz exit to also age the rq->cpu_load[]
array. This closes the hole where either there was no nohz load
balance pass during the nohz, or there was a 'significant' amount of
idle time between the last nohz balance and the nohz exit.
So we'll update unconditionally from the tick to not insert any
accidental 0 load periods while busy, and we try and catch up from
nohz idle balance and nohz exit. Both these are still prone to missing
a jiffy, but that has always been the case.
Signed-off-by: Peter Zijlstra <[email protected]>
Cc: [email protected]
Cc: Venkatesh Pallipadi <[email protected]>
Link: http://lkml.kernel.org/n/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Merge back Linus's latest branch so that we pick up the uprobes changes.
( I tested this branch locally and while it's one from the middle of the
merge window it's a good one to base further work off. )
Signed-off-by: Ingo Molnar <[email protected]>
|
|
lglocks and brlocks are currently generated with some complicated macros
in lglock.h. But there's no reason to not just use common utility
functions and put all the data into a common data structure.
Since there are at least two users it makes sense to share this code in a
library. This is also easier maintainable than a macro forest.
This will also make it later possible to dynamically allocate lglocks and
also use them in modules (this would both still need some additional, but
now straightforward, code)
[[email protected]: checkpatch fixes]
Signed-off-by: Andi Kleen <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Rusty Russell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Rusty Russell <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Using %ps in a printk format will sometimes fail silently and print the
empty string if the address passed in does not match a symbol that
kallsyms knows about. But using %pS will fall back to printing the full
address if kallsyms can't find the symbol. Make %ps act the same as %pS
by falling back to printing the address.
While we're here also make %ps print the module that a symbol comes from
so that it matches what %pS already does. Take this simple function for
example (in a module):
static void test_printk(void)
{
int test;
pr_info("with pS: %pS\n", &test);
pr_info("with ps: %ps\n", &test);
}
Before this patch:
with pS: 0xdff7df44
with ps:
After this patch:
with pS: 0xdff7df44
with ps: 0xdff7df44
Signed-off-by: Stephen Boyd <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When killing a res_counter which is a child of other counter, we need to
do
res_counter_uncharge(child, xxx)
res_counter_charge(parent, xxx)
This is not atomic and wastes CPU. This patch adds
res_counter_uncharge_until(). This function's uncharge propagates to
ancestors until specified res_counter.
res_counter_uncharge_until(child, parent, xxx)
Now the operation is atomic and efficient.
Signed-off-by: Frederic Weisbecker <[email protected]>
Signed-off-by: KAMEZAWA Hiroyuki <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Ying Han <[email protected]>
Cc: Glauber Costa <[email protected]>
Reviewed-by: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Library functions should not grab locks when the callsites can do it,
even if the lock nests like the rcu read-side lock does.
Push the rcu_read_lock() from css_is_ancestor() to its single user,
mem_cgroup_same_or_subtree() in preparation for another user that may
already hold the rcu read-side lock.
Signed-off-by: Johannes Weiner <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Acked-by: KAMEZAWA Hiroyuki <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Li Zefan <[email protected]>
Cc: Li Zefan <[email protected]>
Cc: Tejun Heo <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The vma length in dup_mmap is calculated and stored in a unsigned int,
which is insufficient and hence overflows for very large maps (beyond
16TB). The following program demonstrates this:
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#define GIG 1024 * 1024 * 1024L
#define EXTENT 16393
int main(void)
{
int i, r;
void *m;
char buf[1024];
for (i = 0; i < EXTENT; i++) {
m = mmap(NULL, (size_t) 1 * 1024 * 1024 * 1024L,
PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, 0, 0);
if (m == (void *)-1)
printf("MMAP Failed: %d\n", m);
else
printf("%d : MMAP returned %p\n", i, m);
r = fork();
if (r == 0) {
printf("%d: successed\n", i);
return 0;
} else if (r < 0)
printf("FORK Failed: %d\n", r);
else if (r > 0)
wait(NULL);
}
return 0;
}
Increase the storage size of the result to unsigned long, which is
sufficient for storing the difference between addresses.
Signed-off-by: Siddhesh Poyarekar <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Oleg Nesterov <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The swap token code no longer fits in with the current VM model. It
does not play well with cgroups or the better NUMA placement code in
development, since we have only one swap token globally.
It also has the potential to mess with scalability of the system, by
increasing the number of non-reclaimable pages on the active and
inactive anon LRU lists.
Last but not least, the swap token code has been broken for a year
without complaints, as reported by Konstantin Khlebnikov. This suggests
we no longer have much use for it.
The days of sub-1G memory systems with heavy use of swap are over. If
we ever need thrashing reducing code in the future, we will have to
implement something that does scale.
Signed-off-by: Rik van Riel <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Hugh Dickins <[email protected]>
Acked-by: Bob Picco <[email protected]>
Acked-by: KOSAKI Motohiro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
48ddbe1946 "cgroup: make css->refcnt clearing on cgroup removal
optional" allowed a css to linger after the associated cgroup is
removed. As a css holds a reference on the cgroup's dentry, it means
that cgroup dentries may linger for a while.
cgroup_create() does grab an active reference on the superblock to
prevent it from going away while there are !root cgroups; however, the
reference is put from cgroup_diput() which is invoked on cgroup
removal, so cgroup dentries which are removed but persisting due to
lingering csses already have released their superblock active refs
allowing superblock to be killed while those dentries are around.
Given the right condition, this makes cgroup_kill_sb() call
kill_litter_super() with dentries with non-zero d_count leading to
BUG() in shrink_dcache_for_umount_subtree().
Fix it by adding cgroup_dops->d_release() operation and moving
deactivate_super() to it. cgroup_diput() now marks dentry->d_fsdata
with itself if superblock should be deactivated and cgroup_d_release()
deactivates the superblock on dentry release.
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Sasha Levin <[email protected]>
Tested-by: Sasha Levin <[email protected]>
LKML-Reference: <CA+1xoqe5hMuxzCRhMy7J0XchDk2ZnuxOHJKikROk1-ReAzcT6g@mail.gmail.com>
Acked-by: Li Zefan <[email protected]>
|
|
commit 5307c95 (tick: Add tick skew boot option) broke the
!CONFIG_HIGH_RES_TIMERS build.
Move the boot option parsing into the CONFIG_HIGH_RES_TIMERS section.
Reported-by: Ingo Molnar <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
Cc: Mike Galbraith <[email protected]>
|
|
Make clockevents_config() into a global symbol to allow it to be used
by compiled-in clockevent drivers. This is needed by drivers that want
to update the timer frequency after registration time.
Signed-off-by: Magnus Damm <[email protected]>
Tested-by: Simon Horman <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: Magnus Damm <[email protected]>
Link: http://lkml.kernel.org/r/20120509143934.27521.46553.sendpatchset@w520
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Let the user decide whether power consumption or jitter is the
more important consideration for their machines.
Quoting removal commit af5ab277ded04bd9bc6b048c5a2f0e7d70ef0867:
"Historically, Linux has tried to make the regular timer tick on the
various CPUs not happen at the same time, to avoid contention on
xtime_lock.
Nowadays, with the tickless kernel, this contention no longer happens
since time keeping and updating are done differently. In addition,
this skew is actually hurting power consumption in a measurable way on
many-core systems."
Problems:
- Contrary to the above, systems do encounter contention on both
xtime_lock and RCU structure locks when the tick is synchronized.
- Moderate sized RT systems suffer intolerable jitter due to the tick
being synchronized.
- SGI reports the same for their large systems.
- Fully utilized systems reap no power saving benefit from skew removal,
but do suffer from resulting induced lock contention.
- 0209f649 rcu: limit rcu_node leaf-level fanout
This patch was born to combat lock contention which testing showed
to have been _induced by_ skew removal. Skew the tick, contention
disappeared virtually completely.
Signed-off-by: Mike Galbraith <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Pull KVM changes from Avi Kivity:
"Changes include additional instruction emulation, page-crossing MMIO,
faster dirty logging, preventing the watchdog from killing a stopped
guest, module autoload, a new MSI ABI, and some minor optimizations
and fixes. Outside x86 we have a small s390 and a very large ppc
update.
Regarding the new (for kvm) rebaseless workflow, some of the patches
that were merged before we switch trees had to be rebased, while
others are true pulls. In either case the signoffs should be correct
now."
Fix up trivial conflicts in Documentation/feature-removal-schedule.txt
arch/powerpc/kvm/book3s_segment.S and arch/x86/include/asm/kvm_para.h.
I suspect the kvm_para.h resolution ends up doing the "do I have cpuid"
check effectively twice (it was done differently in two different
commits), but better safe than sorry ;)
* 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (125 commits)
KVM: make asm-generic/kvm_para.h have an ifdef __KERNEL__ block
KVM: s390: onereg for timer related registers
KVM: s390: epoch difference and TOD programmable field
KVM: s390: KVM_GET/SET_ONEREG for s390
KVM: s390: add capability indicating COW support
KVM: Fix mmu_reload() clash with nested vmx event injection
KVM: MMU: Don't use RCU for lockless shadow walking
KVM: VMX: Optimize %ds, %es reload
KVM: VMX: Fix %ds/%es clobber
KVM: x86 emulator: convert bsf/bsr instructions to emulate_2op_SrcV_nobyte()
KVM: VMX: unlike vmcs on fail path
KVM: PPC: Emulator: clean up SPR reads and writes
KVM: PPC: Emulator: clean up instruction parsing
kvm/powerpc: Add new ioctl to retreive server MMU infos
kvm/book3s: Make kernel emulated H_PUT_TCE available for "PR" KVM
KVM: PPC: bookehv: Fix r8/r13 storing in level exception handler
KVM: PPC: Book3S: Enable IRQs during exit handling
KVM: PPC: Fix PR KVM on POWER7 bare metal
KVM: PPC: Fix stbux emulation
KVM: PPC: bookehv: Use lwz/stw instead of PPC_LL/PPC_STL for 32-bit fields
...
|
|
The comment over idle_threads_init() really talks about the functionality
of idle_init(). Move that comment to idle_init(), and add a suitable
comment over idle_threads_init().
Signed-off-by: Srivatsa S. Bhat <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
While trying to initialize idle threads for all cpus, idle_threads_init()
calls smp_processor_id() in a loop, which is unnecessary. The intent
is to initialize idle threads for all non-boot cpus. So just use a variable
to note the boot cpu and use it in the loop.
Signed-off-by: Srivatsa S. Bhat <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Pull irqdomain changes from Grant Likely:
"Minor changes and fixups for irqdomain infrastructure. The most
important change adds the ability to remove a registered irqdomain."
* tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
irqdomain: Document size parameter of irq_domain_add_linear()
irqdomain: trivial pr_fmt conversion.
irqdomain: Kill off duplicate definitions.
irqdomain: Make irq_domain_simple_map() static.
irqdomain: Export remaining public API symbols.
irqdomain: Support removal of IRQ domains.
|
|
All invocations of chip->irq_set_affinity() are doing the same return
value checks. Let them all use a common function.
[ tglx: removed the silly likely while at it ]
Signed-off-by: Jiang Liu <[email protected]>
Cc: Jiang Liu <[email protected]>
Cc: Keping Chen <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer updates from Thomas Gleixner.
Various trivial conflict fixups in arch Kconfig due to addition of
unrelated entries nearby. And one slightly more subtle one for sparc32
(new user of GENERIC_CLOCKEVENTS), fixed up as per Thomas.
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
timekeeping: Fix a few minor newline issues.
time: remove obsolete declaration
ntp: Fix a stale comment and a few stray newlines.
ntp: Correct TAI offset during leap second
timers: Fixup the Kconfig consolidation fallout
x86: Use generic time config
unicore32: Use generic time config
um: Use generic time config
tile: Use generic time config
sparc: Use: generic time config
sh: Use generic time config
score: Use generic time config
s390: Use generic time config
openrisc: Use generic time config
powerpc: Use generic time config
mn10300: Use generic time config
mips: Use generic time config
microblaze: Use generic time config
m68k: Use generic time config
m32r: Use generic time config
...
|
|
Every interrupt which is an active wakeup source needs the ability to
abort suspend if there is a pending irq. Right now only edge and level
irqs can do that.
|
+---------+
| INTC |
+---------+
| GPIO_IRQ
+------------+
| gpio-exp |
+------------+
| |
GPIO0_IRQ GPIO1_IRQ
In the above diagram, gpio expander has irq number GPIO_IRQ, it is
connected with two sub GPIO pins, GPIO0 and GPIO1.
During suspend, we set IRQF_NO_SUSPEND for GPIO_IRQ so that gpio
expander driver can handle the sub irq GPIO0_IRQ and GPIO1_IRQ, and
these two irqs themselves can further be handled by simple or nested
irq in some drivers(typically gpio and mfd driver). If they are used
as wakeup sources during suspend, we want them to be able to abort
suspend too.
Setting IRQS_PENDING flag in handle_nested_irq() and handle_simple_irq()
when the irq is disabled allows check_wakeup_irqs() to identify such
irqs as source for aborting suspend.
Signed-off-by: Ning Jiang <[email protected]>
Cc: [email protected]
Link: http://lkml.kernel.org/r/CAH3Oq6T905%[email protected]
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Pull more networking updates from David Miller:
"Ok, everything from here on out will be bug fixes."
1) One final sync of wireless and bluetooth stuff from John Linville.
These changes have all been in his tree for more than a week, and
therefore have had the necessary -next exposure. John was just away
on a trip and didn't have a change to send the pull request until a
day or two ago.
2) Put back some defines in user exposed header file areas that were
removed during the tokenring purge. From Stephen Hemminger and Paul
Gortmaker.
3) A bug fix for UDP hash table allocation got lost in the pile due to
one of those "you got it.. no I've got it.." situations. :-)
From Tim Bird.
4) SKB coalescing in TCP needs to have stricter checks, otherwise we'll
try to coalesce overlapping frags and crash. Fix from Eric Dumazet.
5) RCU routing table lookups can race with free_fib_info(), causing
crashes when we deref the device pointers in the route. Fix by
releasing the net device in the RCU callback. From Yanmin Zhang.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (293 commits)
tcp: take care of overlaps in tcp_try_coalesce()
ipv4: fix the rcu race between free_fib_info and ip_route_output_slow
mm: add a low limit to alloc_large_system_hash
ipx: restore token ring define to include/linux/ipx.h
if: restore token ring ARP type to header
xen: do not disable netfront in dom0
phy/micrel: Fix ID of KSZ9021
mISDN: Add X-Tensions USB ISDN TA XC-525
gianfar:don't add FCB length to hard_header_len
Bluetooth: Report proper error number in disconnection
Bluetooth: Create flags for bt_sk()
Bluetooth: report the right security level in getsockopt
Bluetooth: Lock the L2CAP channel when sending
Bluetooth: Restore locking semantics when looking up L2CAP channels
Bluetooth: Fix a redundant and problematic incoming MTU check
Bluetooth: Add support for Foxconn/Hon Hai AR5BBU22 0489:E03C
Bluetooth: Fix EIR data generation for mgmt_device_found
Bluetooth: Fix Inquiry with RSSI event mask
Bluetooth: improve readability of l2cap_seq_list code
Bluetooth: Fix skb length calculation
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull user-space probe instrumentation from Ingo Molnar:
"The uprobes code originates from SystemTap and has been used for years
in Fedora and RHEL kernels. This version is much rewritten, reviews
from PeterZ, Oleg and myself shaped the end result.
This tree includes uprobes support in 'perf probe' - but SystemTap
(and other tools) can take advantage of user probe points as well.
Sample usage of uprobes via perf, for example to profile malloc()
calls without modifying user-space binaries.
First boot a new kernel with CONFIG_UPROBE_EVENT=y enabled.
If you don't know which function you want to probe you can pick one
from 'perf top' or can get a list all functions that can be probed
within libc (binaries can be specified as well):
$ perf probe -F -x /lib/libc.so.6
To probe libc's malloc():
$ perf probe -x /lib64/libc.so.6 malloc
Added new event:
probe_libc:malloc (on 0x7eac0)
You can now use it in all perf tools, such as:
perf record -e probe_libc:malloc -aR sleep 1
Make use of it to create a call graph (as the flat profile is going to
look very boring):
$ perf record -e probe_libc:malloc -gR make
[ perf record: Woken up 173 times to write data ]
[ perf record: Captured and wrote 44.190 MB perf.data (~1930712
$ perf report | less
32.03% git libc-2.15.so [.] malloc
|
--- malloc
29.49% cc1 libc-2.15.so [.] malloc
|
--- malloc
|
|--0.95%-- 0x208eb1000000000
|
|--0.63%-- htab_traverse_noresize
11.04% as libc-2.15.so [.] malloc
|
--- malloc
|
7.15% ld libc-2.15.so [.] malloc
|
--- malloc
|
5.07% sh libc-2.15.so [.] malloc
|
--- malloc
|
4.99% python-config libc-2.15.so [.] malloc
|
--- malloc
|
4.54% make libc-2.15.so [.] malloc
|
--- malloc
|
|--7.34%-- glob
| |
| |--93.18%-- 0x41588f
| |
| --6.82%-- glob
| 0x41588f
...
Or:
$ perf report -g flat | less
# Overhead Command Shared Object Symbol
# ........ ............. ............. ..........
#
32.03% git libc-2.15.so [.] malloc
27.19%
malloc
29.49% cc1 libc-2.15.so [.] malloc
24.77%
malloc
11.04% as libc-2.15.so [.] malloc
11.02%
malloc
7.15% ld libc-2.15.so [.] malloc
6.57%
malloc
...
The core uprobes design is fairly straightforward: uprobes probe
points register themselves at (inode:offset) addresses of
libraries/binaries, after which all existing (or new) vmas that map
that address will have a software breakpoint injected at that address.
vmas are COW-ed to preserve original content. The probe points are
kept in an rbtree.
If user-space executes the probed inode:offset instruction address
then an event is generated which can be recovered from the regular
perf event channels and mmap-ed ring-buffer.
Multiple probes at the same address are supported, they create a
dynamic callback list of event consumers.
The basic model is further complicated by the XOL speedup: the
original instruction that is probed is copied (in an architecture
specific fashion) and executed out of line when the probe triggers.
The XOL area is a single vma per process, with a fixed number of
entries (which limits probe execution parallelism).
The API: uprobes are installed/removed via
/sys/kernel/debug/tracing/uprobe_events, the API is integrated to
align with the kprobes interface as much as possible, but is separate
to it.
Injecting a probe point is privileged operation, which can be relaxed
by setting perf_paranoid to -1.
You can use multiple probes as well and mix them with kprobes and
regular PMU events or tracepoints, when instrumenting a task."
Fix up trivial conflicts in mm/memory.c due to previous cleanup of
unmap_single_vma().
* 'perf-uprobes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
perf probe: Detect probe target when m/x options are absent
perf probe: Provide perf interface for uprobes
tracing: Fix kconfig warning due to a typo
tracing: Provide trace events interface for uprobes
tracing: Extract out common code for kprobes/uprobes trace events
tracing: Modify is_delete, is_return from int to bool
uprobes/core: Decrement uprobe count before the pages are unmapped
uprobes/core: Make background page replacement logic account for rss_stat counters
uprobes/core: Optimize probe hits with the help of a counter
uprobes/core: Allocate XOL slots for uprobes use
uprobes/core: Handle breakpoint and singlestep exceptions
uprobes/core: Rename bkpt to swbp
uprobes/core: Make order of function parameters consistent across functions
uprobes/core: Make macro names consistent
uprobes: Update copyright notices
uprobes/core: Move insn to arch specific structure
uprobes/core: Remove uprobe_opcode_sz
uprobes/core: Make instruction tables volatile
uprobes: Move to kernel/events/
uprobes/core: Clean up, refactor and improve the code
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media updates from Mauro Carvalho Chehab:
- some V4L2 API updates needed by embedded devices
- DVB API extensions for ATSC-MH delivery system, used in US for mobile
TV
- new tuners for fc0011/0012/0013 and tua9001
- a new dvb driver for af9033/9035
- a new ATSC-MH frontend (lg2160)
- new remote controller keymaps
- Removal of a few legacy webcam driver that got replaced by gspca on
several kernel versions ago
- a new driver for Exynos 4/5 webcams(s5pp fimc-lite)
- a new webcam sensor driver (smiapp)
- a new video input driver for embedded (sta2x1xx)
- several improvements, fixes, cleanups, etc inside the drivers.
Manually fix up conflicts due to err() -> dev_err() conversion in
drivers/staging/media/easycap/easycap_main.c
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (484 commits)
[media] saa7134-cards: Remove a PCI entry added by mistake
[media] radio-sf16fmi: add support for SF16-FMD
[media] rc-loopback: remove duplicate line
[media] patch for Asus My Cinema PS3-100 (1043:48cd)
[media] au0828: Move the Kconfig knob under V4L_USB_DRIVERS
[media] em28xx: simple comment fix
[media] [resend] radio-sf16fmr2: add PnP support for SF16-FMD2
[media] smiapp: Use v4l2_ctrl_new_int_menu() instead of v4l2_ctrl_new_custom()
[media] smiapp: Add support for 8-bit uncompressed formats
[media] smiapp: Allow generic quirk registers
[media] smiapp: Use non-binning limits if the binning limit is zero
[media] smiapp: Initialise rval in smiapp_read_nvm()
[media] smiapp: Round minimum pre_pll up rather than down in ip_clk_freq check
[media] smiapp: Use 8-bit reads only before identifying the sensor
[media] smiapp: Quirk for sensors that only do 8-bit reads
[media] smiapp: Pass struct sensor to register writing commands instead of i2c_client
[media] smiapp: Allow using external clock from the clock framework
[media] zl10353: change .read_snr() to report SNR as a 0.1 dB
[media] media: add support to gspca/pac7302.c for 093a:2627 (Genius FaceCam 300)
[media] m88rs2000 - only flip bit 2 on reg 0x70 on 16th try
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent
Pull an ftrace ring-buffer fix from Steve Rostedt:
* fix kernel crash when changing the size of the ring-buffer on
boxes where possible_cpus != online_cpus.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
UDP stack needs a minimum hash size value for proper operation and also
uses alloc_large_system_hash() for proper NUMA distribution of its hash
tables and automatic sizing depending on available system memory.
On some low memory situations, udp_table_init() must ignore the
alloc_large_system_hash() result and reallocs a bigger memory area.
As we cannot easily free old hash table, we leak it and kmemleak can
issue a warning.
This patch adds a low limit parameter to alloc_large_system_hash() to
solve this problem.
We then specify UDP_HTABLE_SIZE_MIN for UDP/UDPLite hash table
allocation.
Reported-by: Mark Asselstine <[email protected]>
Reported-by: Tim Bird <[email protected]>
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Paul Gortmaker <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Kill the no longer used task_struct->replacement_session_keyring, update
copy_creds() and exit_creds().
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: David Howells <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: David Smith <[email protected]>
Cc: "Frank Ch. Eigler" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
exit_irq_thread() and task->irq_thread are needed to handle the unexpected
(and unlikely) exit of irq-thread.
We can use task_work instead and make this all private to
kernel/irq/manage.c, cleanup plus micro-optimization.
1. rename exit_irq_thread() to irq_thread_dtor(), make it
static, and move it up before irq_thread().
2. change irq_thread() to do task_work_add(irq_thread_dtor)
at the start and task_work_cancel() before return.
tracehook_notify_resume() can never play with kthreads,
only do_exit()->exit_task_work() can call the callback
and this is what we want.
3. remove task_struct->irq_thread and the special hook
in do_exit().
Signed-off-by: Oleg Nesterov <[email protected]>
Reviewed-by: Thomas Gleixner <[email protected]>
Cc: David Howells <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: David Smith <[email protected]>
Cc: "Frank Ch. Eigler" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
Provide a simple mechanism that allows running code in the (nonatomic)
context of the arbitrary task.
The caller does task_work_add(task, task_work) and this task executes
task_work->func() either from do_notify_resume() or from do_exit(). The
callback can rely on PF_EXITING to detect the latter case.
"struct task_work" can be embedded in another struct, still it has "void
*data" to handle the most common/simple case.
This allows us to kill the ->replacement_session_keyring hack, and
potentially this can have more users.
Performance-wise, this adds 2 "unlikely(!hlist_empty())" checks into
tracehook_notify_resume() and do_exit(). But at the same time we can
remove the "replacement_session_keyring != NULL" checks from
arch/*/signal.c and exit_creds().
Note: task_work_add/task_work_run abuses ->pi_lock. This is only because
this lock is already used by lookup_pi_state() to synchronize with
do_exit() setting PF_EXITING. Fortunately the scope of this lock in
task_work.c is really tiny, and the code is unlikely anyway.
Signed-off-by: Oleg Nesterov <[email protected]>
Acked-by: David Howells <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Richard Kuo <[email protected]>
Cc: Linus Torvalds <[email protected]>
Cc: Alexander Gordeev <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: David Smith <[email protected]>
Cc: "Frank Ch. Eigler" <[email protected]>
Cc: Geert Uytterhoeven <[email protected]>
Cc: Larry Woodman <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Al Viro <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal
Pull first series of signal handling cleanups from Al Viro:
"This is just the first part of the queue (about a half of it);
assorted fixes all over the place in signal handling.
This one ends with all sigsuspend() implementations switched to
generic one (->saved_sigmask-based).
With this, a bunch of assorted old buglets are fixed and most of the
missing bits of NOTIFY_RESUME hookup are in place. Two more fixes sit
in arm and um trees respectively, and there's a couple of broken ones
that need obvious fixes - parisc and avr32 check TIF_NOTIFY_RESUME
only on one of two codepaths; fixes for that will happen in the next
series"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (55 commits)
unicore32: if there's no handler we need to restore sigmask, syscall or no syscall
xtensa: add handling of TIF_NOTIFY_RESUME
microblaze: drop 'oldset' argument of do_notify_resume()
microblaze: handle TIF_NOTIFY_RESUME
score: add handling of NOTIFY_RESUME to do_notify_resume()
m68k: add TIF_NOTIFY_RESUME and handle it.
sparc: kill ancient comment in sparc_sigaction()
h8300: missing checks of __get_user()/__put_user() return values
frv: missing checks of __get_user()/__put_user() return values
cris: missing checks of __get_user()/__put_user() return values
powerpc: missing checks of __get_user()/__put_user() return values
sh: missing checks of __get_user()/__put_user() return values
sparc: missing checks of __get_user()/__put_user() return values
avr32: struct old_sigaction is never used
m32r: struct old_sigaction is never used
xtensa: xtensa_sigaction doesn't exist
alpha: tidy signal delivery up
score: don't open-code force_sigsegv()
cris: don't open-code force_sigsegv()
blackfin: don't open-code force_sigsegv()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace enhancements from Eric Biederman:
"This is a course correction for the user namespace, so that we can
reach an inexpensive, maintainable, and reasonably complete
implementation.
Highlights:
- Config guards make it impossible to enable the user namespace and
code that has not been converted to be user namespace safe.
- Use of the new kuid_t type ensures the if you somehow get past the
config guards the kernel will encounter type errors if you enable
user namespaces and attempt to compile in code whose permission
checks have not been updated to be user namespace safe.
- All uids from child user namespaces are mapped into the initial
user namespace before they are processed. Removing the need to add
an additional check to see if the user namespace of the compared
uids remains the same.
- With the user namespaces compiled out the performance is as good or
better than it is today.
- For most operations absolutely nothing changes performance or
operationally with the user namespace enabled.
- The worst case performance I could come up with was timing 1
billion cache cold stat operations with the user namespace code
enabled. This went from 156s to 164s on my laptop (or 156ns to
164ns per stat operation).
- (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
Most uid/gid setting system calls treat these value specially
anyway so attempting to use -1 as a uid would likely cause
entertaining failures in userspace.
- If setuid is called with a uid that can not be mapped setuid fails.
I have looked at sendmail, login, ssh and every other program I
could think of that would call setuid and they all check for and
handle the case where setuid fails.
- If stat or a similar system call is called from a context in which
we can not map a uid we lie and return overflowuid. The LFS
experience suggests not lying and returning an error code might be
better, but the historical precedent with uids is different and I
can not think of anything that would break by lying about a uid we
can't map.
- Capabilities are localized to the current user namespace making it
safe to give the initial user in a user namespace all capabilities.
My git tree covers all of the modifications needed to convert the core
kernel and enough changes to make a system bootable to runlevel 1."
Fix up trivial conflicts due to nearby independent changes in fs/stat.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
userns: Silence silly gcc warning.
cred: use correct cred accessor with regards to rcu read lock
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
userns: Convert cgroup permission checks to use uid_eq
userns: Convert tmpfs to use kuid and kgid where appropriate
userns: Convert sysfs to use kgid/kuid where appropriate
userns: Convert sysctl permission checks to use kuid and kgids.
userns: Convert proc to use kuid/kgid where appropriate
userns: Convert ext4 to user kuid/kgid where appropriate
userns: Convert ext3 to use kuid/kgid where appropriate
userns: Convert ext2 to use kuid/kgid where appropriate.
userns: Convert devpts to use kuid/kgid where appropriate
userns: Convert binary formats to use kuid/kgid where appropriate
userns: Add negative depends on entries to avoid building code that is userns unsafe
userns: signal remove unnecessary map_cred_ns
userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
userns: Convert stat to return values mapped from kuids and kgids
userns: Convert user specfied uids and gids in chown into kuids and kgid
userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
...
|