aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/mm/fault.c
AgeCommit message (Collapse)AuthorFilesLines
2013-04-02x86, cpu: Convert F00F bug detectionBorislav Petkov1-1/+1
... to using the new facility and drop the cpuinfo_x86 member. Signed-off-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-03-07context_tracking: Restore correct previous context state on exception exitFrederic Weisbecker1-2/+4
On exception exit, we restore the previous context tracking state based on the regs of the interrupted frame. Iff that frame is in user mode as stated by user_mode() helper, we restore the context tracking user mode. However there is a tiny chunck of low level arch code after we pass through user_enter() and until the CPU eventually resumes userspace. If an exception happens in this tiny area, exception_enter() correctly exits the context tracking user mode but exception_exit() won't restore it because of the value returned by user_mode(regs). As a result we may return to userspace with the wrong context tracking state. To fix this, change exception_enter() to return the context tracking state prior to its call and pass this saved state to exception_exit(). This restores the real context tracking state of the interrupted frame. (May be this patch was suggested to me, I don't recall exactly. If so, sorry for the missing credit). Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Li Zhong <[email protected]> Cc: Kevin Hilman <[email protected]> Cc: Mats Liljegren <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Paul E. McKenney <[email protected]>
2013-03-07context_tracking: Move exception handling to generic codeFrederic Weisbecker1-1/+1
Exceptions handling on context tracking should share common treatment: on entry we exit user mode if the exception triggered in that context. Then on exception exit we return to that previous context. Generalize this to avoid duplication across archs. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Li Zhong <[email protected]> Cc: Kevin Hilman <[email protected]> Cc: Mats Liljegren <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Paul E. McKenney <[email protected]>
2013-02-24Revert "x86, mm: Make spurious_fault check explicitly check explicitly check ↵Andrea Arcangeli1-7/+1
the PRESENT bit" I got a report for a minor regression introduced by commit 027ef6c87853b ("mm: thp: fix pmd_present for split_huge_page and PROT_NONE with THP"). So the problem is, pageattr creates kernel pagetables (pte and pmds) that breaks pte_present/pmd_present and the patch above exposed this invariant breakage for pmd_present. The same problem already existed for the pte and pte_present and it was fixed by commit 660a293ea9be709 ("x86, mm: Make spurious_fault check explicitly check the PRESENT bit") (if it wasn't for that commit, it wouldn't even be a regression). That fix avoids the pagefault to use pte_present. I could follow through by stopping using pmd_present/pmd_huge too. However I think it's more robust to fix pageattr and to clear the PSE/GLOBAL bitflags too in addition to the present bitflag. So the kernel page fault can keep using the regular pte_present/pmd_present/pmd_huge. The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE are sharing the same bit, and in the pmd case we pretend _PAGE_PSE to be set only in present pmds (to facilitate split_huge_page final tlb flush). Signed-off-by: Andrea Arcangeli <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Shaohua Li <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2013-02-07x86: Do not leak kernel page mapping locationsKees Cook1-3/+5
Without this patch, it is trivial to determine kernel page mappings by examining the error code reported to dmesg[1]. Instead, declare the entire kernel memory space as a violation of a present page. Additionally, since show_unhandled_signals is enabled by default, switch branch hinting to the more realistic expectation, and unobfuscate the setting of the PF_PROT bit to improve readability. [1] http://vulnfactory.org/blog/2013/02/06/a-linux-memory-trick/ Reported-by: Dan Rosenberg <[email protected]> Suggested-by: Brad Spengler <[email protected]> Signed-off-by: Kees Cook <[email protected]> Cc: [email protected] Acked-by: H. Peter Anvin <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2012-12-12mm, oom: remove statically defined arch functions of same nameDavid Rientjes1-15/+8
out_of_memory() is a globally defined function to call the oom killer. x86, sh, and powerpc all use a function of the same name within file scope in their respective fault.c unnecessarily. Inline the functions into the pagefault handlers to clean the code up. Signed-off-by: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Paul Mundt <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-11-30context_tracking: New context tracking susbsystemFrederic Weisbecker1-1/+1
Create a new subsystem that probes on kernel boundaries to keep track of the transitions between level contexts with two basic initial contexts: user or kernel. This is an abstraction of some RCU code that use such tracking to implement its userspace extended quiescent state. We need to pull this up from RCU into this new level of indirection because this tracking is also going to be used to implement an "on demand" generic virtual cputime accounting. A necessary step to shutdown the tick while still accounting the cputime. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Andrew Morton <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Li Zhong <[email protected]> Cc: Gilad Ben-Yossef <[email protected]> Reviewed-by: Steven Rostedt <[email protected]> [ paulmck: fix whitespace error and email address. ] Signed-off-by: Paul E. McKenney <[email protected]>
2012-10-09readahead: fault retry breaks mmap file read random detectionShaohua Li1-0/+1
.fault now can retry. The retry can break state machine of .fault. In filemap_fault, if page is miss, ra->mmap_miss is increased. In the second try, since the page is in page cache now, ra->mmap_miss is decreased. And these are done in one fault, so we can't detect random mmap file access. Add a new flag to indicate .fault is tried once. In the second try, skip ra->mmap_miss decreasing. The filemap_fault state machine is ok with it. I only tested x86, didn't test other archs, but looks the change for other archs is obvious, but who knows :) Signed-off-by: Shaohua Li <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Wu Fengguang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2012-10-01Merge branch 'x86-smap-for-linus' of ↵Linus Torvalds1-0/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/smap support from Ingo Molnar: "This adds support for the SMAP (Supervisor Mode Access Prevention) CPU feature on Intel CPUs: a hardware feature that prevents unintended user-space data access from kernel privileged code. It's turned on automatically when possible. This, in combination with SMEP, makes it even harder to exploit kernel bugs such as NULL pointer dereferences." Fix up trivial conflict in arch/x86/kernel/entry_64.S due to newly added includes right next to each other. * 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86, smep, smap: Make the switching functions one-way x86, suspend: On wakeup always initialize cr4 and EFER x86-32: Start out eflags and cr4 clean x86, smap: Do not abuse the [f][x]rstor_checking() functions for user space x86-32, smap: Add STAC/CLAC instructions to 32-bit kernel entry x86, smap: Reduce the SMAP overhead for signal handling x86, smap: A page fault due to SMAP is an oops x86, smap: Turn on Supervisor Mode Access Prevention x86, smap: Add STAC and CLAC instructions to control user space access x86, uaccess: Merge prototypes for clear_user/__clear_user x86, smap: Add a header file with macros for STAC/CLAC x86, alternative: Add header guards to <asm/alternative-asm.h> x86, alternative: Use .pushsection/.popsection x86, smap: Add CR4 bit for SMAP x86-32, mm: The WP test should be done on a kernel page
2012-09-26x86: Exception hooks for userspace RCU extended QSFrederic Weisbecker1-2/+11
Add necessary hooks to x86 exception for userspace RCU extended quiescent state support. This includes traps, page fault, debug exceptions, etc... Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Alessio Igor Bogani <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Avi Kivity <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Geoff Levand <[email protected]> Cc: Gilad Ben Yossef <[email protected]> Cc: Hakan Akkan <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Josh Triplett <[email protected]> Cc: Kevin Hilman <[email protected]> Cc: Max Krasnyansky <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephen Hemminger <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Sven-Thorsten Dietrich <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2012-09-21x86, smap: A page fault due to SMAP is an oopsH. Peter Anvin1-0/+18
If we get a page fault due to SMAP, trigger an oops rather than spinning forever. Signed-off-by: H. Peter Anvin <[email protected]> Link: http://lkml.kernel.org/r/[email protected]
2012-05-03userns: Store uid and gid values in struct cred with kuid_t and kgid_t typesEric W. Biederman1-1/+1
cred.h and a few trivial users of struct cred are changed. The rest of the users of struct cred are left for other patches as there are too many changes to make in one go and leave the change reviewable. If the user namespace is disabled and CONFIG_UIDGID_STRICT_TYPE_CHECKS are disabled the code will contiue to compile and behave correctly. Acked-by: Serge Hallyn <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]>
2012-03-13x86: Rename trap_no to trap_nr in thread_structSrikar Dronamraju1-5/+5
There are precedences of trap number being referred to as trap_nr. However thread struct refers trap number as trap_no. Change it to trap_nr. Also use enum instead of left-over literals for trap values. This is pure cleanup, no functional change intended. Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: Srikar Dronamraju <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Jim Keniston <[email protected]> Cc: Linux-mm <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Fixed the math-emu build ] Signed-off-by: Ingo Molnar <[email protected]>
2012-01-26bugs, x86: Fix printk levels for panic, softlockups and stack dumpsPrarit Bhargava1-2/+2
rsyslog will display KERN_EMERG messages on a connected terminal. However, these messages are useless/undecipherable for a general user. For example, after a softlockup we get: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Stack: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Call Trace: Message from syslogd@intel-s3e37-04 at Jan 25 14:18:06 ... kernel:Code: ff ff a8 08 75 25 31 d2 48 8d 86 38 e0 ff ff 48 89 d1 0f 01 c8 0f ae f0 48 8b 86 38 e0 ff ff a8 08 75 08 b1 01 4c 89 e0 0f 01 c9 <e8> ea 69 dd ff 4c 29 e8 48 89 c7 e8 0f bc da ff 49 89 c4 49 89 This happens because the printk levels for these messages are incorrect. Only an informational message should be displayed on a terminal. I modified the printk levels for various messages in the kernel and tested the output by using the drivers/misc/lkdtm.c kernel modules (ie, softlockups, panics, hard lockups, etc.) and confirmed that the console output was still the same and that the output to the terminals was correct. For example, in the case of a softlockup we now see the much more informative: Message from syslogd@intel-s3e37-04 at Jan 25 10:18:06 ... BUG: soft lockup - CPU4 stuck for 60s! instead of the above confusing messages. AFAICT, the messages no longer have to be KERN_EMERG. In the most important case of a panic we set console_verbose(). As for the other less severe cases the correct data is output to the console and /var/log/messages. Successfully tested by me using the drivers/misc/lkdtm.c module. Signed-off-by: Prarit Bhargava <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-12-05x86-64: Set siginfo and context on vsyscall emulation faultsAndy Lutomirski1-6/+16
To make this work, we teach the page fault handler how to send signals on failed uaccess. This only works for user addresses (kernel addresses will never hit the page fault handler in the first place), so we need to generate signals for those separately. This gets the tricky case right: if the user buffer spans multiple pages and only the second page is invalid, we set cr2 and si_addr correctly. UML relies on this behavior to "fault in" pages as needed. We steal a bit from thread_info.uaccess_err to enable this. Before this change, uaccess_err was a 32-bit boolean value. This fixes issues with UML when vsyscall=emulate. Reported-by: Adrian Bunk <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Cc: richard -rw- weinberger <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.net Signed-off-by: Ingo Molnar <[email protected]>
2011-10-28Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86-64, doc: Remove int 0xcc from entry_64.S documentation x86, vsyscall: Add missing <asm/fixmap.h> to arch/x86/mm/fault.c Fix up trivial conflicts in arch/x86/mm/fault.c (asm/fixmap.h vs asm/vsyscall.h: both work, which to use? Whatever..)
2011-09-28x86-64: Don't apply destructive erratum workaround on unaffected CPUsJan Beulich1-1/+7
Erratum 93 applies to AMD K8 CPUs only, and its workaround (forcing the upper 32 bits of %rip to all get set under certain conditions) is actually getting in the way of analyzing page faults occurring during EFI physical mode runtime calls (in particular the page table walk shown is completely unrelated to the actual fault). This is because typically EFI runtime code lives in the space between 2G and 4G, which - modulo the above manipulation - is likely to overlap with the kernel or modules area. While even for the other errata workarounds their taking effect could be limited to just the affected CPUs, none of them appears to be destructive, and they're generally getting called only outside of performance critical paths, so they're being left untouched. Signed-off-by: Jan Beulich <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-08-16x86, vsyscall: Add missing <asm/fixmap.h> to arch/x86/mm/fault.cH. Peter Anvin1-0/+1
arch/x86/mm/fault.c now depend on having the symbol VSYSCALL_START defined, which is best handled by including <asm/fixmap.h> (it isn't unreasonable we may want other fixed addresses in this file in the future, and so it is cleaner than including <asm/vsyscall.h> directly.) This addresses an x86-64 allnoconfig build failure. On other configurations it was masked by an indirect path: <asm/smp.h> -> <asm/apic.h> -> <asm/fixmap.h> -> <asm/vsyscall.h> ... however, the first such include is conditional on CONFIG_X86_LOCAL_APIC. Originally-by: Randy Dunlap <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/CA%2B55aFxsOMc9=p02r8-QhJ=h=Mqwckk4_Pnx9LQt5%[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2011-08-15x86: fix mm/fault.c buildRandy Dunlap1-0/+1
arch/x86/mm/fault.c needs to include asm/vsyscall.h to fix a build error: arch/x86/mm/fault.c: In function '__bad_area_nosemaphore': arch/x86/mm/fault.c:728: error: 'VSYSCALL_START' undeclared (first use in this function) Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-08-12Merge branch 'x86-vdso-for-linus' of ↵Linus Torvalds1-1/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip: x86-64: Rework vsyscall emulation and add vsyscall= parameter x86-64: Wire up getcpu syscall x86: Remove unnecessary compile flag tweaks for vsyscall code x86-64: Add vsyscall:emulate_vsyscall trace event x86-64: Add user_64bit_mode paravirt op x86-64, xen: Enable the vvar mapping x86-64: Work around gold bug 13023 x86-64: Move the "user" vsyscall segment out of the data segment. x86-64: Pad vDSO to a page boundary
2011-08-10x86-64: Rework vsyscall emulation and add vsyscall= parameterAndy Lutomirski1-0/+12
There are three choices: vsyscall=native: Vsyscalls are native code that issues the corresponding syscalls. vsyscall=emulate (default): Vsyscalls are emulated by instruction fault traps, tested in the bad_area path. The actual contents of the vsyscall page is the same as the vsyscall=native case except that it's marked NX. This way programs that make assumptions about what the code in the page does will not be confused when they read that code. vsyscall=none: Trying to execute a vsyscall will segfault. Signed-off-by: Andy Lutomirski <[email protected]> Link: http://lkml.kernel.org/r/8449fb3abf89851fd6b2260972666a6f82542284.1312988155.git.luto@mit.edu Signed-off-by: H. Peter Anvin <[email protected]>
2011-08-04x86-64: Add user_64bit_mode paravirt opAndy Lutomirski1-1/+1
Three places in the kernel assume that the only long mode CPL 3 selector is __USER_CS. This is not true on Xen -- Xen's sysretq changes cs to the magic value 0xe033. Two of the places are corner cases, but as of "x86-64: Improve vsyscall emulation CS and RIP handling" (c9712944b2a12373cb6ff8059afcfb7e826a6c54), vsyscalls will segfault if called with Xen's extra CS selector. This causes a panic when older init builds die. It seems impossible to make Xen use __USER_CS reliably without taking a performance hit on every system call, so this fixes the tests instead with a new paravirt op. It's a little ugly because ptrace.h can't include paravirt.h. Signed-off-by: Andy Lutomirski <[email protected]> Link: http://lkml.kernel.org/r/f4fcb3947340d9e96ce1054a432f183f9da9db83.1312378163.git.luto@mit.edu Reported-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2011-07-01perf: Remove the nmi parameter from the swevent and overflow interfacePeter Zijlstra1-3/+3
The nmi parameter indicated if we could do wakeups from the current context, if not, we would set some state and self-IPI and let the resulting interrupt do the wakeup. For the various event classes: - hardware: nmi=0; PMI is in fact an NMI or we run irq_work_run from the PMI-tail (ARM etc.) - tracepoint: nmi=0; since tracepoint could be from NMI context. - software: nmi=[0,1]; some, like the schedule thing cannot perform wakeups, and hence need 0. As one can see, there is very little nmi=1 usage, and the down-side of not using it is that on some platforms some software events can have a jiffy delay in wakeup (when arch_irq_work_raise isn't implemented). The up-side however is that we can remove the nmi parameter and save a bunch of conditionals in fast paths. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Michael Cree <[email protected]> Cc: Will Deacon <[email protected]> Cc: Deng-Cheng Zhu <[email protected]> Cc: Anton Blanchard <[email protected]> Cc: Eric B Munson <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Paul Mundt <[email protected]> Cc: David S. Miller <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Jason Wessel <[email protected]> Cc: Don Zickus <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-05-26x86: Move do_page_fault()'s error path under unlikely()KOSAKI Motohiro1-15/+20
Ingo suggested SIGKILL check should be moved into slowpath function. This will reduce the page fault fastpath impact of this recent commit: 37b23e0525d3: x86,mm: make pagefault killable Suggested-by: Ingo Molnar <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Andrew Morton <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2011-05-25x86,mm: make pagefault killableKOSAKI Motohiro1-1/+11
When an oom killing occurs, almost all processes are getting stuck at the following two points. 1) __alloc_pages_nodemask 2) __lock_page_or_retry 1) is not very problematic because TIF_MEMDIE leads to an allocation failure and getting out from page allocator. 2) is more problematic. In an OOM situation, zones typically don't have page cache at all and memory starvation might lead to greatly reduced IO performance. When a fork bomb occurs, TIF_MEMDIE tasks don't die quickly, meaning that a fork bomb may create new process quickly rather than the oom-killer killing it. Then, the system may become livelocked. This patch makes the pagefault interruptible by SIGKILL. Signed-off-by: KOSAKI Motohiro <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-05-20sanitize <linux/prefetch.h> usageLinus Torvalds1-0/+1
Commit e66eed651fd1 ("list: remove prefetching from regular list iterators") removed the include of prefetch.h from list.h, which uncovered several cases that had apparently relied on that rather obscure header file dependency. So this fixes things up a bit, using grep -L linux/prefetch.h $(git grep -l '[^a-z_]prefetchw*(' -- '*.[ch]') grep -L 'prefetchw*(' $(git grep -l 'linux/prefetch.h' -- '*.[ch]') to guide us in finding files that either need <linux/prefetch.h> inclusion, or have it despite not needing it. There are more of them around (mostly network drivers), but this gets many core ones. Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-03-10x86/mm: Fix pgd_lock deadlockAndrea Arcangeli1-4/+3
It's forbidden to take the page_table_lock with the irq disabled or if there's contention the IPIs (for tlb flushes) sent with the page_table_lock held will never run leading to a deadlock. Nobody takes the pgd_lock from irq context so the _irqsave can be removed. Signed-off-by: Andrea Arcangeli <[email protected]> Acked-by: Rik van Riel <[email protected]> Tested-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-10x86/mm: Handle mm_fault_error() in kernel spaceAndrey Vagin1-0/+7
mm_fault_error() should not execute oom-killer, if page fault occurs in kernel space. E.g. in copy_from_user()/copy_to_user(). This would happen if we find ourselves in OOM on a copy_to_user(), or a copy_from_user() which faults. Without this patch, the kernels hangs up in copy_from_user(), because OOM killer sends SIG_KILL to current process, but it can't handle a signal while in syscall, then the kernel returns to copy_from_user(), reexcute current command and provokes page_fault again. With this patch the kernel return -EFAULT from copy_from_user(). The code, which checks that page fault occurred in kernel space, has been copied from do_sigbus(). This situation is handled by the same way on powerpc, xtensa, tile, ... Signed-off-by: Andrey Vagin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-10-26x86: access_error API cleanupMichel Lespinasse1-3/+3
access_error() already takes error_code as an argument, so there is no need for an additional write flag. Signed-off-by: Michel Lespinasse <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Nick Piggin <[email protected]> Acked-by: Wu Fengguang <[email protected]> Cc: Ying Han <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Acked-by: "H. Peter Anvin" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26mm: retry page fault when blocking on disk transferMichel Lespinasse1-12/+26
This change reduces mmap_sem hold times that are caused by waiting for disk transfers when accessing file mapped VMAs. It introduces the VM_FAULT_ALLOW_RETRY flag, which indicates that the call site wants mmap_sem to be released if blocking on a pending disk transfer. In that case, filemap_fault() returns the VM_FAULT_RETRY status bit and do_page_fault() will then re-acquire mmap_sem and retry the page fault. It is expected that the retry will hit the same page which will now be cached, and thus it will complete with a low mmap_sem hold time. Tests: - microbenchmark: thread A mmaps a large file and does random read accesses to the mmaped area - achieves about 55 iterations/s. Thread B does mmap/munmap in a loop at a separate location - achieves 55 iterations/s before, 15000 iterations/s after. - We are seeing related effects in some applications in house, which show significant performance regressions when running without this change. [[email protected]: fix warning & crash] Signed-off-by: Michel Lespinasse <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Linus Torvalds <[email protected]> Cc: Nick Piggin <[email protected]> Reviewed-by: Wu Fengguang <[email protected]> Cc: Ying Han <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Acked-by: "H. Peter Anvin" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-22Merge branch 'hwpoison-hugepages' into hwpoisonAndi Kleen1-6/+13
Conflicts: mm/memory-failure.c
2010-10-21Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-25/+18
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86-32, percpu: Correct the ordering of the percpu readmostly section x86, mm: Enable ARCH_DMA_ADDR_T_64BIT with X86_64 || HIGHMEM64G x86: Spread tlb flush vector between nodes percpu: Introduce a read-mostly percpu API x86, mm: Fix incorrect data type in vmalloc_sync_all() x86, mm: Hold mm->page_table_lock while doing vmalloc_sync x86, mm: Fix bogus whitespace in sync_global_pgds() x86-32: Fix sparse warning for the __PHYSICAL_MASK calculation x86, mm: Add RESERVE_BRK_ARRAY() helper mm, x86: Saving vmcore with non-lazy freeing of vmas x86, kdump: Change copy_oldmem_page() to use cached addressing x86, mm: fix uninitialized addr in kernel_physical_mapping_init() x86, kmemcheck: Remove double test x86, mm: Make spurious_fault check explicitly check the PRESENT bit x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions x86, mm: Avoid unnecessary TLB flush
2010-10-20x86, mm: Fix incorrect data type in vmalloc_sync_all()Borislav Petkov1-1/+1
arch/x86/mm/fault.c: In function 'vmalloc_sync_all': arch/x86/mm/fault.c:238: warning: assignment makes integer from pointer without a cast introduced by 617d34d9e5d8326ec8f188c616aa06ac59d083fe. Signed-off-by: Borislav Petkov <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-10-19x86, mm: Hold mm->page_table_lock while doing vmalloc_syncJeremy Fitzhardinge1-1/+10
Take mm->page_table_lock while syncing the vmalloc region. This prevents a race with the Xen pagetable pin/unpin code, which expects that the page_table_lock is already held. If this race occurs, then Xen can see an inconsistent page type (a page can either be read/write or a pagetable page, and pin/unpin converts it between them), which will cause either the pin or the set_p[gm]d to fail; either will crash the kernel. vmalloc_sync_all() should be called rarely, so this extra use of page_table_lock should not interfere with its normal users. The mm pointer is stashed in the pgd page's index field, as that won't be otherwise used for pgds. Reported-by: Ian Campbell <[email protected]> Originally-by: Jan Beulich <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Jeremy Fitzhardinge <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-10-14x86: Barf when vmalloc and kmemcheck faults happen in NMIFrederic Weisbecker1-0/+4
In x86, faults exit by executing the iret instruction, which then reenables NMIs if we faulted in NMI context. Then if a fault happens in NMI, another NMI can nest after the fault exits. But we don't yet support nested NMIs because we have only one NMI stack. To prevent from that, check that vmalloc and kmemcheck faults don't happen in this context. Most of the other kernel faults in NMIs can be more easily spotted by finding explicit copy_from,to_user() calls on review. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Peter Zijlstra <[email protected]>
2010-10-08x86: HWPOISON: Report correct address granuality for huge hwpoison faultsAndi Kleen1-6/+13
An earlier patch fixed the hwpoison fault handling to encode the huge page size in the fault code of the page fault handler. This is needed to report this information in SIGBUS to user space. This is a straight forward patch to pass this information through to the signal handling in the x86 specific fault.c Cc: [email protected] Cc: Naoya Horiguchi <[email protected]> Cc: [email protected] Signed-off-by: Andi Kleen <[email protected]>
2010-08-26x86, mm: Make spurious_fault check explicitly check the PRESENT bitShaohua Li1-1/+7
pte_present() returns true even present bit isn't set but _PAGE_PROTNONE (global bit) bit is set. While with CONFIG_DEBUG_PAGEALLOC, free pages have global bit set but present bit clear. This patch makes we could catch free pages access with CONFIG_DEBUG_PAGEALLOC enabled. [ hpa: added a comment in the code as a warning to janitors ] Signed-off-by: Shaohua Li <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-08-26x86, mm: Separate x86_64 vmalloc_sync_all() into separate functionsHaicheng Li1-23/+1
No behavior change. Move some of vmalloc_sync_all() code into a new function sync_global_pgds() that will be useful for memory hotplug. Signed-off-by: Haicheng Li <[email protected]> LKML-Reference: <[email protected]> Reviewed-by: Wu Fengguang <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-08-13x86: don't send SIGBUS for kernel page faultsLinus Torvalds1-1/+3
It's wrong for several reasons, but the most direct one is that the fault may be for the stack accesses to set up a previous SIGBUS. When we have a kernel exception, the kernel exception handler does all the fixups, not some user-level signal handler. Even apart from the nested SIGBUS issue, it's also wrong to give out kernel fault addresses in the signal handler info block, or to send a SIGBUS when a system call already returns EFAULT. Signed-off-by: Linus Torvalds <[email protected]>
2009-12-05Merge branch 'x86-debug-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: Limit number of per cpu TSC sync messages x86: dumpstack, 64-bit: Disable preemption when walking the IRQ/exception stacks x86: dumpstack: Clean up the x86_stack_ids[][] initalization and other details x86, cpu: mv display_cacheinfo -> cpu_detect_cache_sizes x86: Suppress stack overrun message for init_task x86: Fix cpu_devs[] initialization in early_cpu_init() x86: Remove CPU cache size output for non-Intel too x86: Minimise printk spew from per-vendor init code x86: Remove the CPU cache size printk's cpumask: Avoid cpumask_t in arch/x86/kernel/apic/nmi.c x86: Make sure we also print a Code: line for show_regs()
2009-11-23x86: Suppress stack overrun message for init_taskJan Beulich1-1/+1
init_task doesn't get its stack end location set to STACK_END_MAGIC, and hence the message is confusing rather than helpful in this case. Signed-off-by: Jan Beulich <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-10-17Merge commit 'v2.6.32-rc5' into perf/probesIngo Molnar1-4/+15
Conflicts: kernel/trace/trace_event_profile.c Merge reason: update to -rc5 and resolve conflict. Signed-off-by: Ingo Molnar <[email protected]>
2009-09-24Merge branch 'hwpoison' of ↵Linus Torvalds1-4/+15
git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 * 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits) HWPOISON: Enable error_remove_page on btrfs HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs HWPOISON: Add madvise() based injector for hardware poisoned pages v4 HWPOISON: Enable error_remove_page for NFS HWPOISON: Enable .remove_error_page for migration aware file systems HWPOISON: The high level memory error handler in the VM v7 HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process HWPOISON: shmem: call set_page_dirty() with locked page HWPOISON: Define a new error_remove_page address space op for async truncation HWPOISON: Add invalidate_inode_page HWPOISON: Refactor truncate to allow direct truncating of page v2 HWPOISON: check and isolate corrupted free pages v2 HWPOISON: Handle hardware poisoned pages in try_to_unmap HWPOISON: Use bitmask/action code for try_to_unmap behaviour HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2 HWPOISON: Add poison check to page fault handling HWPOISON: Add basic support for poisoned pages in fault handler v3 HWPOISON: Add new SIGBUS error codes for hardware poison signals HWPOISON: Add support for poison swap entries v2 HWPOISON: Export some rmap vma locking to outside world ...
2009-09-23Merge commit 'linus/master' into tracing/kprobesFrederic Weisbecker1-34/+25
Conflicts: kernel/trace/Makefile kernel/trace/trace.h kernel/trace/trace_event_types.h kernel/trace/trace_export.c Merge reason: Sync with latest significant tracing core changes.
2009-09-21perf: Do the big rename: Performance Counters -> Performance EventsIngo Molnar1-4/+4
Bye-bye Performance Counters, welcome Performance Events! In the past few months the perfcounters subsystem has grown out its initial role of counting hardware events, and has become (and is becoming) a much broader generic event enumeration, reporting, logging, monitoring, analysis facility. Naming its core object 'perf_counter' and naming the subsystem 'perfcounters' has become more and more of a misnomer. With pending code like hw-breakpoints support the 'counter' name is less and less appropriate. All in one, we've decided to rename the subsystem to 'performance events' and to propagate this rename through all fields, variables and API names. (in an ABI compatible fashion) The word 'event' is also a bit shorter than 'counter' - which makes it slightly more convenient to write/handle as well. Thanks goes to Stephane Eranian who first observed this misnomer and suggested a rename. User-space tooling and ABI compatibility is not affected - this patch should be function-invariant. (Also, defconfigs were not touched to keep the size down.) This patch has been generated via the following script: FILES=$(find * -type f | grep -vE 'oprofile|[^K]config') sed -i \ -e 's/PERF_EVENT_/PERF_RECORD_/g' \ -e 's/PERF_COUNTER/PERF_EVENT/g' \ -e 's/perf_counter/perf_event/g' \ -e 's/nb_counters/nb_events/g' \ -e 's/swcounter/swevent/g' \ -e 's/tpcounter_event/tp_event/g' \ $FILES for N in $(find . -name perf_counter.[ch]); do M=$(echo $N | sed 's/perf_counter/perf_event/g') mv $N $M done FILES=$(find . -name perf_event.*) sed -i \ -e 's/COUNTER_MASK/REG_MASK/g' \ -e 's/COUNTER/EVENT/g' \ -e 's/\<event\>/event_id/g' \ -e 's/counter/event/g' \ -e 's/Counter/Event/g' \ $FILES ... to keep it as correct as possible. This script can also be used by anyone who has pending perfcounters patches - it converts a Linux kernel tree over to the new naming. We tried to time this change to the point in time where the amount of pending patches is the smallest: the end of the merge window. Namespace clashes were fixed up in a preparatory patch - and some stylistic fallout will be fixed up in a subsequent patch. ( NOTE: 'counters' are still the proper terminology when we deal with hardware registers - and these sed scripts are a bit over-eager in renaming them. I've undone some of that, but in case there's something left where 'counter' would be better than 'event' we can undo that on an individual basis instead of touching an otherwise nicely automated patch. ) Suggested-by: Stephane Eranian <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Acked-by: Paul Mackerras <[email protected]> Reviewed-by: Arjan van de Ven <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: David Howells <[email protected]> Cc: Kyle McMartin <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2009-09-16HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2Andi Kleen1-4/+15
Add VM_FAULT_HWPOISON handling to the x86 page fault handler. This is very similar to VM_FAULT_OOM, the only difference is that a different si_code is passed to user space and the new addr_lsb field is initialized. v2: Make the printk more verbose/unique Cc: [email protected] Signed-off-by: Andi Kleen <[email protected]>
2009-09-14Merge branch 'x86-mm-for-linus' of ↵Linus Torvalds1-30/+21
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, highmem_32.c: Clean up comment x86, pgtable.h: Clean up types x86: Clean up dump_pagetable()
2009-08-30kprobes/x86: Fix to add __kprobes to in-kernel fault handing functionsMasami Hiramatsu1-5/+6
Add __kprobes to the functions which handle in-kernel fixable page faults. Since kprobes can cause those in-kernel page faults by accessing kprobe data structures, probing those fault functions will cause fault-int3-loop (do_page_fault has already been marked as __kprobes). Signed-off-by: Masami Hiramatsu <[email protected]> Acked-by: Ananth N Mavinakayanahalli <[email protected]> Cc: Ingo Molnar <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2009-07-11x86: Remove spurious printk level from segfault messageRoland Dreier1-1/+1
Since commit 5fd29d6c ("printk: clean up handling of log-levels and newlines"), the kernel logs segfaults like: <6>gnome-power-man[24509]: segfault at 20 ip 00007f9d4950465a sp 00007fffbb50fc70 error 4 in libgobject-2.0.so.0.2103.0[7f9d494f7000+45000] with the extra "<6>" being KERN_INFO. This happens because the printk in show_signal_msg() started with KERN_CONT and then used "%s" to pass in the real level; and KERN_CONT is no longer an empty string, and printk only pays attention to the level at the very beginning of the format string. Therefore, remove the KERN_CONT from this printk, since it is now actively causing problems (and never really made any sense). Signed-off-by: Roland Dreier <[email protected]> Cc: Linus Torvalds <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2009-07-08Remove multiple KERN_ prefixes from printk formatsJoe Perches1-4/+5
Commit 5fd29d6ccbc98884569d6f3105aeca70858b3e0f ("printk: clean up handling of log-levels and newlines") changed printk semantics. printk lines with multiple KERN_<level> prefixes are no longer emitted as before the patch. <level> is now included in the output on each additional use. Remove all uses of multiple KERN_<level>s in formats. Signed-off-by: Joe Perches <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>