blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2011-03-14	xen/mmu: Add the notion of identity (1-1) mapping.	Konrad Rzeszutek Wilk	2	-4/+240
	Our P2M tree structure is a three-level. On the leaf nodes we set the Machine Frame Number (MFN) of the PFN. What this means is that when one does: pfn_to_mfn(pfn), which is used when creating PTE entries, you get the real MFN of the hardware. When Xen sets up a guest it initially populates a array which has descending (or ascending) MFN values, as so: idx: 0, 1, 2 [0x290F, 0x290E, 0x290D, ..] so pfn_to_mfn(2)==0x290D. If you start, restart many guests that list starts looking quite random. We graft this structure on our P2M tree structure and stick in those MFN in the leafs. But for all other leaf entries, or for the top root, or middle one, for which there is a void entry, we assume it is "missing". So pfn_to_mfn(0xc0000)=INVALID_P2M_ENTRY. We add the possibility of setting 1-1 mappings on certain regions, so that: pfn_to_mfn(0xc0000)=0xc0000 The benefit of this is, that we can assume for non-RAM regions (think PCI BARs, or ACPI spaces), we can create mappings easily b/c we get the PFN value to match the MFN. For this to work efficiently we introduce one new page p2m_identity and allocate (via reserved_brk) any other pages we need to cover the sides (1GB or 4MB boundary violations). All entries in p2m_identity are set to INVALID_P2M_ENTRY type (Xen toolstack only recognizes that and MFNs, no other fancy value). On lookup we spot that the entry points to p2m_identity and return the identity value instead of dereferencing and returning INVALID_P2M_ENTRY. If the entry points to an allocated page, we just proceed as before and return the PFN. If the PFN has IDENTITY_FRAME_BIT set we unmask that in appropriate functions (pfn_to_mfn). The reason for having the IDENTITY_FRAME_BIT instead of just returning the PFN is that we could find ourselves where pfn_to_mfn(pfn)==pfn for a non-identity pfn. To protect ourselves against we elect to set (and get) the IDENTITY_FRAME_BIT on all identity mapped PFNs. This simplistic diagram is used to explain the more subtle piece of code. There is also a digram of the P2M at the end that can help. Imagine your E820 looking as so: 1GB 2GB /-------------------+---------\/----\ /----------\ /---+-----\ \| System RAM \| Sys RAM \|\|ACPI\| \| reserved \| \| Sys RAM \| \-------------------+---------/\----/ \----------/ \---+-----/ ^- 1029MB ^- 2001MB [1029MB = 263424 (0x40500), 2001MB = 512256 (0x7D100), 2048MB = 524288 (0x80000)] And dom0_mem=max:3GB,1GB is passed in to the guest, meaning memory past 1GB is actually not present (would have to kick the balloon driver to put it in). When we are told to set the PFNs for identity mapping (see patch: "xen/setup: Set identity mapping for non-RAM E820 and E820 gaps.") we pass in the start of the PFN and the end PFN (263424 and 512256 respectively). The first step is to reserve_brk a top leaf page if the p2m[1] is missing. The top leaf page covers 512^2 of page estate (1GB) and in case the start or end PFN is not aligned on 512^2*PAGE_SIZE (1GB) we loop on aligned 1GB PFNs from start pfn to end pfn. We reserve_brk top leaf pages if they are missing (means they point to p2m_mid_missing). With the E820 example above, 263424 is not 1GB aligned so we allocate a reserve_brk page which will cover the PFNs estate from 0x40000 to 0x80000. Each entry in the allocate page is "missing" (points to p2m_missing). Next stage is to determine if we need to do a more granular boundary check on the 4MB (or 2MB depending on architecture) off the start and end pfn's. We check if the start pfn and end pfn violate that boundary check, and if so reserve_brk a middle (p2m[x][y]) leaf page. This way we have a much finer granularity of setting which PFNs are missing and which ones are identity. In our example 263424 and 512256 both fail the check so we reserve_brk two pages. Populate them with INVALID_P2M_ENTRY (so they both have "missing" values) and assign them to p2m[1][2] and p2m[1][488] respectively. At this point we would at minimum reserve_brk one page, but could be up to three. Each call to set_phys_range_identity has at maximum a three page cost. If we were to query the P2M at this stage, all those entries from start PFN through end PFN (so 1029MB -> 2001MB) would return INVALID_P2M_ENTRY ("missing"). The next step is to walk from the start pfn to the end pfn setting the IDENTITY_FRAME_BIT on each PFN. This is done in 'set_phys_range_identity'. If we find that the middle leaf is pointing to p2m_missing we can swap it over to p2m_identity - this way covering 4MB (or 2MB) PFN space. At this point we do not need to worry about boundary aligment (so no need to reserve_brk a middle page, figure out which PFNs are "missing" and which ones are identity), as that has been done earlier. If we find that the middle leaf is not occupied by p2m_identity or p2m_missing, we dereference that page (which covers 512 PFNs) and set the appropriate PFN with IDENTITY_FRAME_BIT. In our example 263424 and 512256 end up there, and we set from p2m[1][2][256->511] and p2m[1][488][0->256] with IDENTITY_FRAME_BIT set. All other regions that are void (or not filled) either point to p2m_missing (considered missing) or have the default value of INVALID_P2M_ENTRY (also considered missing). In our case, p2m[1][2][0->255] and p2m[1][488][257->511] contain the INVALID_P2M_ENTRY value and are considered "missing." This is what the p2m ends up looking (for the E820 above) with this fabulous drawing: p2m /--------------\ /-----\ \| &mfn_list[0],\| /-----------------\ \| 0 \|------>\| &mfn_list[1],\| /---------------\ \| ~0, ~0, .. \| \|-----\| \| ..., ~0, ~0 \| \| ~0, ~0, [x]---+----->\| IDENTITY [@256] \| \| 1 \|---\ \--------------/ \| [p2m_identity]+\ \| IDENTITY [@257] \| \|-----\| \ \| [p2m_identity]+\\ \| .... \| \| 2 \|--\ \-------------------->\| ... \| \\ \----------------/ \|-----\| \ \---------------/ \\ \| 3 \|\ \ \\ p2m_identity \|-----\| \ \-------------------->/---------------\ /-----------------\ \| .. +->+ \| [p2m_identity]+-->\| ~0, ~0, ~0, ... \| \-----/ / \| [p2m_identity]+-->\| ..., ~0 \| / /---------------\ \| .... \| \-----------------/ / \| IDENTITY[@0] \| /-+-[x], ~0, ~0.. \| / \| IDENTITY[@256]\|<----/ \---------------/ / \| ~0, ~0, .... \| \| \---------------/ \| p2m_missing p2m_missing /------------------\ /------------\ \| [p2m_mid_missing]+---->\| ~0, ~0, ~0 \| \| [p2m_mid_missing]+---->\| ..., ~0 \| \------------------/ \------------/ where ~0 is INVALID_P2M_ENTRY. IDENTITY is (PFN \| IDENTITY_BIT) Reviewed-by: Ian Campbell <[email protected]> [v5: Changed code to use ranges, added ASCII art] [v6: Rebased on top of xen->p2m code split] [v4: Squished patches in just this one] [v7: Added RESERVE_BRK for potentially allocated pages] [v8: Fixed alignment problem] [v9: Changed 1<<3X to 1<<BITS_PER_LONG-X] [v10: Copied git commit description in the p2m code + Add Review tag] [v11: Title had '2-1' - should be '1-1' mapping] Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-14	x86: ce4100: Set pci ops via callback instead of module init	Sebastian Andrzej Siewior	3	-3/+12
	Setting the pci ops on subsys initcall unconditionally will break multi platform kernels on anything except ce4100. Use x86_init.pci.init ops to call this only on real ce4100 platforms. Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	x86: Enable forced interrupt threading support	Thomas Gleixner	1	-0/+1
	All non threadeable interrupts are marked. Enable forced irq threading support. Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	x86: Mark low level interrupts IRQF_NO_THREAD	Thomas Gleixner	2	-0/+4
	These cannot be threaded. Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	x86: Use generic show_interrupts	Thomas Gleixner	2	-55/+3
	Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	x86: ioapic: Avoid redundant lookup of irq_cfg	Thomas Gleixner	1	-4/+5
	The caller of ioapic_register_intr() has a pointer to the irq_cfg for the irq already. Hand it in to avoid a full lookup. In msi_compose_msg() the pointer to irq_cfg is already available. No need to look it up again. Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	x86: ioapic: Use new move_irq functions	Thomas Gleixner	1	-2/+2
	Use the functions which take irq_data. We already have a pointer to irq_data. That avoids a sparse irq lookup in move_*_irq. Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	x86: Use the proper accessors in fixup_irqs()	Thomas Gleixner	1	-10/+13
	Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	x86: ioapic: Use irq_data->state	Thomas Gleixner	1	-8/+7
	Use the state information in irq_data. That avoids a radix-tree lookup from apic_ack_level() and simplifies setup_ioapic_dest(). Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	x86: ioapic: Simplify irq chip and handler setup	Thomas Gleixner	1	-27/+21
	Use pointers instead of ugly multiline if/else constructs. Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	x86: Cleanup the genirq name space	Thomas Gleixner	8	-48/+51
	genirq is switching to a consistent name space for the irq related functions. Convert x86. Conversion was done with coccinelle. Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	Merge branch 'x86/apic' into x86/irq	Thomas Gleixner	6	-176/+163
	Reason: Update to latest genirq code conflicts with pending apic changes Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-12	x86-64, NUMA: Don't call numa_set_distanc() for all possible node ↵	Tejun Heo	1	-11/+12
	combinations during emulation The distance transforming in numa_emulation() used to call numa_set_distance() for all MAX_NUMNODES * MAX_NUMNODES node combinations regardless of which are enabled. As numa_set_distance() ignores all out-of-bound distance settings, this doesn't cause any problem other than looping unnecessarily many times during boot. However, as MAX_NUMNODES * MAX_NUMNODES can be pretty high, update the code such that it iterates through only the enabled combinations. Yinghai Lu identified the issue and provided an initial patch to address the issue; however, the patch was incorrect in that it didn't build emulated distance table when there's no physical distance table and unnecessarily complex. http://thread.gmane.org/gmane.linux.kernel/1107986/focus=1107988 Signed-off-by: Tejun Heo <[email protected]> Reported-by: Yinghai Lu <[email protected]> Acked-by: Yinghai Lu <[email protected]>
2011-03-12	x86, binutils, xen: Fix another wrong size directive	Alexander van Heukelum	1	-1/+1
	The latest binutils (2.21.0.20110302/Ubuntu) breaks the build yet another time, under CONFIG_XEN=y due to a .size directive that refers to a slightly differently named (hence, to the now very strict and unforgiving assembler, non-existent) symbol. [ mingo: This unnecessary build breakage caused by new binutils version 2.21 gets escallated back several kernel releases spanning several years of Linux history, affecting over 130,000 upstream kernel commits (!), on CONFIG_XEN=y 64-bit kernels (i.e. essentially affecting all major Linux distro kernel configs). Git annotate tells us that this slight debug symbol code mismatch bug has been introduced in 2008 in commit 3d75e1b8: 3d75e1b8 (Jeremy Fitzhardinge 2008-07-08 15:06:49 -0700 1231) ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct pt_regs) The 'bug' is just a slight assymetry in ENTRY()/END() debug-symbols sequences, with lots of assembly code between the ENTRY() and the END(): ENTRY(xen_do_hypervisor_callback) # do_hypervisor_callback(struct pt_regs) ... END(do_hypervisor_callback) Human reviewers almost never catch such small mismatches, and binutils never even warned about it either. This new binutils version thus breaks the Xen build on all upstream kernels since v2.6.27, out of the blue. This makes a straightforward Git bisection of all 64-bit Xen-enabled kernels impossible on such binutils, for a bisection window of over hundred thousand historic commits. (!) This is a major fail on the side of binutils and binutils needs to turn this show-stopper build failure into a warning ASAP. ] Signed-off-by: Alexander van Heukelum <[email protected]> Cc: Jeremy Fitzhardinge <[email protected]> Cc: Jan Beulich <[email protected]> Cc: H.J. Lu <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Kees Cook <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-11	xen/e820: Don't mark balloon memory as E820_UNUSABLE when running as guest ↵	Konrad Rzeszutek Wilk	1	-1/+2
	and fix overflow. If we have a guest that asked for: memory=1024 maxmem=2048 Which means we want 1GB now, and create pagetables so that we can expand up to 2GB, we would have this E820 layout: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) [ 0.000000] Xen: 0000000000100000 - 0000000080800000 (usable) Due to patch: "xen/setup: Inhibit resource API from using System RAM E820 gaps as PCI mem gaps." we would mark the memory past the 1GB mark as unusuable resulting in: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) [ 0.000000] Xen: 0000000000100000 - 0000000040000000 (usable) [ 0.000000] Xen: 0000000040000000 - 0000000080800000 (unusable) which meant that we could not balloon up anymore. We could balloon the guest down. The fix is to run the code introduced by the above mentioned patch only for the initial domain. We will have to revisit this once we start introducing a modified E820 for PCI passthrough so that we can utilize the P2M identity code. We also fix an overflow by having UL instead of ULL on 32-bit machines. [v2: Ian pointed to the overflow issue] Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-11	futex: Sanitize futex ops argument types	Michel Lespinasse	1	-5/+5
	Change futex_atomic_op_inuser and futex_atomic_cmpxchg_inatomic prototypes to use u32 types for the futex as this is the data type the futex core code uses all over the place. Signed-off-by: Michel Lespinasse <[email protected]> Cc: Darren Hart <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Matt Turner <[email protected]> Cc: Russell King <[email protected]> Cc: David Howells <[email protected]> Cc: Tony Luck <[email protected]> Cc: Michal Simek <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Paul Mundt <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Linus Torvalds <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-11	futex: Sanitize cmpxchg_futex_value_locked API	Michel Lespinasse	1	-7/+9
	The cmpxchg_futex_value_locked API was funny in that it returned either the original, user-exposed futex value OR an error code such as -EFAULT. This was confusing at best, and could be a source of livelocks in places that retry the cmpxchg_futex_value_locked after trying to fix the issue by running fault_in_user_writeable(). This change makes the cmpxchg_futex_value_locked API more similar to the get_futex_value_locked one, returning an error code and updating the original value through a reference argument. Signed-off-by: Michel Lespinasse <[email protected]> Acked-by: Chris Metcalf <[email protected]> [tile] Acked-by: Tony Luck <[email protected]> [ia64] Acked-by: Thomas Gleixner <[email protected]> Tested-by: Michal Simek <[email protected]> [microblaze] Acked-by: David Howells <[email protected]> [frv] Cc: Darren Hart <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Matt Turner <[email protected]> Cc: Russell King <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Paul Mundt <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Linus Torvalds <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2011-03-11	x86: Clean up apic.c and apic.h	Henrik Kretzschmar	2	-47/+22
	This patch moves some functions and variables into init sections, makes a function static and removes some lines of cruft. Signed-off-by: Henrik Kretzschmar <[email protected]> Acked-by: Cyrill Gorcunov <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-11	x86: Remove superflous goal definition of tsc_sync	Henrik Kretzschmar	1	-2/+2
	The extra tsc_sync.o goal definition is superflous. CONFIG_X86_64_SMP depends on CONFIG_SMP and tsc_sync.o is already in the definition of CONFIG_SMP. Signed-off-by: Henrik Kretzschmar <[email protected]> Acked-by: Cyrill Gorcunov <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-10	Merge branch 'x86-fixes-for-linus' of ↵	Linus Torvalds	4	-9/+10
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, UV: Initialize the broadcast assist unit base destination node id properly x86, numa: Fix numa_emulation code with memory-less node0 x86, build: Make sure mkpiggy fails on read error
2011-03-10	xen: events: refactor GSI pirq bindings functions	Ian Campbell	1	-12/+29
	Following the example set by xen_allocate_pirq_msi and xen_bind_pirq_msi_to_irq: xen_allocate_pirq becomes xen_allocate_pirq_gsi and now only allocates a pirq number and does not bind it. xen_map_pirq_gsi becomes xen_bind_pirq_gsi_to_irq and binds an existing pirq. Signed-off-by: Ian Campbell <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-10	xen: events: remove dom0 specific xen_create_msi_irq	Ian Campbell	1	-5/+40
	The function name does not distinguish it from xen_allocate_pirq_msi (which operates on domU and pvhvm domains rather than dom0). Hoist domain 0 specific functionality up into the only caller leaving functionality common to all guest types in xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-10	xen: events: use xen_bind_pirq_msi_to_irq from xen_create_msi_irq	Ian Campbell	1	-2/+2
	Signed-off-by: Ian Campbell <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-10	xen: events: push set_irq_msi down into xen_create_msi_irq	Ian Campbell	1	-9/+1
	Makes the tail end of this function look even more like xen_bind_pirq_msi_to_irq. Signed-off-by: Ian Campbell <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-10	xen: events: separate MSI PIRQ allocation from PIRQ binding to IRQ	Ian Campbell	1	-42/+26
	Split the binding aspect of xen_allocate_pirq_msi out into a new xen_bind_pirq_to_irq function. In xen_hvm_setup_msi_irq when allocating a pirq write the MSI message to signal the PIRQ as soon as the pirq is obtained. There is no way to free the pirq back so if the subsequent binding to an IRQ fails we want to ensure that we will reuse the PIRQ next time rather than leak it. Signed-off-by: Ian Campbell <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-10	xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq	Ian Campbell	1	-8/+3
	apic_register_gsi_xen_hvm is a tiny wrapper around xen_hvm_register_pirq. Signed-off-by: Ian Campbell <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-10	xen: events: return irq from xen_allocate_pirq_msi	Ian Campbell	1	-6/+6
	consistent with other similar functions. Signed-off-by: Ian Campbell <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-10	xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi	Ian Campbell	1	-3/+3
	All callers pass this flag so it is pointless. Signed-off-by: Ian Campbell <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-10	xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0	Ian Campbell	1	-0/+2
	Fixes: CC arch/x86/pci/xen.o arch/x86/pci/xen.c:183: warning: 'xen_initdom_setup_msi_irqs' defined but not used Signed-off-by: Ian Campbell <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2011-03-10	Merge branch 'stable/pcifront-fixes' into stable/irq.cleanup	Konrad Rzeszutek Wilk	2	-9/+12
	* stable/pcifront-fixes: pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code. pci/xen: Cleanup: convert int** to int[] pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq xen-pcifront: Sanity check the MSI/MSI-X values xen-pcifront: don't use flush_scheduled_work()
2011-03-10	Merge branch 'stable/irq.rework' into stable/irq.cleanup	Konrad Rzeszutek Wilk	2	-9/+17
	* stable/irq.rework: xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well. xen: Use IRQF_FORCE_RESUME xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend. xen: Fix compile error introduced by "switch to new irq_chip functions" xen: Switch to new irq_chip functions xen: Remove stale irq_chip.end xen: events: do not free legacy IRQs xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges. xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq xen:events: move find_unbound_irq inside CONFIG_PCI_MSI xen: handled remapped IRQs when enabling a pcifront PCI device. genirq: Add IRQF_FORCE_RESUME
2011-03-10	ftrace/graph: Trace function entry before updating index	Steven Rostedt	1	-7/+8
	Currently the index to the ret_stack is updated and the real return address is saved in the ret_stack. Then we call the trace function. The trace function could decide that it doesn't want to trace this function (ex. set_graph_function does not match) and it will return 0 which means not to trace this call. The normal function graph tracer has this code: if (!(trace->depth \|\| ftrace_graph_addr(trace->func)) \|\| ftrace_graph_ignore_irqs()) return 0; What this states is, if the trace depth (which is curr_ret_stack) is zero (top of nested functions) then test if we want to trace this function. If this function is not to be traced, then return 0 and the rest of the function graph tracer logic will not trace this function. The problem arises when an interrupt comes in after we updated the curr_ret_stack. The next function that gets called will have a trace->depth of 1. Which fools this trace code into thinking that we are in a nested function, and that we should trace. This causes interrupts to be traced when they should not be. The solution is to trace the function first and then update the ret_stack. Reported-by: zhiping zhong <[email protected]> Reported-by: wu zhangjin <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-03-10	tracing: Fix event alignment: kvm:kvm_hv_hypercall	David Sharp	1	-4/+4
	Acked-by: Avi Kivity <[email protected]> Signed-off-by: David Sharp <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2011-03-10	x86/mm: Fix pgd_lock deadlock	Andrea Arcangeli	5	-30/+22
	It's forbidden to take the page_table_lock with the irq disabled or if there's contention the IPIs (for tlb flushes) sent with the page_table_lock held will never run leading to a deadlock. Nobody takes the pgd_lock from irq context so the _irqsave can be removed. Signed-off-by: Andrea Arcangeli <[email protected]> Acked-by: Rik van Riel <[email protected]> Tested-by: Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-10	x86/mm: Handle mm_fault_error() in kernel space	Andrey Vagin	1	-0/+7
	mm_fault_error() should not execute oom-killer, if page fault occurs in kernel space. E.g. in copy_from_user()/copy_to_user(). This would happen if we find ourselves in OOM on a copy_to_user(), or a copy_from_user() which faults. Without this patch, the kernels hangs up in copy_from_user(), because OOM killer sends SIG_KILL to current process, but it can't handle a signal while in syscall, then the kernel returns to copy_from_user(), reexcute current command and provokes page_fault again. With this patch the kernel return -EFAULT from copy_from_user(). The code, which checks that page fault occurred in kernel space, has been copied from do_sigbus(). This situation is handled by the same way on powerpc, xtensa, tile, ... Signed-off-by: Andrey Vagin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-09	[CPUFREQ] pcc-cpufreq: don't load driver if get_freq fails during init.	Naga Chumbalkar	1	-1/+1
	Return 0 on failure. This will cause the initialization of the driver to fail and prevent the driver from loading if the BIOS cannot handle the PCC interface command to "get frequency". Otherwise, the driver will load and display a very high value like "4294967274" (which is actually -EINVAL) for frequency: # cat /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_cur_freq 4294967274 Signed-off-by: Naga Chumbalkar <[email protected]> CC: [email protected] Signed-off-by: Dave Jones <[email protected]>
2011-03-09	x86: Don't check for BIOS corruption in first 64K when there's no need to	Naga Chumbalkar	1	-4/+4
	Due to commit 781c5a67f152c17c3e4a9ed9647f8c0be6ea5ae9 it is likely that the number of areas to scan for BIOS corruption is 0 -- especially when the first 64K is already reserved (X86_RESERVE_LOW is 64K by default). If that's the case then don't set up the scan. Signed-off-by: Naga Chumbalkar <[email protected]> Cc: <[email protected]> LKML-Reference: <20110225202838.2229.71011.sendpatchset@nchumbalkar.americas.hpqcorp.net> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-09	x86, UV: Initialize the broadcast assist unit base destination node id properly	Cliff Wickman	2	-3/+3
	The BAU's initialization of the broadcast description header is lacking the coherence domain (high bits) in the nasid. This causes a catastrophic system failure when running on a system with multiple coherence domains. Signed-off-by: Cliff Wickman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-09	x86: Remove dead config option X86_CPU	Jan Beulich	2	-5/+2
	This isn't being referenced anywhere, and the selects done from it can be easily done together with all the other X86 ones. v2: Also adjust UML's Kconfig.x86. Signed-off-by: Jan Beulich <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-09	Merge commit 'v2.6.38-rc8' into x86/asm	Ingo Molnar	21	-41/+93
	Merge reason: Update with the latest fixes. Signed-off-by: Ingo Molnar <[email protected]>
2011-03-09	x86: Fix binutils-2.21 symbol related build failures	Sedat Dilek	2	-2/+2
	New binutils version 2.21.0.20110302-1 started checking that the symbol parameter to the .size directive matches the entry name's symbol parameter, unearthing two mismatches: AS arch/x86/kernel/acpi/wakeup_rm.o arch/x86/kernel/acpi/wakeup_rm.S: Assembler messages: arch/x86/kernel/acpi/wakeup_rm.S:12: Error: .size expression with symbol `wakeup_code_start' does not evaluate to a constant arch/x86/kernel/entry_32.S: Assembler messages: arch/x86/kernel/entry_32.S:1421: Error: .size expression with symbol `apf_page_fault' does not evaluate to a constant The problem was discovered while using Debian's binutils (2.21.0.20110302-1) and experimenting with binutils from upstream. Thanks Alexander and H.J. for the vital help. Signed-off-by: Sedat Dilek <[email protected]> Cc: Alexander van Heukelum <[email protected]> Cc: H.J. Lu <[email protected]> Cc: Len Brown <[email protected]> Cc: Pavel Machek <[email protected]> Cc: Rafael J. Wysocki <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-08	kprobes: Disabling optimized kprobes for entry text section	Jiri Olsa	1	-0/+8
	You can crash the kernel (with root/admin privileges) using kprobe tracer by running: echo "p system_call_after_swapgs" > ./kprobe_events echo 1 > ./events/kprobes/enable The reason is that at the system_call_after_swapgs label, the kernel stack is not set up. If optimized kprobes are enabled, the user space stack is being used in this case (see optimized kprobe template) and this might result in a crash. There are several places like this over the entry code (entry_$BIT). As it seems there's no any reasonable/maintainable way to disable only those places where the stack is not ready, I switched off the whole entry code from kprobe optimizing. Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-08	x86: Separate out entry text section	Jiri Olsa	4	-4/+11
	Put x86 entry code into a separate link section: .entry.text. Separating the entry text section seems to have performance benefits - caused by more efficient instruction cache usage. Running hackbench with perf stat --repeat showed that the change compresses the icache footprint. The icache load miss rate went down by about 15%: before patch: 19417627 L1-icache-load-misses ( +- 0.147% ) after patch: 16490788 L1-icache-load-misses ( +- 0.180% ) The motivation of the patch was to fix a particular kprobes bug that relates to the entry text section, the performance advantage was discovered accidentally. Whole perf output follows: - results for current tip tree: Performance counter stats for './hackbench/hackbench 10' (500 runs): 19417627 L1-icache-load-misses ( +- 0.147% ) 2676914223 instructions # 0.497 IPC ( +- 0.079% ) 5389516026 cycles ( +- 0.144% ) 0.206267711 seconds time elapsed ( +- 0.138% ) - results for current tip tree with the patch applied: Performance counter stats for './hackbench/hackbench 10' (500 runs): 16490788 L1-icache-load-misses ( +- 0.180% ) 2717734941 instructions # 0.502 IPC ( +- 0.079% ) 5414756975 cycles ( +- 0.148% ) 0.206747566 seconds time elapsed ( +- 0.137% ) Signed-off-by: Jiri Olsa <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-08	Merge commit 'v2.6.38-rc8' into perf/core	Ingo Molnar	11	-23/+49
	Merge reason: Merge latest fixes. Signed-off-by: Ingo Molnar <[email protected]>
2011-03-08	KEYS: Add an iovec version of KEYCTL_INSTANTIATE	David Howells	1	-0/+5
	Add a keyctl op (KEYCTL_INSTANTIATE_IOV) that is like KEYCTL_INSTANTIATE, but takes an iovec array and concatenates the data in-kernel into one buffer. Since the KEYCTL_INSTANTIATE copies the data anyway, this isn't too much of a problem. Signed-off-by: David Howells <[email protected]> Signed-off-by: James Morris <[email protected]>
2011-03-05	x86: Really print supported CPUs if PROCESSOR_SELECT=y	Jan Beulich	1	-2/+2
	I'm sure it was a mere oversight that the CONFIG_ prefixes are missing. Signed-off-by: Jan Beulich <[email protected]> Cc: Dave Jones <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-05	Merge branch 'x86-mm' of ↵	Ingo Molnar	5	-101/+79
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc into x86/mm
2011-03-05	perf: Avoid the percore allocations if the CPU is not HT capable	Lin Ming	3	-6/+23
	Signed-off-by: Lin Ming <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-03-04	x86-64, NUMA: Don't assume phys node 0 is always online in numa_emulation()	Tejun Heo	1	-1/+15
	Undetermined entries in emu_nid_to_phys[] are filled with zero assuming that physical node 0 is always online; however, this might not be true depending on hardware configuration. Find a physical node which is actually online and use it instead. Signed-off-by: Tejun Heo <[email protected]> Reported-by: David Rientjes <[email protected]> LKML-Reference: <[email protected]>
2011-03-04	x86, numa: Fix numa_emulation code with memory-less node0	Yinghai Lu	1	-5/+1
	This crash happens on a system that does not have RAM on node0. When numa_emulation is compiled in, and: 1. we boot the system without numa=fake... 2. or we boot the system with numa=fake=128 to make emulation fail we will get: [ 0.076025] ------------[ cut here ]------------ [ 0.080004] kernel BUG at arch/x86/mm/numa_64.c:788! [ 0.080004] invalid opcode: 0000 [#1] SMP [...] need to use early_cpu_to_node() directly, because cpu_to_apicid and apicid_to_node will return node0 that is not onlined. Signed-off-by: Yinghai Lu <[email protected]> Acked-by: Tejun Heo <[email protected]> Cc: David Rientjes <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>