path: root/arch/x86/include/asm
Age | Commit message | Author | Files | Lines
2011-02-18 | x86: Remove die_nmi() | Jan Beulich | 2 | -2/+0
With no caller left, the function and the DIE_NMIWATCHDOG enumerator can both go away. Signed-off-by: Jan Beulich <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Don Zickus <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-02-17 | x86, reboot: Move the real-mode reboot code to an assembly file | H. Peter Anvin | 1 | -1/+4
Move the real-mode reboot code out to an assembly file (reboot_32.S) which is allocated using the common lowmem trampoline allocator. Signed-off-by: H. Peter Anvin <[email protected]> LKML-Reference: <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Matthieu Castet <[email protected]>
2011-02-17 | x86: Make the GDT_ENTRY() macro in <asm/segment.h> safe for assembly | H. Peter Anvin | 1 | -5/+7
Make the GDT_ENTRY() macro in <asm/segment.h> safe for use in assembly code by guarding the ULL suffixes with _AC() macros. Signed-off-by: H. Peter Anvin <[email protected]> LKML-Reference: <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Matthieu Castet <[email protected]> Cc: Stephen Rothwell <[email protected]>
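For reference, a minimal sketch of the _AC() mechanism from <linux/const.h> that makes this work; the EXAMPLE_MASK use is illustrative, not part of the patch:

/* The assembler does not understand C integer-constant suffixes such as
 * ULL, so _AC() drops the suffix when included from .S files. */
#ifdef __ASSEMBLY__
#define _AC(X, Y)	X
#else
#define __AC(X, Y)	(X##Y)
#define _AC(X, Y)	__AC(X, Y)
#endif

/* Expands to 0xff000000ULL in C and to plain 0xff000000 in assembly. */
#define EXAMPLE_MASK	_AC(0xff000000, ULL)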
2011-02-17 | x86, trampoline: Use the unified trampoline setup for ACPI wakeup | H. Peter Anvin | 2 | -12/+25
Use the unified trampoline allocation setup to allocate and install the ACPI wakeup code in low memory. Signed-off-by: H. Peter Anvin <[email protected]> LKML-Reference: <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Matthieu Castet <[email protected]> Cc: Stephen Rothwell <[email protected]>
2011-02-17 | intel_idle: disable Atom/Lincroft HW C-state auto-demotion | Len Brown | 1 | -0/+1
Just as we had to disable auto-demotion for NHM/WSM, we need to do the same for Atom (Lincroft version). In particular, auto-demotion will prevent Lincroft from entering the S0i3 idle power saving state. https://bugzilla.kernel.org/show_bug.cgi?id=25252 Signed-off-by: Len Brown <[email protected]>
2011-02-17 | intel_idle: disable NHM/WSM HW C-state auto-demotion | Len Brown | 1 | -0/+4
Hardware C-state auto-demotion is a mechanism where the HW overrides the OS C-state request, instead demoting to a shallower state, which is less expensive, but saves less power. Modern Linux should generally get exactly the states it requests. In particular, when a CPU is taken off-line, it must not be demoted, else it can prevent the entire package from reaching deep C-states. https://bugzilla.kernel.org/show_bug.cgi?id=25252 Signed-off-by: Len Brown <[email protected]>
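A minimal sketch of the disable path, assuming the MSR and bit definitions this patch adds (the function body is simplified from intel_idle):

#define MSR_NHM_SNB_PKG_CST_CFG_CTL	0x000000e2
#define NHM_C3_AUTO_DEMOTE		(1UL << 25)
#define NHM_C1_AUTO_DEMOTE		(1UL << 26)

static unsigned long long auto_demotion_disable_flags;	/* set per CPU model */

static void auto_demotion_disable(void *dummy)
{
	unsigned long long msr_bits;

	/* Clear the auto-demotion enable bits so the OS C-state request
	 * is honored instead of being demoted by hardware. */
	rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
	msr_bits &= ~auto_demotion_disable_flags;
	wrmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
}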
2011-02-16 | x86-64, NUMA: Unify emulated distance mapping | Tejun Heo | 3 | -11/+0
NUMA emulation needs to update node distance information. It did this by remapping the apicid-to-PXM mapping, even when amdtopology is being used. There is no reason to go through such convolution. The generic code has all the information necessary to transform the distance table to the emulated nid space. Implement generic distance table transformation in numa_emulation() and drop the private implementations in srat_64 and amdtopology_64. This makes find_node_by_addr(), fake_physnodes() and related functions unnecessary; drop them. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Implement generic node distance handling | Tejun Heo | 3 | -1/+3
Node distance either used direct node comparison, ACPI PXM comparison or ACPI SLIT table lookup. This patch implements generic node distance handling. NUMA init methods can call numa_set_distance() to set distance between nodes and the common __node_distance() implementation will report the set distance. Due to the way NUMA emulation is implemented, the generic node distance handling is used only when emulation is not used. Later patches will update NUMA emulation to use the generic distance mechanism. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
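A simplified sketch of the generic bookkeeping described above (the real code sizes and allocates the table dynamically and validates its inputs):

static int numa_distance[MAX_NUMNODES][MAX_NUMNODES];

void __init numa_set_distance(int from, int to, int distance)
{
	numa_distance[from][to] = distance;
}

int __node_distance(int from, int to)
{
	if (numa_distance[from][to])
		return numa_distance[from][to];
	/* Fallback when an init method never recorded a distance. */
	return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
}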
2011-02-16 | x86-64, NUMA: Kill mem_nodes_parsed | Tejun Heo | 1 | -1/+0
With all memory configuration information now carried in numa_meminfo, there's no need to keep mem_nodes_parsed separate. Drop it and use numa_nodes_parsed for CPU / memory-less nodes. A new helper, numa_nodemask_from_meminfo(), is added to calculate the memnode mask on the fly; it is currently used to set node_possible_map. This simplifies NUMA init methods a bit and removes a source of possible inconsistencies. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Rename cpu_nodes_parsed to numa_nodes_parsed | Tejun Heo | 1 | -1/+1
It's no longer necessary to keep both cpu_nodes_parsed and mem_nodes_parsed. In preparation for merge, rename cpu_nodes_parsed to numa_nodes_parsed. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Kill numa_nodes[] | Tejun Heo | 1 | -1/+0
numa_nodes[] doesn't carry any information which isn't present in numa_meminfo. Each entry is simply the min/max range of all the memblks for the node. This is not only redundant but also inaccurate when memblks for different nodes interleave - for example, find_node_by_addr() can return the wrong nodeid. Kill numa_nodes[] and always use numa_meminfo instead.
* nodes_cover_memory() is renamed to numa_meminfo_cover_memory(); it now operates on numa_meminfo and returns bool.
* setup_node_bootmem() needs the min/max range. Compute the range on the fly. The setup_node_bootmem() invocation is restructured to use an outer loop instead of hardcoding the double invocations.
* find_node_by_addr() now operates on numa_meminfo.
* setup_physnodes() builds physnodes[] from memblks. This will go away when the emulation code is updated to use struct numa_meminfo.
This patch also makes the following misc changes.
* Clearing of nodes_add[] is converted to memset().
* numa_add_memblk() in amd_numa_init() is moved down a bit for consistency.
Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
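A sketch of computing a node's range on the fly from numa_meminfo (struct layout per this series; the helper name is hypothetical):

struct numa_memblk { u64 start, end; int nid; };
struct numa_meminfo { int nr_blks; struct numa_memblk blk[NR_NODE_MEMBLKS]; };

static void __init node_range_from_meminfo(const struct numa_meminfo *mi,
					   int nid, u64 *start, u64 *end)
{
	int i;

	*start = (u64)-1;
	*end = 0;
	for (i = 0; i < mi->nr_blks; i++) {
		const struct numa_memblk *mb = &mi->blk[i];

		if (mb->nid != nid)
			continue;
		*start = min(*start, mb->start);
		*end = max(*end, mb->end);
	}
}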
2011-02-16 | x86-64, NUMA: Add common find_node_by_addr() | Tejun Heo | 1 | -0/+1
srat_64.c and amdtopology_64.c had their own versions of find_node_by_addr() which were basically the same. Add common one in numa_64.c and remove the duplicates. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Kill {acpi|amd|dummy}_scan_nodes() | Tejun Heo | 2 | -2/+0
They are empty now. Kill them. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Unify use of memblk in all init methods | Tejun Heo | 1 | -4/+0
Make both amd and dummy use numa_add_memblk() to describe the detected memory blocks. This allows initmem_init() to call numa_register_memblk() regardless of the init method in use. Drop the custom memory registration code from amd and dummy. After this change, the memblk merge/cleanup in numa_register_memblks() is applied to all init methods. As this makes compute_hash_shift() and numa_register_memblks() used only inside numa_64.c, make them static. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Factor out memblk handling into numa_{add|register}_memblk() | Tejun Heo | 3 | -1/+5
Factor out memblk handling from srat_64.c into two functions in numa_64.c. This patch doesn't introduce any behavior change. The next patch will make all init methods use these functions. - v2: Fixed build failure on 32bit due to misplaced NR_NODE_MEMBLKS. Reported by Ingo. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | Merge commit 'v2.6.38-rc5' into core/locking | Ingo Molnar | 6 | -41/+18
Merge reason: pick up upstream fixes. Signed-off-by: Ingo Molnar <[email protected]>
2011-02-16 | perf, x86: Add support for AMD family 15h core counters | Robert Richter | 1 | -0/+2
This patch adds support for AMD family 15h core counters. There are major changes compared to family 10h. First, there is a new perfctr msr range for up to 6 counters. Northbridge counters are separate now. This patch only adds support for core counters. Second, certain events may only be scheduled on certain counters. For this we need to extend the event scheduling and constraints. We use cpu feature flags to calculate family 15h msr address offsets. This way we can later implement a faster ALTERNATIVE() version for this. Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
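A sketch of the feature-flag based offset calculation (helper name assumed; family 15h interleaves ctl/ctr pairs in the new MSR range, so the per-counter stride doubles):

static inline unsigned int amd_pmu_addr_offset(int index)
{
	if (boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
		return index << 1;	/* family 15h: ctl/ctr pairs, stride 2 */
	return index;			/* family 10h: contiguous, stride 1 */
}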
2011-02-16 | perf, x86: P4 PMU: Fix spurious NMI messages | Cyrill Gorcunov | 1 | -0/+1
Several people have reported spurious unknown NMI messages on some P4 CPUs. This patch fixes it by checking for an overflow (negative counter values) directly, instead of relying on the P4_CCCR_OVF bit. Reported-by: George Spelvin <[email protected]> Reported-by: Meelis Roos <[email protected]> Reported-by: Don Zickus <[email protected]> Reported-by: Dave Airlie <[email protected]> Signed-off-by: Cyrill Gorcunov <[email protected]> Cc: Lin Ming <[email protected]> Cc: Don Zickus <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
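The idea of the fix, sketched (simplified from the P4 PMU code; the helper name is assumed, and the 40-bit counter width is per the P4 PMU):

static int p4_counter_overflowed(u64 cccr, u64 counter)
{
	if (cccr & P4_CCCR_OVF)		/* the official indication */
		return 1;
	/* Counters are programmed with negative values and count up toward
	 * zero; once the sign bit of the 40-bit value clears, the counter
	 * crossed zero, i.e. it overflowed even without P4_CCCR_OVF set. */
	return !(counter & (1ULL << 39));
}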
2011-02-16 | x86-64, NUMA: Kill {acpi|amd}_get_nodes() | Tejun Heo | 2 | -3/+0
With common numa_nodes[], common code in numa_64.c can access it directly. Copy directly and kill {acpi|amd}_get_nodes(). Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Use common numa_nodes[] | Tejun Heo | 1 | -0/+1
ACPI and amd are using separate nodes[] arrays. Add numa_nodes[] and use it in all NUMA init methods. The cutoff_node() cleanup is moved from srat_64.c to numa_64.c and applied in initmem_init() regardless of init method. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Use common {cpu|mem}_nodes_parsed | Tejun Heo | 1 | -0/+3
ACPI and amd are using separate nodes_parsed masks. Add {cpu|mem}_nodes_parsed and use them in all NUMA init methods. Initialization of the masks and building node_possible_map are now handled commonly by initmem_init(). dummy_numa_init() is updated to set node 0 on both masks. While at it, move the info messages from scan to init. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86, NUMA: Move *_numa_init() invocations into initmem_init() | Tejun Heo | 1 | -1/+1
There's no reason for these to live in setup_arch(). Move them inside initmem_init(). - v2: x86-32 initmem_init() wasn't updated, breaking 32bit builds. Fixed. Found by Ankita. Signed-off-by: Tejun Heo <[email protected]> Cc: Ankita Garg <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Wrap acpi_numa_init() so that failure can be indicated by return value | Tejun Heo | 1 | -0/+1
Because of the way ACPI tables are parsed, the generic acpi_numa_init() couldn't return failure when an error was detected by arch hooks. Instead, the failure state was recorded and the later arch-dependent init hook - acpi_scan_nodes() - would fail. Wrap acpi_numa_init() with x86_acpi_numa_init() so that failure can be indicated as a return value immediately. This is in preparation for further NUMA init cleanups. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | x86-64, NUMA: Unify {acpi|amd}_{numa_init|scan_nodes}() arguments and return values | Tejun Heo | 2 | -2/+2
The functions used during NUMA initialization - *_numa_init() and *_scan_nodes() - have different arguments and return values. Unify them such that they all take no argument and return 0 on success and -errno on failure. This is in preparation for further NUMA init cleanups. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
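With every method now an int (*)(void) returning 0 or -errno, callers can chain them generically. An illustrative fallback chain (not the exact kernel caller):

static void __init numa_init_chain(void)
{
	/* Try each method in turn; 0 means success, -errno means fall back. */
	if (acpi_numa_init() == 0 && acpi_scan_nodes() == 0)
		return;
	if (amd_numa_init() == 0 && amd_scan_nodes() == 0)
		return;
	dummy_numa_init();	/* last resort, always succeeds */
}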
2011-02-16 | x86, NUMA: Drop @start/last_pfn from initmem_init() | Tejun Heo | 1 | -2/+1
initmem_init() extensively accesses and modifies global data structures, and the parameters aren't even honored on every path. Drop @start/last_pfn and let it deal with @max_pfn directly. This is in preparation for further NUMA init cleanups. - v2: x86-32 initmem_init() wasn't updated, breaking 32bit builds. Fixed. Found by Yinghai. Signed-off-by: Tejun Heo <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Brian Gerst <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Shaohui Zheng <[email protected]> Cc: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]>
2011-02-16 | Merge branch 'x86/amd-nb' into x86/mm | Ingo Molnar | 1 | -1/+5
Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts with ongoing NUMA work. Signed-off-by: Ingo Molnar <[email protected]>
2011-02-16 | Merge branch 'x86/numa' into x86/mm | Ingo Molnar | 8 | -52/+87
Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts. Signed-off-by: Ingo Molnar <[email protected]>
2011-02-16 | Merge branch 'x86/bootmem' into x86/mm | Ingo Molnar | 1 | -0/+8
Merge reason: the topic is ready - consolidate it into the more generic x86/mm tree and prevent conflicts. Signed-off-by: Ingo Molnar <[email protected]>
2011-02-14 | x86/platform: Add a wallclock_init func to x86_init.timers ops | Feng Tang | 1 | -0/+2
Some wall clock devices use MMIO-based HW registers; this new function gives them a chance to do some initialization work before their get/set_time services get called, which is usually in the early kernel boot phase. Signed-off-by: Feng Tang <[email protected]> Signed-off-by: Jacob Pan <[email protected]> Signed-off-by: Alan Cox <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
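A sketch of the ops structure with the new member (layout as of this era's <asm/x86_init.h>; the platform assignment shown is hypothetical):

struct x86_init_timers {
	void (*setup_percpu_clockev)(void);
	void (*tsc_pre_init)(void);
	void (*timer_init)(void);
	void (*wallclock_init)(void);	/* new: early wall clock HW init */
};

/* A platform with an MMIO wall clock would install its own hook, e.g.
 *	x86_init.timers.wallclock_init = my_platform_wallclock_init;
 * while the default remains a no-op (x86_init_noop). */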
2011-02-14 | Merge commit 'v2.6.38-rc4' into x86/numa | Ingo Molnar | 2 | -6/+4
Merge reason: Merge latest fixes before applying new patch. Signed-off-by: Ingo Molnar <[email protected]>
2011-02-14 | Merge commit 'v2.6.38-rc4' into x86/cpu | Ingo Molnar | 5 | -43/+18
Merge reason: pick up the latest fixes. Signed-off-by: Ingo Molnar <[email protected]>
2011-02-14 | x86: Scale up the number of TLB invalidate vectors with NR_CPUs, up to 32 | Shaohua Li | 1 | -4/+9
Make the number of TLB invalidate vectors scale linearly with NR_CPUS, with a maximum of 32 vectors. We currently only have 8 vectors for TLB invalidation and that is clearly inadequate. If we have a lot of CPUs, the CPUs need to share the 8 vectors and tlbstate_lock is used to protect them. flush_tlb_page() is heavily used in page reclaim, which will cause a lot of lock contention for tlbstate_lock. Andi Kleen suggested increasing the vector count to 32, which should be good for current typical systems to reduce the tlbstate_lock contention. My test system has 4 sockets, 64G memory, and 64 CPUs. My workload creates 64 processes. Each process mmaps and reads a big empty sparse file. The total size of the files is 2*total_mem, so this will cause a lot of page reclaim. Below is the result I get from perf call-graph profiling:
without the patch:
------------------
24.25%  usemem  [kernel]  [k] _raw_spin_lock
        |
        --- _raw_spin_lock
            |
            |--42.15%-- native_flush_tlb_others
with the patch:
------------------
14.96%  usemem  [kernel]  [k] _raw_spin_lock
        |
        --- _raw_spin_lock
            |--13.89%-- native_flush_tlb_others
So this heavily reduces the tlbstate_lock contention. Suggested-by: Andi Kleen <[email protected]> Signed-off-by: Shaohua Li <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> LKML-Reference: <1295232727.1949.709.camel@sli10-conroe> Signed-off-by: Ingo Molnar <[email protected]>
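The scaling rule, sketched as it might appear in <asm/irq_vectors.h> (illustrative, not the verbatim patch):

#if NR_CPUS <= 32
# define NUM_INVALIDATE_TLB_VECTORS	NR_CPUS	/* linear in NR_CPUS */
#else
# define NUM_INVALIDATE_TLB_VECTORS	32	/* capped at 32 vectors */
#endif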
2011-02-14 | x86: Allocate 32 tlb_invalidate_interrupt handler stubs | Shaohua Li | 2 | -1/+28
Add up to 32 invalidate_interrupt handlers. How many handlers are added depends on NUM_INVALIDATE_TLB_VECTORS. So if NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code size. Signed-off-by: Shaohua Li <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Eric Dumazet <[email protected]> LKML-Reference: <1295232725.1949.708.camel@sli10-conroe> Signed-off-by: Ingo Molnar <[email protected]>
2011-02-14 | x86: Cleanup vector usage | Shaohua Li | 1 | -19/+21
Clean up the vector usage and make them contiguous if possible. Signed-off-by: Shaohua Li <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Eric Dumazet <[email protected]> LKML-Reference: <1295232722.1949.707.camel@sli10-conroe> Signed-off-by: Ingo Molnar <[email protected]>
2011-02-14 | x86: Fix mwait_usable section mismatch | Borislav Petkov | 1 | -1/+1
We now use it in non-__cpuinit code too, so drop the marker. Signed-off-by: Borislav Petkov <[email protected]> LKML-Reference: <20110211171754.GA21047@aftab> Signed-off-by: Ingo Molnar <[email protected]>
2011-02-14 | Merge branch 'linus' into x86/bootmem | Ingo Molnar | 68 | -308/+961
Conflicts: arch/x86/mm/numa_64.c
Merge reason: fix the conflict, update to latest -rc and pick up this dependent fix from Yinghai: e6d2e2b2b1e1: memblock: don't adjust size in memblock_find_base()
Signed-off-by: Ingo Molnar <[email protected]>
2011-02-10 | x86: Adjust section placement in AMD northbridge related code | Jan Beulich | 1 | -1/+1
amd_nb_misc_ids[] can live in .rodata, and enable_pci_io_ecs() can be moved into .cpuinit.text. Signed-off-by: Jan Beulich <[email protected]> Cc: Hans Rosenfeld <[email protected]> Cc: Andreas Herrmann <[email protected]> Cc: Borislav Petkov <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-02-10 | x86: Fix section mismatch in LAPIC initialization | Jan Beulich | 1 | -0/+1
Additionally, doing things conditionally upon smp_processor_id() being zero is generally a bad idea, as this means CPU 0 cannot be offlined and brought back online later again. While there may be other places where this is done, I think adding more of those should be avoided so that some day SMP can really become "symmetrical". Signed-off-by: Jan Beulich <[email protected]> Cc: Cyrill Gorcunov <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-02-07 | x86, amd: Support L3 Cache Partitioning on AMD family 0x15 CPUs | Hans Rosenfeld | 1 | -0/+3
L3 Cache Partitioning allows selecting which of the 4 L3 subcaches can be used for evictions by the L2 cache of each compute unit. By writing a 4-bit hexadecimal mask into the sysfs file /sys/devices/system/cpu/cpuX/cache/index3/subcaches, the user can set the enabled subcaches for a CPU. The settings are directly read from and written to the hardware, so there is no way to have contradicting settings for two CPUs belonging to the same compute unit. Writing will always overwrite any previous setting for a compute unit. Signed-off-by: Hans Rosenfeld <[email protected]> Cc: <[email protected]> LKML-Reference: <[email protected]> [ -v3: minor style fixes ] Signed-off-by: Ingo Molnar <[email protected]>
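An illustrative userspace use of the new file (path from the text above; cpu0 picked arbitrarily):

#include <stdio.h>

int main(void)
{
	/* Enable only subcaches 0 and 1 (mask 0x3) for cpu0's compute unit. */
	FILE *f = fopen("/sys/devices/system/cpu/cpu0/cache/index3/subcaches", "w");

	if (!f)
		return 1;
	fprintf(f, "3\n");
	return fclose(f) != 0;
}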
2011-02-06 | Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip | Linus Torvalds | 1 | -4/+1
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-32: Make sure the stack is set up before we use it
  x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
  x86, nx: Don't force pages RW when setting NX bits
2011-02-04 | x86-32: Make sure the stack is set up before we use it | H. Peter Anvin | 1 | -4/+1
Since checkin ebba638ae723d8a8fc2f7abce5ec18b688b791d7 we call verify_cpu even in 32-bit mode. Unfortunately, calling a function means using the stack, and the stack pointer was not initialized in the 32-bit setup code! This code initializes the stack pointer, and simplifies the interface slightly since it is easier to rely on just a pointer value rather than a descriptor; we need to have different values for the segment register anyway. This retains start_stack as a virtual address, even though a physical address would be more convenient for 32 bits; the 64-bit code wants the other way around... Reported-by: Matthieu Castet <[email protected]> LKML-Reference: <[email protected]> Tested-by: Kees Cook <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2011-02-03 | x86, mm: avoid possible bogus tlb entries by clearing prev mm_cpumask after switching mm | Suresh Siddha | 1 | -2/+3
Clearing the cpu in prev's mm_cpumask early will avoid the flush tlb IPI's while the cr3 is still pointing to the prev mm. And this window can lead to the possibility of bogus TLB fills resulting in strange failures. One such problematic scenario is mentioned below.
T1. CPU-1 is context switching from mm1 to mm2 context and got a NMI etc between the point of clearing the cpu from the mm_cpumask(mm1) and before reloading the cr3 with the new mm2.
T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with flushing the TLB for mm1. It doesn't send the flush TLB to CPU-1 as it doesn't see that cpu listed in the mm_cpumask(mm1).
T3. After the TLB flush is complete, CPU-2 goes ahead and frees the page-table pages associated with the removed vma mapping.
T4. CPU-2 now allocates those freed page-table pages for something else.
T5. As the CR3 and TLB caches for mm1 are still active on CPU-1, CPU-1 can potentially speculate and walk through the page-table caches and can insert new TLB entries. As the page-table pages are already freed and being used on CPU-2, this page walk can potentially insert a bogus global TLB entry depending on the (random) contents of the page that is being used on CPU-2.
T6. This bogus TLB entry being global will be active across future CR3 changes and can result in weird memory corruption etc.
To avoid this issue, for the prev mm that is handing over the cpu to another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is changed. Marking it for -stable, though we haven't seen any reported failure that can be attributed to this. Signed-off-by: Suresh Siddha <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: [email protected] [v2.6.32+] Signed-off-by: Linus Torvalds <[email protected]>
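A simplified sketch of the corrected ordering in switch_mm() (function name and body condensed; the real code does more):

static inline void switch_mm_sketch(struct mm_struct *prev,
				    struct mm_struct *next, unsigned int cpu)
{
	cpumask_set_cpu(cpu, mm_cpumask(next));
	load_cr3(next->pgd);			  /* switch page tables first */
	cpumask_clear_cpu(cpu, mm_cpumask(prev)); /* then stop flush IPIs for prev */
}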
2011-02-02 | x86: Add clock_adjtime for x86 | Richard Cochran | 2 | -1/+4
This patch adds the clock_adjtime system call to the x86 architecture. Signed-off-by: Richard Cochran <[email protected]> Acked-by: John Stultz <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
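An illustrative userspace invocation through syscall(2), assuming headers new enough to define SYS_clock_adjtime (glibc gained a wrapper later):

#include <stdio.h>
#include <sys/syscall.h>
#include <sys/timex.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
	struct timex tx = { .modes = 0 };	/* modes == 0: read-only query */

	if (syscall(SYS_clock_adjtime, CLOCK_REALTIME, &tx) < 0)
		return 1;
	printf("freq: %ld\n", tx.freq);
	return 0;
}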
2011-01-31 | x86: Rename incorrectly named parameter of numa_cpu_node() | Tejun Heo | 1 | -1/+1
The numa_cpu_node() prototype in numa_32.h has a wrongly named parameter @apicid when it actually takes the CPU number. Change it to @cpu. Reported-by: Yinghai Lu <[email protected]> Signed-off-by: Tejun Heo <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2011-01-31 | Merge branch 'tip/rtmutex' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into core/locking | Thomas Gleixner | 4 | -45/+5
* git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace tip/rtmutex:
  rtmutex: Simplify PI algorithm and make highest prio task get lock
2011-01-28 | x86: Fix build failure on X86_UP_APIC | Tejun Heo | 1 | -1/+1
Commit 4c321ff8 (x86: Replace cpu_2_logical_apicid[] with early percpu variable) and the following changes introduced and used the x86_cpu_to_logical_apicid percpu variable. It was declared and defined inside CONFIG_SMP && CONFIG_X86_32, but if CONFIG_X86_UP_APIC is set the UP configuration makes use of it and the build fails. Fix it by declaring and defining it inside CONFIG_X86_LOCAL_APIC && CONFIG_X86_32. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Ingo Molnar <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
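A sketch of the resulting guard (declaration form per DECLARE_EARLY_PER_CPU usage in this series):

/* Visible to UP_APIC builds too, not just CONFIG_SMP ones: */
#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86_32)
DECLARE_EARLY_PER_CPU(int, x86_cpu_to_logical_apicid);
#endif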
2011-01-28 | x86: Unify NUMA initialization between 32 and 64bit | Tejun Heo | 2 | -3/+4
Now that everything else is unified, NUMA initialization can be unified too.
* numa_init_array() and init_cpu_to_node() are moved from numa_64 to numa.
* numa_32::initmem_init() is updated to call numa_init_array() and setup_arch() to call init_cpu_to_node() on 32bit too.
* x86_cpu_to_node_map is now initialized to NUMA_NO_NODE on 32bit too. This is safe now as numa_init_array() will initialize it early during boot.
This makes the NUMA mapping fully initialized before setup_per_cpu_areas() on 32bit too, and thus the first percpu chunk - which contains all the static variables and some of the dynamic area - is allocated with NUMA affinity correctly considered. Signed-off-by: Tejun Heo <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reported-by: Eric Dumazet <[email protected]> Reviewed-by: Pekka Enberg <[email protected]>
2011-01-28 | x86: Unify node_to_cpumask_map handling between 32 and 64bit | Tejun Heo | 3 | -5/+9
x86_32 has been managing node_to_cpumask_map explicitly from map_cpu_to_node() and friends in a rather ugly way. With previous changes, it's now possible to share the code with 64bit.
* When CONFIG_NUMA_EMU is disabled, numa_add/remove_cpu() are implemented in numa.c and shared by 32 and 64bit. CONFIG_NUMA_EMU versions still live in numa_64.c. NUMA_EMU's dependency on 64bit is planned to be removed and the above should go away together.
* identify_cpu() now calls numa_add_cpu() for 32bit too. This makes the explicit mask management from map_cpu_to_node() unnecessary.
* The whole x86_32 specific map_cpu_to_node() chunk is no longer necessary. Dropped.
Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: David Rientjes <[email protected]> Cc: Shaohui Zheng <[email protected]>
2011-01-28 | x86: Unify CPU -> NUMA node mapping between 32 and 64bit | Tejun Heo | 3 | -21/+8
Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for CPU -> NUMA node mapping. Replace it with early_percpu variable x86_cpu_to_node_map and share the mapping code with 64bit.
* USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.
* x86_cpu_to_node_map and numa_set/clear_node() are moved from numa_64 to numa. For now, on 32bit, x86_cpu_to_node_map is initialized with 0 instead of NUMA_NO_NODE. This is to avoid introducing unexpected behavior change and will be updated once init path is unified.
* srat_detect_node() is now enabled for x86_32 too. It calls numa_set_node() and initializes the mapping making explicit cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.
Signed-off-by: Tejun Heo <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: David Rientjes <[email protected]>
2011-01-28 | x86: Unify cpu/apicid <-> NUMA node mapping between 32 and 64bit | Tejun Heo | 4 | -4/+36
The mapping between cpu/apicid and node is done via apicid_to_node[] on 64bit and apicid_2_node[] + apic->x86_32_numa_cpu_node() on 32bit. This difference makes it difficult to further unify 32 and 64bit NUMA handling. This patch unifies it by replacing both apicid_to_node[] and apicid_2_node[] with __apicid_to_node[] array, which is accessed by two accessors - set_apicid_to_node() and numa_cpu_node(). On 64bit, numa_cpu_node() always consults __apicid_to_node[] directly while 32bit goes through apic->numa_cpu_node() method to allow apic implementations to override it. srat_detect_node() for amd cpus contains workaround for broken NUMA configuration which assumes relationship between APIC ID, HT node ID and NUMA topology. Leave it to access __apicid_to_node[] directly as mapping through CPU might result in undesirable behavior change. The comment is reformatted and updated to note the ugliness. Signed-off-by: Tejun Heo <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: David Rientjes <[email protected]>
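A condensed sketch of the unified accessors (simplified; the real 64-bit numa_cpu_node() lives in the NUMA code and the 32-bit one indirects through the apic driver as described):

extern s16 __apicid_to_node[MAX_LOCAL_APIC];

static inline void set_apicid_to_node(int apicid, s16 node)
{
	__apicid_to_node[apicid] = node;
}

#ifdef CONFIG_X86_64
static inline int numa_cpu_node(int cpu)
{
	int apicid = early_per_cpu(x86_cpu_to_apicid, cpu);

	return apicid != BAD_APICID ? __apicid_to_node[apicid] : NUMA_NO_NODE;
}
#else
static inline int numa_cpu_node(int cpu)
{
	return apic->x86_32_numa_cpu_node(cpu);	/* overridable per apic driver */
}
#endif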