With no caller left, the function and the DIE_NMIWATCHDOG
enumerator can both go away.
Signed-off-by: Jan Beulich <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Don Zickus <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Move the real-mode reboot code out to an assembly file (reboot_32.S)
which is allocated using the common lowmem trampoline allocator.
Signed-off-by: H. Peter Anvin <[email protected]>
LKML-Reference: <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Matthieu Castet <[email protected]>
|
|
Make the GDT_ENTRY() macro in <asm/segment.h> safe for use in
assembly code by guarding the ULL suffixes with _AC() macros.
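For illustration, a minimal sketch of the _AC() pattern (modeled on <linux/const.h> and <asm/segment.h>; treat the exact masks here as illustrative rather than a quote of the patch):
#ifdef __ASSEMBLY__
# define _AC(X, Y)      X               /* assembler: C suffixes are not valid */
#else
# define __AC(X, Y)     (X##Y)          /* C: paste the suffix back on */
# define _AC(X, Y)      __AC(X, Y)
#endif

/* A GDT_ENTRY()-style macro can now be shared between .c and .S files: */
#define GDT_ENTRY(flags, base, limit)                           \
        ((((base)  & _AC(0xff000000, ULL)) << (56-24)) |        \
         (((flags) & _AC(0x0000f0ff, ULL)) << 40) |             \
         (((limit) & _AC(0x000f0000, ULL)) << (48-16)) |        \
         (((base)  & _AC(0x00ffffff, ULL)) << 16) |             \
         (((limit) & _AC(0x0000ffff, ULL))))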
Signed-off-by: H. Peter Anvin <[email protected]>
LKML-Reference: <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Matthieu Castet <[email protected]>
Cc: Stephen Rothwell <[email protected]>
|
|
Use the unified trampoline allocation setup to allocate and install
the ACPI wakeup code in low memory.
Signed-off-by: H. Peter Anvin <[email protected]>
LKML-Reference: <[email protected]>
Cc: Rafael J. Wysocki <[email protected]>
Cc: Matthieu Castet <[email protected]>
Cc: Stephen Rothwell <[email protected]>
|
|
Just as we had to disable auto-demotion for NHM/WSM,
we need to do the same for Atom (Lincroft version).
In particular, auto-demotion will prevent Lincroft
from entering the S0i3 idle power saving state.
https://bugzilla.kernel.org/show_bug.cgi?id=25252
Signed-off-by: Len Brown <[email protected]>
|
|
Hardware C-state auto-demotion is a mechanism where the HW overrides
the OS C-state request, instead demoting to a shallower state,
which is less expensive, but saves less power.
Modern Linux should generally get exactly the states it requests.
In particular, when a CPU is taken off-line, it must not be demoted, else
it can prevent the entire package from reaching deep C-states.
https://bugzilla.kernel.org/show_bug.cgi?id=25252
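As an illustration, a cpuidle driver might clear the auto-demotion control bits on each online CPU roughly like this (MSR name as in msr-index.h; the per-model flag values are left symbolic and are an assumption here):
static unsigned long long auto_demotion_disable_flags;  /* set per CPU model */

static void auto_demotion_disable(void *dummy)
{
        unsigned long long msr_bits;

        rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
        msr_bits &= ~auto_demotion_disable_flags;       /* turn auto-demotion off */
        wrmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
}

/* e.g. on_each_cpu(auto_demotion_disable, NULL, 1); */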
Signed-off-by: Len Brown <[email protected]>
|
|
NUMA emulation needs to update node distance information. It did so
by remapping the apicid to PXM mapping, even when amdtopology is
being used. There is no reason to go through such convolution. The
generic code has all the information necessary to transform the
distance table to the emulated nid space.
Implement generic distance table transformation in numa_emulation()
and drop the private implementations in srat_64 and amdtopology_64.
This makes find_node_by_addr(), fake_physnodes() and related
functions unnecessary; drop them.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
Node distance was determined either by direct node comparison, ACPI
PXM comparison or ACPI SLIT table lookup. This patch implements generic node
distance handling. NUMA init methods can call numa_set_distance() to
set distance between nodes and the common __node_distance()
implementation will report the set distance.
Due to the way NUMA emulation is implemented, the generic node
distance handling is used only when emulation is not used. Later
patches will update NUMA emulation to use the generic distance
mechanism.
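A minimal sketch of the set/report pair described above (table sizing and the LOCAL/REMOTE fallbacks are illustrative assumptions, not the exact in-tree code):
static int numa_distance[MAX_NUMNODES][MAX_NUMNODES];

void __init numa_set_distance(int from, int to, int distance)
{
        if (from >= MAX_NUMNODES || to >= MAX_NUMNODES ||
            distance < LOCAL_DISTANCE)
                return;                         /* ignore bogus input */
        numa_distance[from][to] = distance;
}

int __node_distance(int from, int to)
{
        if (!numa_distance[from][to])           /* nothing recorded: fall back */
                return from == to ? LOCAL_DISTANCE : REMOTE_DISTANCE;
        return numa_distance[from][to];
}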
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
With all memory configuration information now carried in numa_meminfo,
there's no need to keep mem_nodes_parsed separate. Drop it and use
numa_nodes_parsed for CPU / memory-less nodes.
A new helper numa_nodemask_from_meminfo() is added to calculate
memnode mask on the fly which is currently used to set
node_possible_map.
This simplifies NUMA init methods a bit and removes a source of
possible inconsistencies.
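A sketch of what such a helper amounts to (field names follow the struct numa_meminfo used in this series; details are assumptions):
static void __init numa_nodemask_from_meminfo(nodemask_t *nodemask,
                                              const struct numa_meminfo *mi)
{
        int i;

        /* set the owning node of every recorded, non-empty memblk */
        for (i = 0; i < mi->nr_blks; i++)
                if (mi->blk[i].start != mi->blk[i].end)
                        node_set(mi->blk[i].nid, *nodemask);
}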
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
It's no longer necessary to keep both cpu_nodes_parsed and
mem_nodes_parsed. In preparation for merge, rename cpu_nodes_parsed
to numa_nodes_parsed.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
numa_nodes[] doesn't carry any information which isn't present in
numa_meminfo. Each entry is simply min/max range of all the memblks
for the node. This is not only redundant but also inaccurate when
memblks for different nodes interleave - for example,
find_node_by_addr() can return the wrong nodeid.
Kill numa_nodes[] and always use numa_meminfo instead.
* nodes_cover_memory() is renamed to numa_meminfo_cover_memory() and
now operates on numa_meminfo and returns bool.
* setup_node_bootmem() needs min/max range. Compute the range on the
fly. setup_node_bootmem() invocation is restructured to use outer
loop instead of hardcoding the double invocations.
* find_node_by_addr() now operates on numa_meminfo.
* setup_physnodes() builds physnodes[] from memblks. This will go
away when emulation code is updated to use struct numa_meminfo.
This patch also makes the following misc changes.
* Clearing of nodes_add[] is converted to memset().
* numa_add_memblk() in amd_numa_init() is moved down a bit for
consistency.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
srat_64.c and amdtopology_64.c had their own versions of
find_node_by_addr() which were basically the same. Add common one in
numa_64.c and remove the duplicates.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
They are empty now. Kill them.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
Make both amd and dummy use numa_add_memblk() to describe the detected
memory blocks. This allows initmem_init() to call
numa_register_memblk() regardless of init method in use. Drop the
custom memory registration code from amd and dummy.
After this change, memblk merge/cleanup in numa_register_memblks() is
applied to all init methods.
As this makes compute_hash_shift() and numa_register_memblks() used
only inside numa_64.c, make them static.
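For example, the dummy init method reduces to a single numa_add_memblk() call covering all of memory (a sketch; the in-tree dummy_numa_init() may differ in details):
static int __init dummy_numa_init(void)
{
        printk(KERN_INFO "Faking a node at %016lx-%016lx\n",
               0LU, max_pfn << PAGE_SHIFT);

        node_set(0, numa_nodes_parsed);
        numa_add_memblk(0, 0, (u64)max_pfn << PAGE_SHIFT);

        return 0;
}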
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
Factor out memblk handling from srat_64.c into two functions in
numa_64.c. This patch doesn't introduce any behavior change. The
next patch will make all init methods use these functions.
- v2: Fixed build failure on 32bit due to misplaced NR_NODE_MEMBLKS.
Reported by Ingo.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
Merge reason: pick up upstream fixes.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
This patch adds support for AMD family 15h core counters. There are
major changes compared to family 10h. First, there is a new perfctr
msr range for up to 6 counters. Northbridge counters are separate
now. This patch only adds support for core counters. Second, certain
events may only be scheduled on certain counters. For this we need to
extend the event scheduling and constraints.
We use cpu feature flags to calculate family 15h msr address offsets.
This way we can later implement a faster ALTERNATIVE() version for
this.
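A sketch of the feature-flag based offset calculation (family 15h interleaves event-select and counter MSRs, so the per-index stride doubles; treat the helper name as illustrative):
static inline unsigned int amd_pmu_addr_offset(int index)
{
        if (boot_cpu_has(X86_FEATURE_PERFCTR_CORE))
                return index << 1;      /* new range: (CTL, CTR) MSR pairs */
        return index;                   /* family 10h: consecutive MSRs */
}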
Signed-off-by: Robert Richter <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Several people have reported spurious unknown NMI
messages on some P4 CPUs.
This patch fixes it by checking for an overflow (negative
counter values) directly, instead of relying on the
P4_CCCR_OVF bit.
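The idea, sketched: counters are armed with a negative value, so once the raw count is no longer negative within the 40-bit counter width it must have crossed zero and overflowed. Helper name and bit number here are illustrative.
#define ARCH_P4_CNTRVAL_SIGN_BIT        (1ULL << 39)    /* 40-bit counters */

static inline bool p4_counter_overflowed(u64 raw_count)
{
        /* sign bit cleared -> counter wrapped past zero -> overflow */
        return !(raw_count & ARCH_P4_CNTRVAL_SIGN_BIT);
}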
Reported-by: George Spelvin <[email protected]>
Reported-by: Meelis Roos <[email protected]>
Reported-by: Don Zickus <[email protected]>
Reported-by: Dave Airlie <[email protected]>
Signed-off-by: Cyrill Gorcunov <[email protected]>
Cc: Lin Ming <[email protected]>
Cc: Don Zickus <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
With common numa_nodes[], common code in numa_64.c can access it
directly. Copy directly and kill {acpi|amd}_get_nodes().
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
ACPI and amd are using separate nodes[] arrays. Add numa_nodes[] and
use it in all NUMA init methods. cutoff_node() cleanup is moved
from srat_64.c to numa_64.c and applied in initmem_init() regardless
of init methods.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
ACPI and amd are using separate nodes_parsed masks. Add
{cpu|mem}_nodes_parsed and use them in all NUMA init methods.
Initialization of the masks and building node_possible_map are now
handled commonly by initmem_init().
dummy_numa_init() is updated to set node 0 on both masks. While at
it, move the info messages from scan to init.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
There's no reason for these to live in setup_arch(). Move them inside
initmem_init().
- v2: x86-32 initmem_init() wasn't updated, breaking 32bit builds.
Fixed. Found by Ankita.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Ankita Garg <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
Because of the way ACPI tables are parsed, the generic
acpi_numa_init() couldn't return failure when an error was detected
by arch hooks. Instead, the failure state was recorded and the later
arch-dependent init hook - acpi_scan_nodes() - would fail.
Wrap acpi_numa_init() with x86_acpi_numa_init() so that failure can be
indicated as return value immediately. This is in preparation for
further NUMA init cleanups.
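A sketch of the wrapper (close to what the description implies, but still an illustration):
int __init x86_acpi_numa_init(void)
{
        int ret;

        ret = acpi_numa_init();
        if (ret < 0)
                return ret;             /* parsing itself failed */
        return srat_disabled() ? -EINVAL : 0;   /* arch hook flagged failure */
}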
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
The functions used during NUMA initialization - *_numa_init() and
*_scan_nodes() - have different arguments and return values. Unify
them such that they all take no argument and return 0 on success and
-errno on failure. This is in preparation for further NUMA init
cleanups.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
initmem_init() extensively accesses and modifies global data
structures and, depending on which path is taken, the parameters
aren't even honored. Drop @start/last_pfn and let it deal with
@max_pfn directly. This is in preparation for further NUMA init
cleanups.
- v2: x86-32 initmem_init() wasn't updated, breaking 32bit builds.
Fixed. Found by Yinghai.
Signed-off-by: Tejun Heo <[email protected]>
Cc: Yinghai Lu <[email protected]>
Cc: Brian Gerst <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
Cc: Shaohui Zheng <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: H. Peter Anvin <[email protected]>
|
|
Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts
with ongoing NUMA work.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Merge reason: consolidate it into the more generic x86/mm tree to prevent conflicts.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Merge reason: the topic is ready - consolidate it into the more generic x86/mm tree
and prevent conflicts.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Some wall clock devices use MMIO-based HW registers; this new
function gives them a chance to do some initialization work before
their get/set_time services get called, which is usually in the
early kernel boot phase.
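Roughly, this adds a new optional member to the x86 platform ops, called early in setup (member placement and the call site comment are assumptions based on the description above):
struct x86_platform_ops {
        unsigned long (*calibrate_tsc)(void);
        unsigned long (*get_wallclock)(void);
        int (*set_wallclock)(unsigned long nowtime);
        void (*wallclock_init)(void);   /* new: map/init MMIO registers early */
        /* ... */
};

/* call site, e.g. early in setup_arch(): x86_platform.wallclock_init(); */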
Signed-off-by: Feng Tang <[email protected]>
Signed-off-by: Jacob Pan <[email protected]>
Signed-off-by: Alan Cox <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
Merge reason: Merge latest fixes before applying new patch.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Merge reason: pick up the latest fixes.
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Make the maximum number of TLB invalidate vectors depend on NR_CPUS
linearly, with a maximum of 32 vectors.
We currently only have 8 vectors for TLB invalidation and that is
clearly inadequate. If we have a lot of CPUs, the CPUs need to share
the 8 vectors and tlbstate_lock is used to protect them.
flush_tlb_page() is heavily used in page reclaim, which will cause a
lot of lock contention for tlbstate_lock.
Andi Kleen suggested increasing the number of vectors to 32, which
should be good for current typical systems to reduce the
tlbstate_lock contention.
My test system has 4 sockets, 64G memory and 64 CPUs. My workload
creates 64 processes. Each process mmaps and reads a big empty
sparse file. The total size of the files is 2*total_mem, so this
will cause a lot of page reclaim.
Below is the result I get from perf call-graph profiling:
without the patch:
------------------
24.25% usemem [kernel] [k] _raw_spin_lock
|
--- _raw_spin_lock
|
|--42.15%-- native_flush_tlb_others
with the patch:
------------------
14.96% usemem [kernel] [k] _raw_spin_lock
|
--- _raw_spin_lock
|--13.89%-- native_flush_tlb_others
So this heavily reduces the tlbstate_lock contention.
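The sizing rule boils down to something like the following (macro names follow the series; the exact in-tree definitions may differ):
#define MAX_INVALIDATE_TLB_VECTORS      32

#if NR_CPUS <= MAX_INVALIDATE_TLB_VECTORS
# define NUM_INVALIDATE_TLB_VECTORS     NR_CPUS
#else
# define NUM_INVALIDATE_TLB_VECTORS     MAX_INVALIDATE_TLB_VECTORS
#endif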
Suggested-by: Andi Kleen <[email protected]>
Signed-off-by: Shaohua Li <[email protected]>
Cc: Eric Dumazet <[email protected]>
Cc: Arnaldo Carvalho de Melo <[email protected]>
Cc: Frederic Weisbecker <[email protected]>
Cc: Peter Zijlstra <[email protected]>
LKML-Reference: <1295232727.1949.709.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Add up to 32 invalidate_interrupt handlers. How many handlers are
added depends on NUM_INVALIDATE_TLB_VECTORS. So if
NUM_INVALIDATE_TLB_VECTORS is smaller than 32, we reduce code
size.
Signed-off-by: Shaohua Li <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Eric Dumazet <[email protected]>
LKML-Reference: <1295232725.1949.708.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Cleanup the vector usage and make them continuous if possible.
Signed-off-by: Shaohua Li <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Eric Dumazet <[email protected]>
LKML-Reference: <1295232722.1949.707.camel@sli10-conroe>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
We use it in non-__cpuinit code now too, so drop the marker.
Signed-off-by: Borislav Petkov <[email protected]>
LKML-Reference: <20110211171754.GA21047@aftab>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Conflicts:
arch/x86/mm/numa_64.c
Merge reason: fix the conflict, update to latest -rc and pick up this
dependent fix from Yinghai:
e6d2e2b2b1e1: memblock: don't adjust size in memblock_find_base()
Signed-off-by: Ingo Molnar <[email protected]>
|
|
amd_nb_misc_ids[] can live in .rodata, and enable_pci_io_ecs()
can be moved into .cpuinit.text.
Signed-off-by: Jan Beulich <[email protected]>
Cc: Hans Rosenfeld <[email protected]>
Cc: Andreas Herrmann <[email protected]>
Cc: Borislav Petkov <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Additionally, doing things conditionally upon smp_processor_id()
being zero is generally a bad idea, as this means CPU 0 cannot
be offlined and brought back online later again.
While there may be other places where this is done, I think adding
more of those should be avoided so that some day SMP can really
become "symmetrical".
Signed-off-by: Jan Beulich <[email protected]>
Cc: Cyrill Gorcunov <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
L3 Cache Partitioning allows selecting which of the 4 L3 subcaches can be used
for evictions by the L2 cache of each compute unit. By writing a 4-bit
hexadecimal mask into the sysfs file
/sys/devices/system/cpu/cpuX/cache/index3/subcaches, the user can set the
enabled subcaches for a CPU.
The settings are directly read from and written to the hardware, so there is no
way to have contradicting settings for two CPUs belonging to the same compute
unit. Writing will always overwrite any previous setting for a compute unit.
Signed-off-by: Hans Rosenfeld <[email protected]>
Cc: <[email protected]>
LKML-Reference: <[email protected]>
[ -v3: minor style fixes ]
Signed-off-by: Ingo Molnar <[email protected]>
|
|
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86-32: Make sure the stack is set up before we use it
x86, mtrr: Avoid MTRR reprogramming on BP during boot on UP platforms
x86, nx: Don't force pages RW when setting NX bits
|
|
Since checkin ebba638ae723d8a8fc2f7abce5ec18b688b791d7 we call
verify_cpu even in 32-bit mode. Unfortunately, calling a function
means using the stack, and the stack pointer was not initialized in
the 32-bit setup code! This code initializes the stack pointer, and
simplifies the interface slightly since it is easier to rely on just a
pointer value rather than a descriptor; we need to have different
values for the segment register anyway.
This retains start_stack as a virtual address, even though a
physical address would be more convenient for 32 bits; the 64-bit
code wants it the other way around...
Reported-by: Matthieu Castet <[email protected]>
LKML-Reference: <[email protected]>
Tested-by: Kees Cook <[email protected]>
Signed-off-by: H. Peter Anvin <[email protected]>
|
|
Clearing the cpu in prev's mm_cpumask early will avoid flush TLB
IPIs while cr3 is still pointing to the prev mm, and this window can
lead to the possibility of bogus TLB fills resulting in strange
failures. One such problematic scenario is described below.
T1. CPU-1 is context switching from mm1 to mm2 and gets an NMI etc.
between the point of clearing the cpu from mm_cpumask(mm1) and
reloading cr3 with the new mm2.
T2. CPU-2 is tearing down a specific vma for mm1 and will proceed with
flushing the TLB for mm1. It doesn't send the flush TLB to CPU-1
as it doesn't see that cpu listed in the mm_cpumask(mm1).
T3. After the TLB flush is complete, CPU-2 goes ahead and frees the
page-table pages associated with the removed vma mapping.
T4. CPU-2 now allocates those freed page-table pages for something
else.
T5. As the CR3 and TLB caches for mm1 are still active on CPU-1, CPU-1
can potentially speculate and walk through the page-table caches
and can insert new TLB entries. As the page-table pages are
already freed and being used on CPU-2, this page walk can
potentially insert a bogus global TLB entry depending on the
(random) contents of the page that is being used on CPU-2.
T6. This bogus TLB entry being global will be active across future CR3
changes and can result in weird memory corruption etc.
To avoid this issue, for the prev mm that is handing over the cpu to
another mm, clear the cpu from the mm_cpumask(prev) after the cr3 is
changed.
Marking it for -stable, though we haven't seen any reported failure that
can be attributed to this.
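In code terms, the fix amounts to the following ordering in switch_mm() (a simplified sketch; the real function also handles the lazy-TLB and LDT cases):
static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
                             struct task_struct *tsk)
{
        unsigned int cpu = smp_processor_id();

        if (likely(prev != next)) {
                cpumask_set_cpu(cpu, mm_cpumask(next));
                load_cr3(next->pgd);    /* switch to the new page tables */
                /*
                 * Only now is it safe to stop receiving prev's flush IPIs:
                 * cr3 no longer references prev's page tables, so no
                 * speculative walk can install stale/bogus TLB entries.
                 */
                cpumask_clear_cpu(cpu, mm_cpumask(prev));
        }
}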
Signed-off-by: Suresh Siddha <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Cc: [email protected] [v2.6.32+]
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This patch adds the clock_adjtime system call to the x86 architecture.
Signed-off-by: Richard Cochran <[email protected]>
Acked-by: John Stultz <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Thomas Gleixner <[email protected]>
|
|
numa_cpu_node() prototype in numa_32.h has wrongly named
parameter @apicid when it actually takes the CPU number.
Change it to @cpu.
Reported-by: Yinghai Lu <[email protected]>
Signed-off-by: Tejun Heo <[email protected]>
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into core/locking
* git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace tip/rtmutex:
rtmutex: Simplify PI algorithm and make highest prio task get lock
|
|
Commit 4c321ff8 (x86: Replace cpu_2_logical_apicid[] with early
percpu variable) and following changes introduced and used the
x86_cpu_to_logical_apicid percpu variable. It was declared and
defined inside CONFIG_SMP && CONFIG_X86_32, but if
CONFIG_X86_UP_APIC is set, the UP configuration makes use of it and
the build fails.
Fix it by declaring and defining it inside CONFIG_X86_LOCAL_APIC
&& CONFIG_X86_32.
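I.e., the guard becomes (sketched; the DECLARE side shown, the DEFINE side changes the same way):
#if defined(CONFIG_X86_LOCAL_APIC) && defined(CONFIG_X86_32)
DECLARE_EARLY_PER_CPU(int, x86_cpu_to_logical_apicid);
#endif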
Signed-off-by: Tejun Heo <[email protected]>
Reported-by: Ingo Molnar <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
|
|
Now that everything else is unified, NUMA initialization can be
unified too.
* numa_init_array() and init_cpu_to_node() are moved from
numa_64 to numa.
* numa_32::initmem_init() is updated to call numa_init_array()
and setup_arch() to call init_cpu_to_node() on 32bit too.
* x86_cpu_to_node_map is now initialized to NUMA_NO_NODE on
32bit too. This is safe now as numa_init_array() will initialize
it early during boot.
This makes NUMA mapping fully initialized before
setup_per_cpu_areas() on 32bit too, and thus ensures that the first
percpu chunk, which contains all the static variables and some of
the dynamic area, is allocated with NUMA affinity correctly taken
into account.
Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Reported-by: Eric Dumazet <[email protected]>
Reviewed-by: Pekka Enberg <[email protected]>
|
|
x86_32 has been managing node_to_cpumask_map explicitly from
map_cpu_to_node() and friends in a rather ugly way. With
previous changes, it's now possible to share the code with
64bit.
* When CONFIG_NUMA_EMU is disabled, numa_add/remove_cpu() are
implemented in numa.c and shared by 32 and 64bit. CONFIG_NUMA_EMU
versions still live in numa_64.c.
NUMA_EMU's dependency on 64bit is planned to be removed and the
above should go away together.
* identify_cpu() now calls numa_add_cpu() for 32bit too. This
makes the explicit mask management from map_cpu_to_node() unnecessary.
* The whole x86_32 specific map_cpu_to_node() chunk is no longer
necessary. Dropped.
Signed-off-by: Tejun Heo <[email protected]>
Reviewed-by: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Shaohui Zheng <[email protected]>
|
|
Unlike 64bit, 32bit has been using its own cpu_to_node_map[] for
CPU -> NUMA node mapping. Replace it with early_percpu variable
x86_cpu_to_node_map and share the mapping code with 64bit.
* USE_PERCPU_NUMA_NODE_ID is now enabled for 32bit too.
* x86_cpu_to_node_map and numa_set/clear_node() are moved from
numa_64 to numa. For now, on 32bit, x86_cpu_to_node_map is initialized
with 0 instead of NUMA_NO_NODE. This is to avoid introducing unexpected
behavior change and will be updated once init path is unified.
* srat_detect_node() is now enabled for x86_32 too. It calls
numa_set_node() and initializes the mapping making explicit
cpu_to_node_map[] updates from map/unmap_cpu_to_node() unnecessary.
Signed-off-by: Tejun Heo <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: David Rientjes <[email protected]>
|
|
The mapping between cpu/apicid and node is done via
apicid_to_node[] on 64bit and apicid_2_node[] +
apic->x86_32_numa_cpu_node() on 32bit. This difference makes it
difficult to further unify 32 and 64bit NUMA handling.
This patch unifies it by replacing both apicid_to_node[] and
apicid_2_node[] with __apicid_to_node[] array, which is accessed
by two accessors - set_apicid_to_node() and numa_cpu_node(). On
64bit, numa_cpu_node() always consults __apicid_to_node[]
directly while 32bit goes through apic->numa_cpu_node() method
to allow apic implementations to override it.
srat_detect_node() for amd cpus contains a workaround for broken
NUMA configurations which assumes a relationship between APIC ID,
HT node ID and NUMA topology. Leave it to access
__apicid_to_node[] directly, as mapping through the CPU might result
in undesirable behavior change. The comment is reformatted and
updated to note the ugliness.
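A sketch of the resulting accessors (simplified; the array type and the 32-bit indirection follow the description above, but details are illustrative):
extern s16 __apicid_to_node[MAX_LOCAL_APIC];

static inline void set_apicid_to_node(int apicid, s16 node)
{
        __apicid_to_node[apicid] = node;
}

#ifdef CONFIG_X86_64
static inline int numa_cpu_node(int cpu)
{
        int apicid = early_per_cpu(x86_cpu_to_apicid, cpu);

        return apicid != BAD_APICID ? __apicid_to_node[apicid] : NUMA_NO_NODE;
}
#else
static inline int numa_cpu_node(int cpu)
{
        return apic->x86_32_numa_cpu_node(cpu); /* apic driver may override */
}
#endif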
Signed-off-by: Tejun Heo <[email protected]>
Reviewed-by: Pekka Enberg <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
LKML-Reference: <[email protected]>
Signed-off-by: Ingo Molnar <[email protected]>
Cc: David Rientjes <[email protected]>
|