aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2010-04-19perf: Enhance perf to allow for guest statistic collection from hostZhang, Yanmin2-11/+35
Below patch introduces perf_guest_info_callbacks and related register/unregister functions. Add more PERF_RECORD_MISC_XXX bits meaning guest kernel and guest user space. Signed-off-by: Zhang Yanmin <[email protected]> Signed-off-by: Avi Kivity <[email protected]>
2010-04-17x86, UV: uv_irq.c: Fix all sparse warningsRandy Dunlap2-7/+7
Fix all sparse warnings in building uv_irq.c. arch/x86/kernel/uv_irq.c:46:17: warning: symbol 'uv_irq_chip' was not declared. Should it be static? arch/x86/kernel/uv_irq.c:143:50: error: no identifier for function argument arch/x86/kernel/uv_irq.c:162:13: error: typename in expression arch/x86/kernel/uv_irq.c:162:13: error: undefined identifier 'restrict' arch/x86/kernel/uv_irq.c:250:44: error: no identifier for function argument arch/x86/kernel/uv_irq.c:260:17: error: typename in expression arch/x86/kernel/uv_irq.c:260:17: error: undefined identifier 'restrict' arch/x86/kernel/uv_irq.c:233:50: warning: incorrect type in argument 3 (different signedness) arch/x86/kernel/uv_irq.c:233:50: expected int *pnode arch/x86/kernel/uv_irq.c:233:50: got unsigned int *<noident> arch/x86/include/asm/uv/uv_hub.h:318:44: warning: incorrect type in argument 2 (different address spaces) arch/x86/include/asm/uv/uv_hub.h:318:44: expected void volatile [noderef] <asn:2>*addr arch/x86/include/asm/uv/uv_hub.h:318:44: got unsigned long * Signed-off-by: Randy Dunlap <[email protected]> Cc: Dimitri Sivanich <[email protected]> Cc: Russ Anderson <[email protected]> Cc: Robin Holt <[email protected]> Cc: Mike Travis <[email protected]> Cc: Cliff Wickman <[email protected]> Cc: Jack Steiner <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-15Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds6-28/+67
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86/gart: Disable GART explicitly before initialization dma-debug: Cleanup for copy-loop in filter_write() x86/amd-iommu: Remove obsolete parameter documentation x86/amd-iommu: use for_each_pci_dev Revert "x86: disable IOMMUs on kernel crash" x86/amd-iommu: warn when issuing command to uninitialized cmd buffer x86/amd-iommu: enable iommu before attaching devices x86/amd-iommu: Use helper function to destroy domain x86/amd-iommu: Report errors in acpi parsing functions upstream x86/amd-iommu: Pt mode fix for domain_destroy x86/amd-iommu: Protect IOMMU-API map/unmap path x86/amd-iommu: Remove double NULL check in check_device
2010-04-14x86, UV: Improve BAU performance and error recoveryCliff Wickman2-442/+1075
- increase performance of the interrupt handler - release timed-out software acknowledge resources - recover from continuous-busy status due to a hardware issue - add a 'throttle' to keep a uvhub from sending more than a specified number of broadcasts concurrently (work around the hardware issue) - provide a 'nobau' boot command line option - rename 'pnode' and 'node' to 'uvhub' (the 'node' terminology is ambiguous) - add some new statistics about the scope of broadcasts, retries, the hardware issue and the 'throttle' - split off new function uv_bau_retry_msg() from uv_bau_process_message() per community coding style feedback. - simplify the argument list to uv_bau_process_message(), per community coding style feedback. Signed-off-by: Cliff Wickman <[email protected]> Cc: [email protected] Cc: Jack Steiner <[email protected]> Cc: Russ Anderson <[email protected]> Cc: Mike Travis <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-14lguest: stop using KVM hypercall mechanismRusty Russell3-38/+54
This is a partial revert of 4cd8b5e2a159 "lguest: use KVM hypercalls"; we revert to using (just as questionable but more reliable) int $15 for hypercalls. I didn't revert the register mapping, so we still use the same calling convention as kvm. KVM in more recent incarnations stopped injecting a fault when a guest tried to use the VMCALL instruction from ring 1, so lguest under kvm fails to make hypercalls. It was nice to share code with our KVM cousins, but this was overreach. Signed-off-by: Rusty Russell <[email protected]> Cc: Matias Zabaljauregui <[email protected]> Cc: Avi Kivity <[email protected]>
2010-04-14x86/microcode: Use nonseekable_open()Arnd Bergmann1-2/+2
No need to seek on this file, so prevent it outright so we can avoid using default_llseek - removes one more BKL usage. Signed-off-by: Arnd Bergmann <[email protected]> [drop useless llseek = no_llseek and smp_lock.h inclusion] Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Dmitry Adamushko <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-13Merge branch 'iommu/fixes' of ↵Ingo Molnar6-28/+67
git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent
2010-04-09powernow-k8: Fix frequency reportingMark Langsdorf1-1/+2
With F10, model 10, all valid frequencies are in the ACPI _PST table. Cc: <[email protected]> # 33.x 32.x Signed-off-by: Mark Langsdorf <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Thomas Renninger <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-04-09x86, cpufreq: Add APERF/MPERF support for AMD processorsMark Langsdorf5-44/+72
Starting with model 10 of Family 0x10, AMD processors may have support for APERF/MPERF. Add support for identifying it and using it within cpufreq. Move the APERF/MPERF functions out of the acpi-cpufreq code and into their own file so they can easily be shared. Signed-off-by: Mark Langsdorf <[email protected]> LKML-Reference: <20100401141956.GA1930@aftab> Signed-off-by: Borislav Petkov <[email protected]> Reviewed-by: Thomas Renninger <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-04-09x86: Unify APERF/MPERF supportBorislav Petkov2-6/+8
Initialize this CPUID flag feature in common code. It could be made a standalone function later, maybe, if more functionality is duplicated. Signed-off-by: Borislav Petkov <[email protected]> LKML-Reference: <[email protected]> Reviewed-by: Thomas Renninger <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-04-09powernow-k8: Add core performance boost supportBorislav Petkov2-12/+151
Starting with F10h, revE, AMD processors add support for a dynamic core boosting feature called Core Performance Boost. When a specific condition is present, a subset of the cores on a system are boosted beyond their P0 operating frequency to speed up the performance of single-threaded applications. In the normal case, the system comes out of reset with core boosting enabled. This patch adds a sysfs knob with which core boosting can be switched on or off for benchmarking purposes. While at it, make the CPB code hotplug-aware so that taking cores offline wouldn't interfere with boosting the remaining online cores. Furthermore, add cpu_online_mask hotplug protection as suggested by Andrew. Finally, cleanup the driver init codepath and update copyrights. Signed-off-by: Borislav Petkov <[email protected]> LKML-Reference: <[email protected]> Reviewed-by: Thomas Renninger <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-04-09x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfoBorislav Petkov2-2/+4
By semi-popular demand, this adds the Core Performance Boost feature flag to /proc/cpuinfo. Possible use case for this is userspace tools like cpufreq-aperf, for example, so that they don't have to jump through hoops of accessing "/dev/cpu/%d/cpuid" in order to check for CPB hw support, or call cpuid from userspace. Signed-off-by: Borislav Petkov <[email protected]> LKML-Reference: <[email protected]> Reviewed-by: Thomas Renninger <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-04-08perf: Fix unsafe frame rewinding with hot regs fetchingFrederic Weisbecker1-2/+6
When we fetch the hot regs and rewind to the nth caller, it might happen that we dereference a frame pointer outside the kernel stack boundaries, like in this example: perf_trace_sched_switch+0xd5/0x120 schedule+0x6b5/0x860 retint_careful+0xd/0x21 Since we directly dereference a userspace frame pointer here while rewinding behind retint_careful, this may end up in a crash. Fix this by simply using probe_kernel_address() when we rewind the frame pointer. This issue will have a much more proper fix in the next version of the perf_arch_fetch_caller_regs() API that will only need to rewind to the first caller. Reported-by: Eric Dumazet <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]> Tested-by: Eric Dumazet <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: David Miller <[email protected]> Cc: Archs <[email protected]>
2010-04-08x86/PCI: ignore Consumer/Producer bit in ACPI window descriptionsBjorn Helgaas1-2/+1
ACPI Address Space Descriptors (used in _CRS) have a Consumer/Producer bit that is supposed to distinguish regions that are consumed directly by a device from those that are forwarded ("produced") by a bridge. But BIOSes have apparently not used this consistently, and Windows seems to ignore it, so I think Linux should ignore it as well. I can't point to any of these supposed broken BIOSes, but since we now rely on _CRS by default, I think it's safer to ignore this bit from the start. Here are details of my experiments with how Windows handles it: https://bugzilla.kernel.org/show_bug.cgi?id=15701 Signed-off-by: Bjorn Helgaas <[email protected]> Signed-off-by: Jesse Barnes <[email protected]>
2010-04-08Merge branch 'linus' into perf/coreIngo Molnar94-43/+125
Semantic conflict: arch/x86/kernel/cpu/perf_event_intel_ds.c Merge reason: pick up latest fixes, fix the conflict Signed-off-by: Ingo Molnar <[email protected]>
2010-04-08add random binaries to .gitignoreJan III Sobieski1-0/+3
Signed-off-by: Jan III Sobieski <[email protected]> Signed-off-by: Michal Marek <[email protected]>
2010-04-07Merge branch 'perf-fixes-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf, x86: Enable Nehalem-EX support perf kmem: Fix breakage introduced by 5a0e3ad slab.h script
2010-04-07Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds7-19/+53
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip: x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boards x86: Increase CONFIG_NODES_SHIFT max to 10 ibft, x86: Change reserve_ibft_region() to find_ibft_region() x86, hpet: Fix bug in RTC emulation x86, hpet: Erratum workaround for read after write of HPET comparator bootmem, x86: Fix 32bit numa system without RAM on node 0 nobootmem, x86: Fix 32bit numa system without RAM on node 0 x86: Handle overlapping mptables x86: Make e820_remove_range to handle all covered case x86-32, resume: do a global tlb flush in S4 resume
2010-04-07x86/gart: Disable GART explicitly before initializationJoerg Roedel2-1/+17
If we boot into a crash-kernel the gart might still be enabled and its caches might be dirty. This can result in undefined behavior later. Fix it by explicitly disabling the gart hardware before initialization and flushing the caches after enablement. Signed-off-by: Joerg Roedel <[email protected]>
2010-04-07Merge branch 'amd-iommu/fixes' into iommu/fixesJoerg Roedel4-27/+50
2010-04-07x86/amd-iommu: use for_each_pci_devChris Wright1-1/+1
Replace open coded version with for_each_pci_dev Signed-off-by: Chris Wright <[email protected]> Signed-off-by: Joerg Roedel <[email protected]>
2010-04-07Revert "x86: disable IOMMUs on kernel crash"Chris Wright1-6/+0
This effectively reverts commit 61d047be99757fd9b0af900d7abce9a13a337488. Disabling the IOMMU can potetially allow DMA transactions to complete without being translated. Leave it enabled, and allow crash kernel to do the IOMMU reinitialization properly. Cc: [email protected] Cc: Joerg Roedel <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Neil Horman <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Chris Wright <[email protected]> Signed-off-by: Joerg Roedel <[email protected]>
2010-04-07x86/amd-iommu: warn when issuing command to uninitialized cmd bufferChris Wright3-2/+5
To catch future potential issues we can add a warning whenever we issue a command before the command buffer is fully initialized. Signed-off-by: Chris Wright <[email protected]> Signed-off-by: Joerg Roedel <[email protected]>
2010-04-07x86/amd-iommu: enable iommu before attaching devicesChris Wright1-2/+3
Hit another kdump problem as reported by Neil Horman. When initializaing the IOMMU, we attach devices to their domains before the IOMMU is fully (re)initialized. Attaching a device will issue some important invalidations. In the context of the newly kexec'd kdump kernel, the IOMMU may have stale cached data from the original kernel. Because we do the attach too early, the invalidation commands are placed in the new command buffer before the IOMMU is updated w/ that buffer. This leaves the stale entries in the kdump context and can renders device unusable. Simply enable the IOMMU before we do the attach. Cc: [email protected] Cc: Neil Horman <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Chris Wright <[email protected]> Signed-off-by: Joerg Roedel <[email protected]>
2010-04-06x86: Add optimized popcnt variantsBorislav Petkov4-4/+73
Add support for the hardware version of the Hamming weight function, popcnt, present in CPUs which advertize it under CPUID, Function 0x0000_0001_ECX[23]. On CPUs which don't support it, we fallback to the default lib/hweight.c sw versions. A synthetic benchmark comparing popcnt with __sw_hweight64 showed almost a 3x speedup on a F10h machine. Signed-off-by: Borislav Petkov <[email protected]> LKML-Reference: <20100318112015.GC11152@aftab> Signed-off-by: H. Peter Anvin <[email protected]>
2010-04-06perf, x86: Enable Nehalem-EX supportVince Weaver1-0/+1
According to Intel Software Devel Manual Volume 3B, the Nehalem-EX PMU is just like regular Nehalem (except for the uncore support, which is completely different). Signed-off-by: Vince Weaver <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: Lin Ming <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-06arch/x86: Add array variants for setting memory to wc caching.Pauli Nieminen2-8/+47
Setting single memory pages at a time to wc takes a lot time in cache flush. To reduce number of cache flush set_pages_array_wc and set_memory_array_wc can be used to set multiple pages to WC with single cache flush. This improves allocation performance for wc cached pages in drm/ttm. CC: Suresh Siddha <[email protected]> CC: Venkatesh Pallipadi <[email protected]> Signed-off-by: Pauli Nieminen <[email protected]> Signed-off-by: Dave Airlie <[email protected]>
2010-04-05Merge branch 'master' into export-slabhTejun Heo10-59/+137
2010-04-04perf: Drop the frame reliablity checkFrederic Weisbecker1-2/+1
It is useless now that we have a pure stack frame walker, as given addr are always reliable. Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ingo Molnar <[email protected]>
2010-04-04ACPI: pci_root: pass acpi_pci_root to arch-specific scanBjorn Helgaas1-1/+4
The acpi_pci_root structure contains all the individual items (acpi_device, domain, bus number) we pass to pci_acpi_scan_root(), so just pass the single acpi_pci_root pointer directly. This will make it easier to add _CBA support later. For _CBA, we need the entire downstream bus range, not just the base bus number. We have that in the acpi_pci_root structure, so passing the pointer makes it available to the arch-specific code. Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Kenji Kaneshige <[email protected]> Signed-off-by: Len Brown <[email protected]>
2010-04-02x86: Fix double enable_IR_x2apic() call on SMP kernel on !SMP boardsSuresh Siddha1-0/+2
Jan Grossmann reported kernel boot panic while booting SMP kernel on his system with a single core cpu. SMP kernels call enable_IR_x2apic() from native_smp_prepare_cpus() and on platforms where the kernel doesn't find SMP configuration we ended up again calling enable_IR_x2apic() from the APIC_init_uniprocessor() call in the smp_sanity_check(). Thus leading to kernel panic. Don't call enable_IR_x2apic() and default_setup_apic_routing() from APIC_init_uniprocessor() in CONFIG_SMP case. NOTE: this kind of non-idempotent and assymetric initialization sequence is rather fragile and unclean, we'll clean that up in v2.6.35. This is the minimal fix for v2.6.34. Reported-by: [email protected] Signed-off-by: Suresh Siddha <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Cc: <[email protected]> # [v2.6.32.x, v2.6.33.x] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-02perf, x86: Add Nehalem programming quirk to WestmerePeter Zijlstra1-0/+2
According to the Xeon-5600 errata the Westmere suffers the same PMU programming bug as the original Nehalem did. Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-02perf, x86: Fix __initconst vs constPeter Zijlstra4-11/+11
All variables that have __initconst should also be const. Suggested-by: Stephen Rothwell <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-02perf, x86: Fix up the ANY flag stuffPeter Zijlstra5-51/+74
Stephane noticed that the ANY flag was in generic arch code, and Cyrill reported that it broke the P4 code. Solve this by merging x86_pmu::raw_event into x86_pmu::hw_config and provide intel_pmu and amd_pmu specific versions of this callback. The intel_pmu one deals with the ANY flag, the amd_pmu adds the few extra event bits AMD64 has. Reported-by: Stephane Eranian <[email protected]> Reported-by: Cyrill Gorcunov <[email protected]> Acked-by: Robert Richter <[email protected]> Acked-by: Cyrill Gorcunov <[email protected]> Acked-by: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1269968113.5258.442.camel@laptop> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-02perf, x86: implement ARCH_PERFMON_EVENTSEL bit masksRobert Richter5-89/+45
ARCH_PERFMON_EVENTSEL bit masks are often used in the kernel. This patch adds macros for the bit masks and removes local defines. The function intel_pmu_raw_event() becomes x86_pmu_raw_event() which is generic for x86 models and same also for p6. Duplicate code is removed. Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-02perf, x86: Undo some some *_counter* -> *_event* renamesRobert Richter7-65/+65
The big rename: cdd6c48 perf: Do the big rename: Performance Counters -> Performance Events accidentally renamed some members of stucts that were named after registers in the spec. To avoid confusion this patch reverts some changes. The related specs are MSR descriptions in AMD's BKDGs and the ARCHITECTURAL PERFORMANCE MONITORING section in the Intel 64 and IA-32 Architectures Software Developer's Manuals. This patch does: $ sed -i -e 's:num_events:num_counters:g' \ arch/x86/include/asm/perf_event.h \ arch/x86/kernel/cpu/perf_event_amd.c \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_p6.c \ arch/x86/kernel/cpu/perf_event_p4.c \ arch/x86/oprofile/op_model_ppro.c $ sed -i -e 's:event_bits:cntval_bits:g' -e 's:event_mask:cntval_mask:g' \ arch/x86/kernel/cpu/perf_event_amd.c \ arch/x86/kernel/cpu/perf_event.c \ arch/x86/kernel/cpu/perf_event_intel.c \ arch/x86/kernel/cpu/perf_event_p6.c \ arch/x86/kernel/cpu/perf_event_p4.c Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-02Merge branch 'perf/urgent' into perf/coreIngo Molnar18-81/+219
Conflicts: arch/x86/kernel/cpu/perf_event.c Merge reason: Resolve the conflict, pick up fixes Signed-off-by: Ingo Molnar <[email protected]>
2010-04-02perf, x86: Fix callgraphs of 32-bit processes on 64-bit kernelsTorok Edwin2-5/+44
When profiling a 32-bit process on a 64-bit kernel, callgraph tracing stopped after the first function, because it has seen a garbage memory address (tried to interpret the frame pointer, and return address as a 64-bit pointer). Fix this by using a struct stack_frame with 32-bit pointers when the TIF_IA32 flag is set. Note that TIF_IA32 flag must be used, and not is_compat_task(), because the latter is only set when the 32-bit process is executing a syscall, which may not always be the case (when tracing page fault events for example). Signed-off-by: Török Edwin <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Acked-by: Frederic Weisbecker <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: [email protected] Cc: [email protected] LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-02perf, x86: Fix AMD hotplug & constraint initializationPeter Zijlstra2-36/+52
Commit 3f6da39 ("perf: Rework and fix the arch CPU-hotplug hooks") moved the amd northbridge allocation from CPUS_ONLINE to CPUS_PREPARE_UP however amd_nb_id() doesn't work yet on prepare so it would simply bail basically reverting to a state where we do not properly track node wide constraints - causing weird perf results. Fix up the AMD NorthBridge initialization code by allocating from CPU_UP_PREPARE and installing it from CPU_STARTING once we have the proper nb_id. It also properly deals with the allocation failing. Signed-off-by: Peter Zijlstra <[email protected]> [ robustify using amd_has_nb() ] Signed-off-by: Stephane Eranian <[email protected]> LKML-Reference: <1269353485.5109.48.camel@twins> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-02x86: Move notify_cpu_starting() callback to a later stagePeter Zijlstra1-2/+2
Because we need to have cpu identification things done by the time we run CPU_STARTING notifiers. ( This init ordering will be relied on by the next fix. ) Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <1269353485.5109.48.camel@twins> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-02Merge branch 'perf/urgent' of ↵Ingo Molnar2-3/+1
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into perf/urgent
2010-04-02x86: Increase CONFIG_NODES_SHIFT max to 10David Rientjes1-2/+2
Some larger systems require more than 512 nodes, so increase the maximum CONFIG_NODES_SHIFT to 10 for a new max of 1024 nodes. This was tested with numa=fake=64M on systems with more than 64GB of RAM. A total of 1022 nodes were initialized. Successfully builds with no additional warnings on x86_64 allyesconfig. ( No effect on any existing config. Newly enabled CONFIG_MAXSMP=y will see the new default. ) Signed-off-by: David Rientjes <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-04-01ibft, x86: Change reserve_ibft_region() to find_ibft_region()Yinghai Lu1-2/+12
This allows arch code could decide the way to reserve the ibft. And we should reserve ibft as early as possible, instead of BOOTMEM stage, in case the table is in RAM range and is not reserved by BIOS (this will often be the case.) Move to just after find_smp_config(). Also when CONFIG_NO_BOOTMEM=y, We will not have reserve_bootmem() anymore. -v2: fix typo about ibft pointed by Konrad Rzeszutek Wilk <[email protected]> Signed-off-by: Yinghai Lu <[email protected]> LKML-Reference: <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Peter Jones <[email protected]> Cc: Konrad Rzeszutek Wilk <[email protected]> CC: Jan Beulich <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-04-01x86, hpet: Fix bug in RTC emulationAlok Kataria1-0/+1
We think there exists a bug in the HPET code that emulates the RTC. In the normal case, when the RTC frequency is set, the rtc driver tells the hpet code about it here: int hpet_set_periodic_freq(unsigned long freq) { uint64_t clc; if (!is_hpet_enabled()) return 0; if (freq <= DEFAULT_RTC_INT_FREQ) hpet_pie_limit = DEFAULT_RTC_INT_FREQ / freq; else { clc = (uint64_t) hpet_clockevent.mult * NSEC_PER_SEC; do_div(clc, freq); clc >>= hpet_clockevent.shift; hpet_pie_delta = (unsigned long) clc; } return 1; } If freq is set to 64Hz (DEFAULT_RTC_INT_FREQ) or lower, then hpet_pie_limit (a static) is set to non-zero. Then, on every one-shot HPET interrupt, hpet_rtc_timer_reinit is called to compute the next timeout. Well, that function has this logic: if (!(hpet_rtc_flags & RTC_PIE) || hpet_pie_limit) delta = hpet_default_delta; else delta = hpet_pie_delta; Since hpet_pie_limit is not 0, hpet_default_delta is used. That corresponds to 64Hz. Now, if you set a different rtc frequency, you'll take the else path through hpet_set_periodic_freq, but unfortunately no one resets hpet_pie_limit back to 0. Boom....now you are stuck with 64Hz RTC interrupts forever. The patch below just resets the hpet_pie_limit value when requested freq is greater than DEFAULT_RTC_INT_FREQ, which we think fixes this problem. Signed-off-by: Alok N Kataria <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Daniel Hecht <[email protected]> Cc: Venkatesh Pallipadi <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-04-01x86, hpet: Erratum workaround for read after write of HPET comparatorPallipadi, Venkatesh1-1/+7
On Wed, Feb 24, 2010 at 03:37:04PM -0800, Justin Piszcz wrote: > Hello, > > Again, on the Intel DP55KG board: > > # uname -a > Linux host 2.6.33 #1 SMP Wed Feb 24 18:31:00 EST 2010 x86_64 GNU/Linux > > [ 1.237600] ------------[ cut here ]------------ > [ 1.237890] WARNING: at arch/x86/kernel/hpet.c:404 hpet_next_event+0x70/0x80() > [ 1.238221] Hardware name: > [ 1.238504] hpet: compare register read back failed. > [ 1.238793] Modules linked in: > [ 1.239315] Pid: 0, comm: swapper Not tainted 2.6.33 #1 > [ 1.239605] Call Trace: > [ 1.239886] <IRQ> [<ffffffff81056c13>] ? warn_slowpath_common+0x73/0xb0 > [ 1.240409] [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0 > [ 1.240699] [<ffffffff81056cb0>] ? warn_slowpath_fmt+0x40/0x50 > [ 1.240992] [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0 > [ 1.241281] [<ffffffff81041ad0>] ? hpet_next_event+0x70/0x80 > [ 1.241573] [<ffffffff81079608>] ? tick_dev_program_event+0x38/0xc0 > [ 1.241859] [<ffffffff81078e32>] ? tick_handle_oneshot_broadcast+0xe2/0x100 > [ 1.246533] [<ffffffff8102a67a>] ? timer_interrupt+0x1a/0x30 > [ 1.246826] [<ffffffff81085499>] ? handle_IRQ_event+0x39/0xd0 > [ 1.247118] [<ffffffff81087368>] ? handle_edge_irq+0xb8/0x160 > [ 1.247407] [<ffffffff81029f55>] ? handle_irq+0x15/0x20 > [ 1.247689] [<ffffffff810294a2>] ? do_IRQ+0x62/0xe0 > [ 1.247976] [<ffffffff8146be53>] ? ret_from_intr+0x0/0xa > [ 1.248262] <EOI> [<ffffffff8102f277>] ? mwait_idle+0x57/0x80 > [ 1.248796] [<ffffffff8102645c>] ? cpu_idle+0x5c/0xb0 > [ 1.249080] ---[ end trace db7f668fb6fef4e1 ]--- > > Is this something Intel has to fix or is it a bug in the kernel? This is a chipset erratum. Thomas: You mentioned we can retain this check only for known-buggy and hpet debug kind of options. But here is the simple workaround patch for this particular erratum. Some chipsets have a erratum due to which read immediately following a write of HPET comparator returns old comparator value instead of most recently written value. Erratum 15 in "Intel I/O Controller Hub 9 (ICH9) Family Specification Update" (http://www.intel.com/assets/pdf/specupdate/316973.pdf) Workaround for the errata is to read the comparator twice if the first one fails. Signed-off-by: Venkatesh Pallipadi <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]> Cc: Venkatesh Pallipadi <[email protected]> Cc: <[email protected]>
2010-04-01x86: Handle overlapping mptablesAndi Kleen1-2/+2
We found a system where the MP table MPC and MPF structures overlap. That doesn't really matter because the mptable is not used anyways with ACPI, but it leads to a panic in the early allocator due to the overlapping reservations in 2.6.33. Earlier kernels handled this without problems. Simply change these reservations to reserve_early_overlap_ok to avoid the panic. Reported-by: Thomas Renninger <[email protected]> Tested-by: Thomas Renninger <[email protected]> Signed-off-by: Andi Kleen <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]> Cc: <[email protected]>
2010-04-01x86,kgdb: Always initialize the hw breakpoint attributeJason Wessel1-1/+1
It is required to call hw_breakpoint_init() on an attr before using it in any other calls. This fixes the problem where kgdb will sometimes fail to initialize on x86_64. Signed-off-by: Jason Wessel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: 2.6.33 <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Frederic Weisbecker <[email protected]>
2010-04-01perf: Use hot regs with software sched switch/migrate eventsFrederic Weisbecker1-2/+0
Scheduler's task migration events don't work because they always pass NULL regs perf_sw_event(). The event hence gets filtered in perf_swevent_add(). Scheduler's context switches events use task_pt_regs() to get the context when the event occured which is a wrong thing to do as this won't give us the place in the kernel where we went to sleep but the place where we left userspace. The result is even more wrong if we switch from a kernel thread. Use the hot regs snapshot for both events as they belong to the non-interrupt/exception based events family. Unlike page faults or so that provide the regs matching the exact origin of the event, we need to save the current context. This makes the task migration event working and fix the context switch callchains and origin ip. Example: perf record -a -e cs Before: 10.91% ksoftirqd/0 0 [k] 0000000000000000 | --- (nil) perf_callchain perf_prepare_sample __perf_event_overflow perf_swevent_overflow perf_swevent_add perf_swevent_ctx_event do_perf_sw_event __perf_sw_event perf_event_task_sched_out schedule run_ksoftirqd kthread kernel_thread_helper After: 23.77% hald-addon-stor [kernel.kallsyms] [k] schedule | --- schedule | |--60.00%-- schedule_timeout | wait_for_common | wait_for_completion | blk_execute_rq | scsi_execute | scsi_execute_req | sr_test_unit_ready | | | |--66.67%-- sr_media_change | | media_changed | | cdrom_media_changed | | sr_block_media_changed | | check_disk_change | | cdrom_open v2: Always build perf_arch_fetch_caller_regs() now that software events need that too. They don't need it from modules, unlike trace events, so we keep the EXPORT_SYMBOL in trace_event_perf.c Signed-off-by: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Miller <[email protected]>
2010-03-31x86: Make e820_remove_range to handle all covered caseYinghai Lu1-4/+20
Rusty found on lguest with trim_bios_range, max_pfn is not right anymore, and looks e820_remove_range does not work right. [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] LGUEST: 0000000000000000 - 0000000004000000 (usable) [ 0.000000] Notice: NX (Execute Disable) protection missing in CPU or disabled in BIOS! [ 0.000000] DMI not present or invalid. [ 0.000000] last_pfn = 0x3fa0 max_arch_pfn = 0x100000 [ 0.000000] init_memory_mapping: 0000000000000000-0000000003fa0000 root cause is: the e820_remove_range doesn't handle the all covered case. e820_remove_range(BIOS_START, BIOS_END - BIOS_START, ...) produces a bogus range as a result. Make it match e820_update_range() by handling that case too. Reported-by: Rusty Russell <[email protected]> Signed-off-by: Yinghai Lu <[email protected]> Tested-by: Rusty Russell <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-03-30x86-32, resume: do a global tlb flush in S4 resumeShaohua Li1-8/+7
Colin King reported a strange oops in S4 resume code path (see below). The test system has i5/i7 CPU. The kernel doesn't open PAE, so 4M page table is used. The oops always happen a virtual address 0xc03ff000, which is mapped to the last 4k of first 4M memory. Doing a global tlb flush fixes the issue. EIP: 0060:[<c0493a01>] EFLAGS: 00010086 CPU: 0 EIP is at copy_loop+0xe/0x15 EAX: 36aeb000 EBX: 00000000 ECX: 00000400 EDX: f55ad46c ESI: 0f800000 EDI: c03ff000 EBP: f67fbec4 ESP: f67fbea8 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 ... ... CR2: 00000000c03ff000 Tested-by: Colin Ian King <[email protected]> Signed-off-by: Shaohua Li <[email protected]> LKML-Reference: <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]> Cc: <[email protected]>