aboutsummaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)AuthorFilesLines
2010-05-09x86, hypervisor: Export the x86_hyper* symbolsH. Peter Anvin3-1/+3
Export x86_hyper and the related specific structures, allowing for hypervisor identification by modules. Signed-off-by: H. Peter Anvin <[email protected]> Cc: Greg KH <[email protected]> Cc: Hank Janssen <[email protected]> Cc: Alok Kataria <[email protected]> Cc: Ky Srinivasan <[email protected]> Cc: Dmitry Torokhov <[email protected]> LKML-Reference: <[email protected]>
2010-05-08Merge commit 'v2.6.34-rc6' into x86/cpuH. Peter Anvin107-260/+472
2010-05-08x86, perf: P4 PMU -- check for proper event index in RAW eventsCyrill Gorcunov1-1/+10
RAW events are special and we should be ready for user passing in insane event index values. Signed-off-by: Cyrill Gorcunov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Lin Ming <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-08x86, perf: P4 PMU -- Get rid of redundant check for array indexCyrill Gorcunov1-5/+0
The caller already has done such a check. And it was wrong anyway, it had to be '>=' rather than '>' Signed-off-by: Cyrill Gorcunov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Lin Ming <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-08x86, perf: P4 PMU -- protect sensible procedures from preemptionCyrill Gorcunov1-2/+6
Steven reported: | | I'm getting: | | Pid: 3477, comm: perf Not tainted 2.6.34-rc6 #2727 | Call Trace: | [<ffffffff811c7565>] debug_smp_processor_id+0xd5/0xf0 | [<ffffffff81019874>] p4_hw_config+0x2b/0x15c | [<ffffffff8107acbc>] ? trace_hardirqs_on_caller+0x12b/0x14f | [<ffffffff81019143>] hw_perf_event_init+0x468/0x7be | [<ffffffff810782fd>] ? debug_mutex_init+0x31/0x3c | [<ffffffff810c68b2>] T.850+0x273/0x42e | [<ffffffff810c6cab>] sys_perf_event_open+0x23e/0x3f1 | [<ffffffff81009e6a>] ? sysret_check+0x2e/0x69 | [<ffffffff81009e32>] system_call_fastpath+0x16/0x1b | | When running perf record in latest tip/perf/core | Due to the fact that p4 counters are shared between HT threads we synthetically divide the whole set of counters into two non-intersected subsets. And while we're "borrowing" counters from these subsets we should not be preempted (well, strictly speaking in p4_hw_config we just pre-set reference to the subset which allow to save some cycles in schedule routine if it happens on the same cpu). So use get_cpu/put_cpu pair. Also p4_pmu_schedule_events should use smp_processor_id rather than raw_ version. This allow us to catch up preemption issue (if there will ever be). Reported-by: Steven Rostedt <[email protected]> Tested-by: Steven Rostedt <[email protected]> Signed-off-by: Cyrill Gorcunov <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Lin Ming <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-08x86, perf: P4 PMU -- configure predefined eventsCyrill Gorcunov1-15/+14
If an event is not RAW we should not exit p4_hw_config early but call x86_setup_perfctr as well. Signed-off-by: Cyrill Gorcunov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Lin Ming <[email protected]> Cc: Robert Richter <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-07x86: Clean up the hypervisor layerH. Peter Anvin8-107/+117
Clean up the hypervisor layer and the hypervisor drivers, using an ops structure instead of an enumeration with if statements. The identity of the hypervisor, if needed, can be tested by testing the pointer value in x86_hyper. The MS-HyperV private state is moved into a normal global variable (it's per-system state, not per-CPU state). Being a normal bss variable, it will be left at all zero on non-HyperV platforms, and so can generally be tested for HyperV-specific features without additional qualification. Signed-off-by: H. Peter Anvin <[email protected]> Acked-by: Greg KH <[email protected]> Cc: Hank Janssen <[email protected]> Cc: Alok Kataria <[email protected]> Cc: Ky Srinivasan <[email protected]> LKML-Reference: <[email protected]>
2010-05-07x86, HyperV: fix up the license to mshyperv.cGreg Kroah-Hartman1-12/+1
This should have been GPLv2 only, we cut and pasted from the wrong file originally, sorry. Also removed some unneeded boilerplate license code, we all know where to find the GPLv2, and that there's no warranty as that is implicit from the license. Cc: Ky Srinivasan <[email protected]> Cc: Hank Janssen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-07x86: Avoid check hlt for newer cpusJacob Pan1-1/+1
Check hlt instruction was targeted for some older CPUs. It is an expensive operation in that it takes 4 ticks to break out the check. We can avoid such check completely for newer x86 cpus (family >= 5). [ hpa: corrected family > 5 to family >= 5 ] Signed-off-by: Jacob Pan <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-07perf, x86: implement group scheduling transactional APIsLin Ming1-113/+67
Convert to the transactional PMU API and remove the duplication of group_sched_in(). Reviewed-by: Stephane Eranian <[email protected]> Signed-off-by: Lin Ming <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Cc: David Miller <[email protected]> Cc: Paul Mackerras <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-07perf, x86: Improve the PEBS ABIPeter Zijlstra3-9/+24
Rename perf_event_attr::precise to perf_event_attr::precise_ip and widen it to 2 bits. This new field describes the required precision of the PERF_SAMPLE_IP field: 0 - SAMPLE_IP can have arbitrary skid 1 - SAMPLE_IP must have constant skid 2 - SAMPLE_IP requested to have 0 skid 3 - SAMPLE_IP must have 0 skid And modify the Intel PEBS code accordingly. The PEBS implementation now supports up to precise_ip == 2, where we perform the IP fixup. Also s/PERF_RECORD_MISC_EXACT/&_IP/ to clarify its meaning, this bit should be set for each PERF_SAMPLE_IP field known to match the actual instruction triggering the event. This new scheme allows for a PEBS mode that uses the buffer for more than a single event. Signed-off-by: Peter Zijlstra <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Stephane Eranian <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-07perf, x86: Consolidate some code repetitionPeter Zijlstra1-53/+44
Remove some duplicated logic. Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-07perf, x86: Remove PEBS SAMPLE_RAW supportPeter Zijlstra1-14/+0
Its broken, we really should get PERF_SAMPLE_REGS sorted. Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-07perf, x86: Use weight instead of cmask in for_each_event_constraint()Robert Richter1-1/+1
There may exist constraints with a cmask set to zero. In this case for_each_event_constraint() will not work properly. Now weight is used instead of the cmask for loop exit detection. Weight is always a value other than zero since the default contains the HWEIGHT from the counter mask and in other cases a value of zero does not fit too. This is in preparation of ibs event constraints that wont have a cmask. Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-07perf, x86: Pass enable bit mask to __x86_pmu_enable_event()Robert Richter2-6/+8
To reuse this function for events with different enable bit masks, this mask is part of the function's argument list now. The function will be used later to control ibs events too. Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-07perf, x86: Call x86_setup_perfctr() from .hw_config()Robert Richter2-8/+3
The perfctr setup calls are in the corresponding .hw_config() functions now. This makes it possible to introduce config functions for other pmu events that are not perfctr specific. Also, all of a sudden the code looks much nicer. Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-07perf, x86: Move x86_setup_perfctr()Robert Richter1-61/+59
Move x86_setup_perfctr(), no other changes made. Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-07perf, x86: Move perfctr init code to x86_setup_perfctr()Robert Richter1-6/+14
Split __hw_perf_event_init() to configure pmu events other than perfctrs. Perfctr code is moved to a separate function x86_setup_perfctr(). This and the following patches refactor the code. Split in multiple patches for better review. Signed-off-by: Robert Richter <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-07Merge branch 'perf/urgent' into perf/coreIngo Molnar6-2/+23
Merge reason: Resolve patch dependency Signed-off-by: Ingo Molnar <[email protected]>
2010-05-06x86: Detect running on a Microsoft HyperV systemKy Srinivasan6-4/+91
This patch integrates HyperV detection within the framework currently used by VmWare. With this patch, we can avoid having to replicate the HyperV detection code in each of the Microsoft HyperV drivers. Reworked and tweaked by Greg K-H to build properly. Signed-off-by: K. Y. Srinivasan <[email protected]> LKML-Reference: <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Vadim Rozenfeld <[email protected]> Cc: Avi Kivity <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: "K.Prasad" <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Alan Cox <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: Hank Janssen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-06oprofile/x86: make AMD IBS hotplug capableRobert Richter3-41/+19
Current IBS code is not hotplug capable. An offline cpu might not be initialized or deinitialized properly. This patch fixes this by removing on_each_cpu() functions. The IBS init/deinit code is executed in the per-cpu functions model->setup_ctrs() and model->cpu_down() which are also called by hotplug notifiers. model->cpu_down() replaces model->exit() that became obsolete. Cc: Andi Kleen <[email protected]> Signed-off-by: Robert Richter <[email protected]>
2010-05-06oprofile/x86: notify cpus only when daemon is runningRobert Richter1-13/+3
This patch moves the cpu notifier registration from nmi_init() to nmi_setup(). The corresponding unregistration function is now in nmi_shutdown(). Thus, the hotplug code is only active, if the oprofile daemon is running. Cc: Andi Kleen <[email protected]> Signed-off-by: Robert Richter <[email protected]>
2010-05-06x86: Fix fake apicid to node mapping for numa emulationDavid Rientjes1-1/+2
With NUMA emulation, it's possible for a single cpu to be bound to multiple nodes since more than one may have affinity if allocated on a physical node that is local to the cpu. APIC ids must therefore be mapped to the lowest node ids to maintain generic kernel use of functions such as cpu_to_node() that determine device affinity. For example, if a device has proximity to physical node 1, for instance, and a cpu happens to be mapped to a higher emulated node id 8, the proximity may not be correctly determined by comparison in generic code even though the cpu may be truly local and allocated on physical node 1. When this happens, the true topology of the machine isn't accurately represented in the emulated environment; although this isn't critical to the system's uptime, any generic code that is NUMA aware benefits from the physical topology being accurately represented. This can affect any system that maps multiple APIC ids to a single node and is booted with numa=fake=N where N is greater than the number of physical nodes. Signed-off-by: David Rientjes <[email protected]> Cc: Yinghai Lu <[email protected]> Cc: Suresh Siddha <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-06x86, acpi/irq: Define gsi_end when X86_IO_APIC is undefinedEric W. Biederman1-0/+1
My recent changes introducing a global gsi_end variable failed to take into account the case of using acpi on a system not built to support IO_APICs, causing the build to fail. Define gsi_end to 15 when CONFIG_X86_IO_APIC is not set to avoid compile errors. Signed-off-by: Eric W. Biederman <[email protected]> Cc: Yinghai Lu <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2010-05-04Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds5-1/+22
git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip: powernow-k8: Fix frequency reporting x86: Fix parse_reservetop() build failure on certain configs x86: Fix NULL pointer access in irq_force_complete_move() for Xen guests x86: Fix 'reservetop=' functionality
2010-05-04Fix the x86_64 implementation of call_rwsem_wait()David Howells1-1/+1
The x86_64 call_rwsem_wait() treats the active state counter part of the R/W semaphore state as being 16-bit when it's actually 32-bit (it's half of the 64-bit state). It should do "decl %edx" not "decw %dx". Signed-off-by: David Howells <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-04x86, irq: Kill io_apic_renumber_irqEric W. Biederman4-31/+0
Now that the generic irq layer is performing the exact same remapping as io_apic_renumber_irq we can kill this weird es7000 specific function. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, acpi/irq: Handle isa irqs that are not identity mapped to gsi's.Eric W. Biederman2-6/+59
ACPI irq source overrides are allowed for the 16 isa irqs and are allowed to map any gsi to any isa irq. A few motherboards have been seen to take advantage of this and put the isa irqs on the 2nd or 3rd ioapic. This causes some problems, most notably the fact that we can not use any gsi < 16. To correct this move the gsis that are not isa irqs and have a gsi number < 16 into the linux irq space just past gsi_end. This is what the es7000 platform is doing today. Moving only the low 16 gsis above the rest of the gsi's only penalizes weird platforms, leaving sane acpi implementations with a 1-1 mapping of gsis and irqs. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, ioapic: Simplify probe_nr_irqs_gsi.Eric W. Biederman3-43/+3
Use the global gsi_end value now that all ioapics have valid gsi numbers instead of a combination of acpi_probe_gsi and walking all of the ioapics and couting their number of entries by hand if acpi_probe_gsi gave us an answer we did not like. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, ioapic: Optimize pin_2_irqEric W. Biederman1-9/+4
Now that all ioapics have valid gsi_base values use this to accellerate pin_2_irq. In the case of acpi this also ensures that pin_2_irq will compute the same irq value for an ioapic pin as acpi will. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic.Eric W. Biederman1-14/+8
Now that all ioapic registration happens in mp_register_ioapic we can move the calculation of nr_ioapic_registers there from enable_IO_APIC. The number of ioapic registers is already calucated in mp_register_ioapic so all that really needs to be done is to save the caluclated value in nr_ioapic_registers. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, ioapic: In mpparse use mp_register_ioapicEric W. Biederman1-24/+1
Long ago MP_ioapic_info was the primary way of setting up our ioapic data structures and mp_register_ioapic was a compatibility shim for acpi code. Now the situation is reversed and and mp_register_ioapic is the primary way of setting up our ioapic data structures. Keep the setting up of ioapic data structures uniform by having mp_register_ioapic call mp_register_ioapic. This changes a few fields: - type: is now hardset to MP_IOAPIC but type had to bey MP_IOAPIC or MP_ioapic_info would not have been called. - flags: is now hard coded to MPC_APIC_USABLE. We require flags to contain at least MPC_APIC_USEBLE in MP_ioapic_info and we don't ever examine flags so dropping a few flags that might possibly exist that we have never used is harmless. - apicaddr: Unchanged - apicver: Read from the ioapic instead of using the cached hardware value in the MP table. The real hardware value will be more accurate. - apicid: Now verified to be unique and changed if it is not. If the BIOS got this right this is a noop. If the BIOS did not fixing things appears to be the better solution. This adds gsi_base and gsi_end values to our ioapics defined with the mpatable, which will make our lives simpler later since we can always assume gsi_base and gsi_end are valid. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, ioapic: Teach mp_register_ioapic to compute a global gsi_endEric W. Biederman3-3/+8
Add the global variable gsi_end and teach mp_register_ioapic to keep it uptodate as we add more ioapics into the system. ioapics can only be added early in boot so the code that runs later can treat gsi_end as a constant. Remove the have hacks in sfi.c to second guess mp_register_ioapic by keeping t's own running total of how many gsi's have been seen, and instead use the gsi_end. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, ioapic: Fix the types of gsi valuesEric W. Biederman2-7/+7
This patches fixes the types of gsi_base and gsi_end values in struct mp_ioapic_gsi, and the gsi parameter of mp_find_ioapic and mp_find_ioapic_pin A gsi is cannonically a u32, not an int. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, ioapic: Fix io_apic_redir_entries to return the number of entries.Eric W. Biederman1-3/+7
io_apic_redir_entries has a huge conceptual bug. It returns the maximum redirection entry not the number of redirection entries. Which simply does not match what the name of the function. This just caught me and it caught Feng Tang, and Len Brown when they wrote sfi_parse_ioapic. Modify io_apic_redir_entries to actually return the number of redirection entries, and fix the callers so that they properly handle receiving the number of the number of redirection table entries, instead of the number of redirection table entries less one. While the usage in sfi.c does not show up in this patch it is fixed by virtue of the fact that io_apic_redir_entries now has the semantics sfi_parse_ioapic most reasonably expects. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, ioapic: Only export mp_find_ioapic and mp_find_ioapic_pin in io_apic.hEric W. Biederman1-4/+0
Multiple declarations of the same function in different headers is a pain to maintain. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, acpi/irq: Generalize mp_config_acpi_legacy_irqsEric W. Biederman1-12/+18
Remove the assumption that there is not an override for isa irq 0. Instead lookup the gsi and from that lookup the ioapic and pin of each isa irq indivdually. In general this should not have any behavioural affect but in perverse cases this gets all of the details correct, instead of doing something weird. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, acpi/irq: Fix acpi_sci_ioapic_setup so it has both bus_irq and gsiEric W. Biederman1-5/+7
Currently acpi_sci_ioapic_setup calls mp_override_legacy_irq with bus_irq == gsi, which is wrong if we are comming from an override Instead pass the bus_irq into acpi_sci_ioapic_setup. This fix was inspired by a similar fix from: Yinghai Lu <[email protected]> Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, acpi/irq: Teach acpi_get_override_irq to take a gsi not an isa_irqEric W. Biederman1-9/+14
In perverse acpi implementations the isa irqs are not identity mapped to the first 16 gsi. Furthermore at least the extended interrupt resource capability may return gsi's and not isa irqs. So since what we get from acpi is a gsi teach acpi_get_overrride_irq to operate on a gsi instead of an isa_irq. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04x86, acpi/irq: Introduce apci_isa_irq_to_gsiEric W. Biederman1-0/+8
There are a number of cases where the current code makes the assumption that isa irqs identity map to the first 16 acpi global system intereupts. In most instances that assumption is correct as that is the required behaviour in dual i8259 mode and the default behavior in ioapic mode. However there are some systems out there that take advantage of acpis interrupt remapping for the isa irqs to have a completely different mapping of isa_irq to gsi. Introduce acpi_isa_irq_to_gsi to perform this mapping explicitly in the code that needs it. Initially this will be just the current assumed identity mapping to ensure it's introduction does not cause regressions. Signed-off-by: Eric W. Biederman <[email protected]> LKML-Reference: <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2010-05-04oprofile/x86: reordering some functionsRobert Richter1-67/+67
Reordering some functions. Necessary for the next patch. No functional changes. Cc: Andi Kleen <[email protected]> Signed-off-by: Robert Richter <[email protected]>
2010-05-04oprofile/x86: stop disabled counters in nmi handlerRobert Richter1-2/+10
This patch adds checks to the nmi handler. Now samples are only generated and counters reenabled, if the counters are running. Otherwise the counters are stopped, if oprofile is using the nmi. In other cases it will ignore the nmi notification. Cc: Andi Kleen <[email protected]> Signed-off-by: Robert Richter <[email protected]>
2010-05-04oprofile/x86: protect cpu hotplug sectionsRobert Richter1-6/+46
This patch reworks oprofile cpu hotplug code as follows: Introduce ctr_running variable to check, if counters are running or not. The state must be known for taking a cpu on or offline and when switching counters during counter multiplexing. Protect on_each_cpu() sections with get_online_cpus()/put_online_cpu() functions. This is necessary if notifiers or states are modified. Within these sections the cpu mask may not change. Switch only between counters in nmi_cpu_switch(), if counters are running. Otherwise the switch may restart a counter though they are disabled. Add nmi_cpu_setup() and nmi_cpu_shutdown() to cpu hotplug code. The function must also be called to avoid uninitialzed counter usage. Cc: Andi Kleen <[email protected]> Signed-off-by: Robert Richter <[email protected]>
2010-05-04oprofile/x86: remove CONFIG_SMP macrosRobert Richter1-6/+1
CPU notifier register functions also exist if CONFIG_SMP is disabled. This change is part of hotplug code rework and also necessary for later patches. Cc: Andi Kleen <[email protected]> Signed-off-by: Robert Richter <[email protected]>
2010-05-04oprofile/x86: fix uninitialized counter usage during cpu hotplugRobert Richter1-2/+8
This fixes a NULL pointer dereference that is triggered when taking a cpu offline after oprofile was initialized, e.g.: $ opcontrol --init $ opcontrol --start-daemon $ opcontrol --shutdown $ opcontrol --deinit $ echo 0 > /sys/devices/system/cpu/cpu1/online See the crash dump below. Though the counter has been disabled the cpu notifier is still active and trying to use already freed counter data. This fix is for linux-stable. To proper fix this, the hotplug code must be rewritten. Thus I will leave a WARN_ON_ONCE() message with this patch. BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff8132ad57>] op_amd_stop+0x2d/0x8e PGD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/system/cpu/cpu1/online CPU 1 Modules linked in: Pid: 0, comm: swapper Not tainted 2.6.34-rc5-oprofile-x86_64-standard-00210-g8c00f06 #16 Anaheim/Anaheim RIP: 0010:[<ffffffff8132ad57>] [<ffffffff8132ad57>] op_amd_stop+0x2d/0x8e RSP: 0018:ffff880001843f28 EFLAGS: 00010006 RAX: 0000000000000000 RBX: 0000000000000000 RCX: dead000000200200 RDX: ffff880001843f68 RSI: dead000000100100 RDI: 0000000000000000 RBP: ffff880001843f48 R08: 0000000000000000 R09: ffff880001843f08 R10: ffffffff8102c9a5 R11: ffff88000184ea80 R12: 0000000000000000 R13: ffff88000184f6c0 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fec6a92e6f0(0000) GS:ffff880001840000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000000 CR3: 000000000163b000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process swapper (pid: 0, threadinfo ffff88042fcd8000, task ffff88042fcd51d0) Stack: ffff880001843f48 0000000000000001 ffff88042e9f7d38 ffff880001843f68 <0> ffff880001843f58 ffffffff8132a602 ffff880001843f98 ffffffff810521b3 <0> ffff880001843f68 ffff880001843f68 ffff880001843f88 ffff88042fcd9fd8 Call Trace: <IRQ> [<ffffffff8132a602>] nmi_cpu_stop+0x21/0x23 [<ffffffff810521b3>] generic_smp_call_function_single_interrupt+0xdf/0x11b [<ffffffff8101804f>] smp_call_function_single_interrupt+0x22/0x31 [<ffffffff810029f3>] call_function_single_interrupt+0x13/0x20 <EOI> [<ffffffff8102c9a5>] ? wake_up_process+0x10/0x12 [<ffffffff81008701>] ? default_idle+0x22/0x37 [<ffffffff8100896d>] c1e_idle+0xdf/0xe6 [<ffffffff813f1170>] ? atomic_notifier_call_chain+0x13/0x15 [<ffffffff810012fb>] cpu_idle+0x4b/0x7e [<ffffffff813e8a4e>] start_secondary+0x1ae/0x1b2 Code: 89 e5 41 55 49 89 fd 41 54 45 31 e4 53 31 db 48 83 ec 08 89 df e8 be f8 ff ff 48 98 48 83 3c c5 10 67 7a 81 00 74 1f 49 8b 45 08 <42> 8b 0c 20 0f 32 48 c1 e2 20 25 ff ff bf ff 48 09 d0 48 89 c2 RIP [<ffffffff8132ad57>] op_amd_stop+0x2d/0x8e RSP <ffff880001843f28> CR2: 0000000000000000 ---[ end trace 679ac372d674b757 ]--- Kernel panic - not syncing: Fatal exception in interrupt Pid: 0, comm: swapper Tainted: G D 2.6.34-rc5-oprofile-x86_64-standard-00210-g8c00f06 #16 Call Trace: <IRQ> [<ffffffff813ebd6a>] panic+0x9e/0x10c [<ffffffff810474b0>] ? up+0x34/0x39 [<ffffffff81031ccc>] ? kmsg_dump+0x112/0x12c [<ffffffff813eeff1>] oops_end+0x81/0x8e [<ffffffff8101efee>] no_context+0x1f3/0x202 [<ffffffff8101f1b7>] __bad_area_nosemaphore+0x1ba/0x1e0 [<ffffffff81028d24>] ? enqueue_task_fair+0x16d/0x17a [<ffffffff810264dc>] ? activate_task+0x42/0x53 [<ffffffff8102c967>] ? try_to_wake_up+0x272/0x284 [<ffffffff8101f1eb>] bad_area_nosemaphore+0xe/0x10 [<ffffffff813f0f3f>] do_page_fault+0x1c8/0x37c [<ffffffff81028d24>] ? enqueue_task_fair+0x16d/0x17a [<ffffffff813ee55f>] page_fault+0x1f/0x30 [<ffffffff8102c9a5>] ? wake_up_process+0x10/0x12 [<ffffffff8132ad57>] ? op_amd_stop+0x2d/0x8e [<ffffffff8132ad46>] ? op_amd_stop+0x1c/0x8e [<ffffffff8132a602>] nmi_cpu_stop+0x21/0x23 [<ffffffff810521b3>] generic_smp_call_function_single_interrupt+0xdf/0x11b [<ffffffff8101804f>] smp_call_function_single_interrupt+0x22/0x31 [<ffffffff810029f3>] call_function_single_interrupt+0x13/0x20 <EOI> [<ffffffff8102c9a5>] ? wake_up_process+0x10/0x12 [<ffffffff81008701>] ? default_idle+0x22/0x37 [<ffffffff8100896d>] c1e_idle+0xdf/0xe6 [<ffffffff813f1170>] ? atomic_notifier_call_chain+0x13/0x15 [<ffffffff810012fb>] cpu_idle+0x4b/0x7e [<ffffffff813e8a4e>] start_secondary+0x1ae/0x1b2 ------------[ cut here ]------------ WARNING: at /local/rrichter/.source/linux/arch/x86/kernel/smp.c:118 native_smp_send_reschedule+0x27/0x53() Hardware name: Anaheim Modules linked in: Pid: 0, comm: swapper Tainted: G D 2.6.34-rc5-oprofile-x86_64-standard-00210-g8c00f06 #16 Call Trace: <IRQ> [<ffffffff81017f32>] ? native_smp_send_reschedule+0x27/0x53 [<ffffffff81030ee2>] warn_slowpath_common+0x77/0xa4 [<ffffffff81030f1e>] warn_slowpath_null+0xf/0x11 [<ffffffff81017f32>] native_smp_send_reschedule+0x27/0x53 [<ffffffff8102634b>] resched_task+0x60/0x62 [<ffffffff8102653a>] check_preempt_curr_idle+0x10/0x12 [<ffffffff8102c8ea>] try_to_wake_up+0x1f5/0x284 [<ffffffff8102c986>] default_wake_function+0xd/0xf [<ffffffff810a110d>] pollwake+0x57/0x5a [<ffffffff8102c979>] ? default_wake_function+0x0/0xf [<ffffffff81026be5>] __wake_up_common+0x46/0x75 [<ffffffff81026ed0>] __wake_up+0x38/0x50 [<ffffffff81031694>] printk_tick+0x39/0x3b [<ffffffff8103ac37>] update_process_times+0x3f/0x5c [<ffffffff8104dc63>] tick_periodic+0x5d/0x69 [<ffffffff8104dc90>] tick_handle_periodic+0x21/0x71 [<ffffffff81018fd0>] smp_apic_timer_interrupt+0x82/0x95 [<ffffffff81002853>] apic_timer_interrupt+0x13/0x20 [<ffffffff81030cb5>] ? panic_blink_one_second+0x0/0x7b [<ffffffff813ebdd6>] ? panic+0x10a/0x10c [<ffffffff810474b0>] ? up+0x34/0x39 [<ffffffff81031ccc>] ? kmsg_dump+0x112/0x12c [<ffffffff813eeff1>] ? oops_end+0x81/0x8e [<ffffffff8101efee>] ? no_context+0x1f3/0x202 [<ffffffff8101f1b7>] ? __bad_area_nosemaphore+0x1ba/0x1e0 [<ffffffff81028d24>] ? enqueue_task_fair+0x16d/0x17a [<ffffffff810264dc>] ? activate_task+0x42/0x53 [<ffffffff8102c967>] ? try_to_wake_up+0x272/0x284 [<ffffffff8101f1eb>] ? bad_area_nosemaphore+0xe/0x10 [<ffffffff813f0f3f>] ? do_page_fault+0x1c8/0x37c [<ffffffff81028d24>] ? enqueue_task_fair+0x16d/0x17a [<ffffffff813ee55f>] ? page_fault+0x1f/0x30 [<ffffffff8102c9a5>] ? wake_up_process+0x10/0x12 [<ffffffff8132ad57>] ? op_amd_stop+0x2d/0x8e [<ffffffff8132ad46>] ? op_amd_stop+0x1c/0x8e [<ffffffff8132a602>] ? nmi_cpu_stop+0x21/0x23 [<ffffffff810521b3>] ? generic_smp_call_function_single_interrupt+0xdf/0x11b [<ffffffff8101804f>] ? smp_call_function_single_interrupt+0x22/0x31 [<ffffffff810029f3>] ? call_function_single_interrupt+0x13/0x20 <EOI> [<ffffffff8102c9a5>] ? wake_up_process+0x10/0x12 [<ffffffff81008701>] ? default_idle+0x22/0x37 [<ffffffff8100896d>] ? c1e_idle+0xdf/0xe6 [<ffffffff813f1170>] ? atomic_notifier_call_chain+0x13/0x15 [<ffffffff810012fb>] ? cpu_idle+0x4b/0x7e [<ffffffff813e8a4e>] ? start_secondary+0x1ae/0x1b2 ---[ end trace 679ac372d674b758 ]--- Cc: Andi Kleen <[email protected]> Cc: stable <[email protected]> Signed-off-by: Robert Richter <[email protected]>
2010-05-04oprofile/x86: remove duplicate IBS capability checkRobert Richter1-2/+1
The check is already done in ibs_exit(). Signed-off-by: Robert Richter <[email protected]>
2010-05-04oprofile/x86: move IBS codeRobert Richter1-110/+110
Moving code to make future changes easier. This groups all IBS code together. Signed-off-by: Robert Richter <[email protected]>
2010-05-04oprofile/x86: return -EBUSY if counters are already reservedRobert Richter5-25/+44
In case a counter is already reserved by the watchdog or perf_event subsystem, oprofile ignored this counters silently. This case is handled now and oprofile_setup() now reports an error. Signed-off-by: Robert Richter <[email protected]>
2010-05-04oprofile/x86: moving shutdown functionsRobert Richter3-49/+46
Moving some code in preparation of the next patch. Signed-off-by: Robert Richter <[email protected]>
2010-05-04oprofile/x86: reserve counter msrs pairwiseRobert Richter2-43/+36
For AMD's and Intel's P6 generic performance counters have pairwise counter and control msrs. This patch changes the counter reservation in a way that both msrs must be registered. It joins some counter loops and also removes the unnecessary NUM_CONTROLS macro in the AMD implementation. Signed-off-by: Robert Richter <[email protected]>