path: root/arch/x86
Age | Commit message | Author | Files | Lines
2013-08-27 | Merge branch 'pm-cpufreq' | Rafael J. Wysocki | 1 | -29/+0
* pm-cpufreq: (60 commits)
  cpufreq: pmac32-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: pmac64-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: maple-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: arm_big_little: remove device tree parsing for cpu nodes
  cpufreq: kirkwood-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: spear-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: highbank-cpufreq: remove device tree parsing for cpu nodes
  cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes
  cpufreq: imx6q-cpufreq: remove device tree parsing for cpu nodes
  drivers/bus: arm-cci: avoid parsing DT for cpu device nodes
  ARM: mvebu: remove device tree parsing for cpu nodes
  ARM: topology: remove hwid/MPIDR dependency from cpu_capacity
  of/device: add helper to get cpu device node from logical cpu index
  driver/core: cpu: initialize of_node in cpu's device struture
  ARM: DT/kernel: define ARM specific arch_match_cpu_phys_id
  of: move of_get_cpu_node implementation to DT core library
  powerpc: refactor of_get_cpu_node to support other architectures
  openrisc: remove undefined of_get_cpu_node declaration
  microblaze: remove undefined of_get_cpu_node declaration
  cpufreq: fix bad unlock balance on !CONFIG_SMP
  ...
2013-08-27 | Merge branch 'acpi-assorted' | Rafael J. Wysocki | 1 | -4/+7
* acpi-assorted:
  ACPI / osl: Kill macro INVALID_TABLE().
  earlycpio.c: Fix the confusing comment of find_cpio_data().
  ACPI / x86: Print Hot-Pluggable Field in SRAT.
  ACPI / thermal: Use THERMAL_TRIPS_NONE macro to replace number
  ACPI / thermal: Remove unused macros in the driver/acpi/thermal.c
  ACPI / thermal: Remove the unused lock of struct acpi_thermal
  ACPI / osl: Fix osi_setup_entries[] __initdata attribute location
  ACPI / numa: Fix __init attribute location in slit_valid()
  ACPI / dock: Fix __init attribute location in find_dock_and_bay()
  ACPI / Sleep: Fix incorrect placement of __initdata
  ACPI / processor: Fix incorrect placement of __initdata
  ACPI / EC: Fix incorrect placement of __initdata
  ACPI / scan: Drop unnecessary label from acpi_create_platform_device()
  ACPI: Move acpi_bus_get_device() from bus.c to scan.c
  ACPI / scan: Allow platform device creation without any IO resources
  ACPI: Cleanup sparse warning on acpi_os_initialize1()
  platform / thinkpad: Remove deprecated hotkey_report_mode parameter
  ACPI: Remove the old /proc/acpi/event interface
2013-08-27 | Merge branch 'acpi-sleep' | Rafael J. Wysocki | 1 | -0/+10
* acpi-sleep:
  x86 / tboot / ACPI: Fail extended mode reduced hardware sleep
  xen / ACPI: notify xen when reduced hardware sleep is available
  ACPI / sleep: Introduce acpi_os_prepare_extended_sleep() for extended sleep path
2013-08-27 | Merge branch 'acpi-pci-hotplug' | Rafael J. Wysocki | 1 | -0/+4
* acpi-pci-hotplug: (34 commits)
  ACPI / PM: Hold acpi_scan_lock over system PM transitions
  ACPI / hotplug / PCI: Fix NULL pointer dereference in cleanup_bridge()
  PCI / ACPI: Use dev_dbg() instead of dev_info() in acpi_pci_set_power_state()
  ACPI / hotplug / PCI: Get rid of check_sub_bridges()
  ACPI / hotplug / PCI: Clean up bridge_mutex usage
  ACPI / hotplug / PCI: Redefine enable_device() and disable_device()
  ACPI / hotplug / PCI: Sanitize acpiphp_get_(latch)|(adapter)_status()
  ACPI / hotplug / PCI: Get rid of unused constants in acpiphp.h
  ACPI / hotplug / PCI: Check for new devices on enabled slots
  ACPI / hotplug / PCI: Allow slots without new devices to be rescanned
  ACPI / hotplug / PCI: Do not check SLOT_ENABLED in enable_device()
  ACPI / hotplug / PCI: Do not exectute _PS0 and _PS3 directly
  ACPI / hotplug / PCI: Do not queue up event handling work items in vain
  ACPI / hotplug / PCI: Consolidate slot disabling and ejecting
  ACPI / hotplug / PCI: Drop redundant checks from check_hotplug_bridge()
  ACPI / hotplug / PCI: Rework namespace scanning and trimming routines
  ACPI / hotplug / PCI: Store parent in functions and bus in slots
  ACPI / hotplug / PCI: Drop handle field from struct acpiphp_bridge
  ACPI / hotplug / PCI: Drop handle field from struct acpiphp_func
  ACPI / hotplug / PCI: Embed function struct into struct acpiphp_context
  ...
2013-08-26 | Merge branch 'pci/yijing-mps-v8' into next | Bjorn Helgaas | 1 | -7/+2
* pci/yijing-mps-v8:
  PCI: Warn if unsafe MPS settings detected
  PCI: Fix MPS peer-to-peer DMA comment syntax
  PCI: Don't restrict MPS for slots below Root Ports
  PCI: Simplify MPS test for Downstream Port
  PCI: Remove unnecessary check for pcie_get_mps() failure
  PCI: Simplify pcie_bus_configure_settings() interface
  PCI: Drop "PCI-E" prefix from Max Payload Size message
2013-08-26 | x86/ioapic: Check attr against the previous setting when programmed more than once | Liu Ping Fan | 2 | -4/+10
When programming ioapic pinX more than once, current code does not check whether the later attr (trigger & polarity) is the same as the former or not. This causes broken semantics which can be observed in a qemu q35 machine, where ioapic's ioredtbl[x] can never be set as low-active, even if the hpet driver registered it. And hpet driver may share a high-level active IRQ line with other devices. So in qemu, when hpet-dev asserts low-level as kernel expects, the kernel has no response.

With this patch, we can observe an ioredtbl[x] set as low-active for hpet.

Fix it by reporting -EBUSY to the caller, when attr is different.

Signed-off-by: Liu Ping Fan <[email protected]>
Cc: Kevin Hao <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Yinghai Lu <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
[ Made small readability edits to both the changelog and the code. ]
Signed-off-by: Ingo Molnar <[email protected]>
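The check described in the x86/ioapic entry above can be illustrated with a minimal user-space sketch. This is not the kernel's code: `struct pin_attr`, `pin_table`, `SKETCH_NR_PINS` and `sketch_program_pin()` are hypothetical names invented for this example, and the 0/1 encodings of trigger and polarity are assumptions. It only shows the idea of remembering the first attributes and returning -EBUSY when a later request differs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_PINS 24 /* hypothetical pin count, for illustration only */

/* Hypothetical record of the attributes a pin was first programmed with. */
struct pin_attr {
    bool programmed;
    int trigger;  /* assumed encoding: 0 = edge, 1 = level */
    int polarity; /* assumed encoding: 0 = active high, 1 = active low */
};

static struct pin_attr pin_table[SKETCH_NR_PINS];

/*
 * Program a pin. The first call records trigger and polarity. A later
 * call succeeds only if it requests the same attributes; a conflicting
 * request is rejected with -EBUSY instead of being silently ignored.
 */
int sketch_program_pin(size_t pin, int trigger, int polarity)
{
    struct pin_attr *p;

    if (pin >= SKETCH_NR_PINS)
        return -EINVAL;

    p = &pin_table[pin];
    if (p->programmed) {
        if (p->trigger != trigger || p->polarity != polarity)
            return -EBUSY;
        return 0;
    }

    p->trigger = trigger;
    p->polarity = polarity;
    p->programmed = true;
    assert(p->programmed);
    return 0;
}
```

A caller that shares the line with compatible attributes gets 0 back, while a caller that wants, say, low-active on a pin already programmed as high-active sees -EBUSY and can react to it.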
2013-08-26 | kvm hypervisor: Simplify kvm_for_each_vcpu with kvm_irq_delivery_to_apic | Raghavendra K T | 2 | -20/+10
Note that we are using APIC_DM_REMRD which has reserved usage. In future if APIC_DM_REMRD usage is standardized, then we should find some other way or go back to old method. Suggested-by: Gleb Natapov <[email protected]> Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Gleb Natapov <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Gleb Natapov <[email protected]>
2013-08-26 | kvm hypervisor : Add a hypercall to KVM hypervisor to support pv-ticketlocks | Srivatsa Vaddagiri | 3 | -2/+50
kvm_hc_kick_cpu allows the calling vcpu to kick another vcpu out of halt state. The presence of these hypercalls is indicated to the guest via kvm_feature_pv_unhalt.

Fold pv_unhalt flag into GET_MP_STATE ioctl to aid migration. During migration, any vcpu that got kicked but did not become runnable (still in halted state) should be runnable after migration.

Signed-off-by: Srivatsa Vaddagiri <[email protected]>
Signed-off-by: Suzuki Poulose <[email protected]>
[Raghu: Apic related changes, folding pvunhalted into vcpu_runnable Added flags for future use (suggested by Gleb)]
[ Raghu: fold pv_unhalt flag as suggested by Eric Northup]
Signed-off-by: Raghavendra K T <[email protected]>
Acked-by: Gleb Natapov <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
2013-08-26 | kvm uapi: Add KICK_CPU and PV_UNHALT definition to uapi | Raghavendra K T | 1 | -0/+1
This is needed by both guest and host.

Originally-from: Srivatsa Vaddagiri <[email protected]>
Signed-off-by: Raghavendra K T <[email protected]>
Acked-by: Gleb Natapov <[email protected]>
Acked-by: Ingo Molnar <[email protected]>
Signed-off-by: Gleb Natapov <[email protected]>
2013-08-23 | Merge back earlier 'pm-cpufreq' material. | Rafael J. Wysocki | 1 | -29/+0
2013-08-22 | Kconfig: Remove hotplug enable hints in CONFIG_KEXEC help texts | Geert Uytterhoeven | 1 | -3/+3
commit 40b313608ad4ea655addd2ec6cdd106477ae8e15 ("Finally eradicate CONFIG_HOTPLUG") removed remaining references to CONFIG_HOTPLUG, but missed a few plain English references in the CONFIG_KEXEC help texts. Remove them, too. Signed-off-by: Geert Uytterhoeven <[email protected]> Acked-by: Stephen Rothwell <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2013-08-22 | x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member | Radu Caragea | 2 | -3/+5
This is the updated version of df54d6fa5427 ("x86 get_unmapped_area(): use proper mmap base for bottom-up direction") that only randomizes the mmap base address once. Signed-off-by: Radu Caragea <[email protected]> Reported-and-tested-by: Jeff Shorey <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Adrian Sendroiu <[email protected]> Cc: Greg KH <[email protected]> Cc: Kamal Mostafa <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
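The difference between recalculating the random base on every lookup and randomizing it only once can be sketched as below. This is a simplified user-space illustration, not the kernel implementation: `struct sketch_mm`, `SKETCH_UNMAPPED_BASE`, `sketch_random_offset()` and the two functions are hypothetical, and `rand()` merely stands in for whatever randomness source the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_UNMAPPED_BASE 0x40000000UL /* hypothetical fixed base */

/* Hypothetical stand-in for the per-process memory descriptor. */
struct sketch_mm {
    unsigned long mmap_base;        /* top-down base, unused in this sketch */
    unsigned long mmap_legacy_base; /* bottom-up base, chosen once */
};

/* Page-aligned pseudo-random offset; rand() is only a placeholder. */
static unsigned long sketch_random_offset(void)
{
    return ((unsigned long)rand() & 0xffUL) << 12;
}

/* Called once when the address-space layout is chosen. */
void sketch_pick_mmap_layout(struct sketch_mm *mm)
{
    assert(mm != NULL);
    mm->mmap_legacy_base = SKETCH_UNMAPPED_BASE + sketch_random_offset();
}

/*
 * Called on every bottom-up search. It reads the stored member rather
 * than recomputing a random value, so repeated allocations without an
 * address hint start from the same base.
 */
unsigned long sketch_bottom_up_base(const struct sketch_mm *mm)
{
    assert(mm != NULL);
    return mm->mmap_legacy_base;
}
```

Because the lookup only reads `mmap_legacy_base`, two consecutive calls return the same value, which is the behaviour that allocators expecting contiguous mappings rely on.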
2013-08-22 | Revert "x86 get_unmapped_area(): use proper mmap base for bottom-up direction" | Linus Torvalds | 2 | -2/+2
This reverts commit df54d6fa54275ce59660453e29d1228c2b45a826. The commit isn't necessarily wrong, but because it recalculates the random mmap_base every time, it seems to confuse user memory allocators that expect contiguous mmap allocations even when the mmap address isn't specified. In particular, the MATLAB Java runtime seems to be unhappy. See https://bugzilla.kernel.org/show_bug.cgi?id=60774 So we'll want to apply the random offset only once, and Radu has a patch for that. Revert this older commit in order to apply the other one. Reported-by: Jeff Shorey <[email protected]> Cc: Radu Caragea <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-08-22 | PCI: Simplify pcie_bus_configure_settings() interface | Bjorn Helgaas | 1 | -7/+2
Based on a patch by Jon Mason (see URL below). All users of pcie_bus_configure_settings() pass arguments of the form "bus, bus->self->pcie_mpss". The "mpss" argument is redundant since we can easily look it up internally. In addition, all callers check "bus->self" for NULL, which we can also do internally. This patch simplifies the interface and the callers. No functional change. Reference: http://lkml.kernel.org/r/[email protected] Signed-off-by: Bjorn Helgaas <[email protected]>
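The interface simplification described above follows a common pattern: drop an argument that can be derived from another one, and move the NULL check that every caller repeats into the callee. The sketch below uses hypothetical types and function names (`struct sketch_bus`, `struct sketch_dev`, `sketch_configure_old()`, `sketch_configure()`); it does not reproduce the real PCI code, and "configuring" is reduced to recording a value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for the bridge device and bus. */
struct sketch_dev {
    int pcie_mpss;
};

struct sketch_bus {
    struct sketch_dev *self; /* bridge leading to this bus, may be NULL */
    int configured_mpss;     /* what the sketch "configured" */
};

/* Old shape: every caller checks bus->self and passes bus->self->pcie_mpss. */
void sketch_configure_old(struct sketch_bus *bus, int mpss)
{
    assert(bus != NULL);
    bus->configured_mpss = mpss;
}

/* New shape: the NULL check and the lookup happen inside the callee. */
void sketch_configure(struct sketch_bus *bus)
{
    assert(bus != NULL);
    if (!bus->self)
        return;
    bus->configured_mpss = bus->self->pcie_mpss;
}
```

With the new shape a caller writes `sketch_configure(bus)` instead of `if (bus->self) sketch_configure_old(bus, bus->self->pcie_mpss)`, and the result is the same in both cases.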
2013-08-22 | x86/asmlinkage: Fix warning in xen asmlinkage change | Andi Kleen | 1 | -6/+6
Current code uses asmlinkage for functions without arguments. This adds an implicit regparm(0) which creates a warning when assigning the function to pointers. Use __visible for the functions without arguments. This avoids having to add regparm(0) to function pointers. Since they have no arguments it does not make any difference. Signed-off-by: Andi Kleen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-08-21 | Merge tag 'stable/for-linus-3.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip | Linus Torvalds | 2 | -2/+31
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
 - On ARM, calls to get/put_cpu were not balanced.
 - Fix to make tboot + Xen + Linux work correctly.
 - Fix events VCPU binding issues.
 - Fix a vCPU online race where IPIs are sent to not-yet-online vCPU.

* tag 'stable/for-linus-3.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
  xen/smp: initialize IPI vectors before marking CPU online
  xen/events: mask events when changing their VCPU binding
  xen/events: initialize local per-cpu mask for all possible events
  x86/xen: do not identity map UNUSABLE regions in the machine E820
  xen/arm: missing put_cpu in xen_percpu_init
2013-08-21 | Merge tag 'tegra-for-3.12-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra into next/soc | Kevin Hilman | 3 | -31/+24
From: Stephen Warren:
ARM: tegra: core SoC enhancements for 3.12

This branch includes a number of enhancements to core SoC support for Tegra devices. The major new features are:

* Adds a new CPU-power-gated cpuidle state for Tegra114.
* Adds initial system suspend support for Tegra114, initially supporting just CPU-power-gating during suspend.
* Adds "LP1" suspend mode support for all of Tegra20/30/114. This mode both gates CPU power, and places the DRAM into self-refresh mode.
* A new DT-driven PCIe driver to Tegra20/30. The driver is also moved from arch/arm/mach-tegra/ to drivers/pci/host/.

The PCIe driver work depends on the following tag from Thomas Petazzoni:
git://git.infradead.org/linux-mvebu.git mis-3.12.2
... which is merged into the middle of this pull request.

* tag 'tegra-for-3.12-soc' of git://git.kernel.org/pub/scm/linux/kernel/git/swarren/linux-tegra: (33 commits)
  ARM: tegra: disable LP2 cpuidle state if PCIe is enabled
  MAINTAINERS: Add myself as Tegra PCIe maintainer
  PCI: tegra: set up PADS_REFCLK_CFG1
  PCI: tegra: Add Tegra 30 PCIe support
  PCI: tegra: Move PCIe driver to drivers/pci/host
  PCI: msi: add default MSI operations for !HAVE_GENERIC_HARDIRQS platforms
  ARM: tegra: add LP1 suspend support for Tegra114
  ARM: tegra: add LP1 suspend support for Tegra20
  ARM: tegra: add LP1 suspend support for Tegra30
  ARM: tegra: add common LP1 suspend support
  clk: tegra114: add LP1 suspend/resume support
  ARM: tegra: config the polarity of the request of sys clock
  ARM: tegra: add common resume handling code for LP1 resuming
  ARM: pci: add ->add_bus() and ->remove_bus() hooks to hw_pci
  of: pci: add registry of MSI chips
  PCI: Introduce new MSI chip infrastructure
  PCI: remove ARCH_SUPPORTS_MSI kconfig option
  PCI: use weak functions for MSI arch-specific functions
  ARM: tegra: unify Tegra's Kconfig a bit more
  ARM: tegra: remove the limitation that Tegra114 can't support suspend
  ...
Signed-off-by: Kevin Hilman <[email protected]>
2013-08-21 | crypto: xor - Check for osxsave as well as avx in crypto/xor | John Haxby | 1 | -2/+2
This affects xen pv guests with sufficiently old versions of xen and sufficiently new hardware. On such a system, a guest with a btrfs root won't even boot. Signed-off-by: John Haxby <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2013-08-21 | crypto: camellia-x86-64 - replace commas by semicolons and adjust code alignment | Julia Lawall | 1 | -24/+24
Adjust alignment and replace commas by semicolons in automatically generated code. Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2013-08-20 | xen/pvhvm: Initialize xen panic handler for PVHVM guests | Vaughan Cao | 1 | -0/+2
The kernel uses the callbacks linked in panic_notifier_list to notify others when a panic happens:

    NORET_TYPE void panic(const char * fmt, ...){
        ...
        atomic_notifier_call_chain(&panic_notifier_list, 0, buf);
    }

When Xen becomes aware of this, it will call xen_reboot(SHUTDOWN_crash) to send out an event with reason code SHUTDOWN_crash.

xen_panic_handler_init() is defined to register on panic_notifier_list, but we only call it in xen_arch_setup, which is only called by PV, so this patch is necessary for PVHVM.

Without this patch, setting 'on_crash=coredump-restart' in the PVHVM guest config file won't lead to a vmcore being generated when the guest panics. It can be reproduced with 'echo c > /proc/sysrq-trigger'.

Signed-off-by: Vaughan Cao <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
Acked-by: Joe Jin <[email protected]>
2013-08-20 | xen/m2p: use GNTTABOP_unmap_and_replace to reinstate the original mapping | Stefano Stabellini | 1 | -6/+15
GNTTABOP_unmap_grant_ref unmaps a grant and replaces it with a 0 mapping instead of reinstating the original mapping. Doing so separately would be racy. To unmap a grant and reinstate the original mapping atomically we use GNTTABOP_unmap_and_replace. GNTTABOP_unmap_and_replace doesn't work with GNTMAP_contains_pte, so don't use it for kmaps. GNTTABOP_unmap_and_replace zeroes the mapping passed in new_addr so we have to reinstate it, however that is a per-cpu mapping only used for balloon scratch pages, so we can be sure that it's not going to be accessed while the mapping is not valid. Signed-off-by: Stefano Stabellini <[email protected]> Reviewed-by: David Vrabel <[email protected]> Acked-by: Konrad Rzeszutek Wilk <[email protected]> CC: [email protected] CC: [email protected] [v1: Konrad fixed up the conflicts] Conflicts: arch/x86/xen/p2m.c
2013-08-20 | xen/smp: initialize IPI vectors before marking CPU online | Chuck Anderson | 1 | -2/+9
An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:

    kernel BUG at drivers/xen/events.c:1328!

RCU has detected that a CPU has not entered a quiescent state within the grace period. It needs to send the CPU a reschedule IPI if it is not offline. rcu_implicit_offline_qs() does this check:

    /*
     * If the CPU is offline, it is in a quiescent state.  We can
     * trust its state not to change because interrupts are disabled.
     */
    if (cpu_is_offline(rdp->cpu)) {
        rdp->offline_fqs++;
        return 1;
    }

Else the CPU is online. Send it a reschedule IPI.

The CPU is in the middle of being hot-plugged and has been marked online (!cpu_is_offline()). See start_secondary():

    set_cpu_online(smp_processor_id(), true);
    ...
    per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;

start_secondary() then waits for the CPU bringing up the hot-plugged CPU to mark it as active:

    /*
     * Wait until the cpu which brought this one up marked it
     * online before enabling interrupts. If we don't do that then
     * we can end up waking up the softirq thread before this cpu
     * reached the active state, which makes the scheduler unhappy
     * and schedule the softirq thread on the wrong cpu. This is
     * only observable with forced threaded interrupts, but in
     * theory it could also happen w/o them. It's just way harder
     * to achieve.
     */
    while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
        cpu_relax();

    /* enable local interrupts */
    local_irq_enable();

The CPU being hot-plugged will be marked active after it has been fully initialized by the CPU managing the hot-plug. In the Xen PVHVM case xen_smp_intr_init() is called to set up the hot-plugged vCPU's XEN_RESCHEDULE_VECTOR.

The hot-plugging CPU is marked online, not marked active and does not have its IPI vectors set up. rcu_implicit_offline_qs() sees the hot-plugging cpu is !cpu_is_offline() and tries to send it a reschedule IPI. This will lead to:

    kernel BUG at drivers/xen/events.c:1328!

    xen_send_IPI_one()
    xen_smp_send_reschedule()
    rcu_implicit_offline_qs()
    rcu_implicit_dynticks_qs()
    force_qs_rnp()
    force_quiescent_state()
    __rcu_process_callbacks()
    rcu_process_callbacks()
    __do_softirq()
    call_softirq()
    do_softirq()
    irq_exit()
    xen_evtchn_do_upcall()

because xen_send_IPI_one() will attempt to use an uninitialized IRQ for the XEN_RESCHEDULE_VECTOR.

There is at least one other place that has caused the same crash:

    xen_smp_send_reschedule()
    wake_up_idle_cpu()
    add_timer_on()
    clocksource_watchdog()
    call_timer_fn()
    run_timer_softirq()
    __do_softirq()
    call_softirq()
    do_softirq()
    irq_exit()
    xen_evtchn_do_upcall()
    xen_hvm_callback_vector()

clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle a watchdog timer:

    /*
     * Cycle through CPUs to check if the CPUs stay synchronized
     * to each other.
     */
    next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
    if (next_cpu >= nr_cpu_ids)
        next_cpu = cpumask_first(cpu_online_mask);
    watchdog_timer.expires += WATCHDOG_INTERVAL;
    add_timer_on(&watchdog_timer, next_cpu);

This resulted in an attempt to send an IPI to a hot-plugging CPU that had not initialized its reschedule vector.

One option would be to make the RCU code check for CPU active rather than for CPU offline, as becoming active is done after a CPU is online (in older kernels).

But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been completely reworked - in the online path, cpu_active is set *before* cpu_online, and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING notification instead of CPU_DOWN_PREPARE."

Drilling into the bring-up path: "[brought up CPU].. send out a CPU_STARTING notification, and in response to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask is better left to the scheduler alone, since it has the intelligence to use it judiciously."

The conclusion was that:
"
1. At the IPI sender side: It is incorrect to send an IPI to an offline CPU (cpu not present in the cpu_online_mask). There are numerous places where we check this and warn/complain.
2. At the IPI receiver side: It is incorrect to let the world know of our presence (by setting ourselves in global bitmasks) until our initialization steps are complete to such an extent that we can handle the consequences (such as receiving interrupts without crashing the sender etc.)
" (from Srivatsa)

As the native code enables the interrupts at some point we need to be able to service them. In other words a CPU must have valid IPI vectors if it has been marked online. It doesn't need to handle the IPI (interrupts may be disabled) but needs to have valid IPI vectors because another CPU may find it in cpu_online_mask and attempt to send it an IPI.

This patch will change the order of the Xen vCPU bring-up functions so that Xen vectors have been set up before start_secondary() is called. It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails to initialize it.

Orabug 13823853
Signed-off-by: Chuck Anderson <[email protected]>
Acked-by: Srivatsa S. Bhat <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
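The ordering rule stated above (a CPU must have valid IPI vectors before it is marked online, and bring-up must stop if vector setup fails) can be modelled with a small sketch. Everything here is hypothetical: `struct sketch_vcpu`, `sketch_intr_init()`, `sketch_bring_up()` and `sketch_send_ipi()` are illustration-only names, and the `simulate_failure` flag exists only to exercise the error path. The real bring-up code involves much more than two flags.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical two-flag model of a vCPU during hot-plug. */
struct sketch_vcpu {
    bool ipi_vectors_ready;
    bool online;
};

/* Set up the IPI vectors; the flag lets the sketch simulate a failure. */
int sketch_intr_init(struct sketch_vcpu *v, bool simulate_failure)
{
    if (simulate_failure)
        return -EIO;
    v->ipi_vectors_ready = true;
    return 0;
}

/*
 * Bring-up in the corrected order: vectors first, online second.
 * If vector setup fails, the vCPU is never marked online.
 */
int sketch_bring_up(struct sketch_vcpu *v, bool simulate_failure)
{
    int rc = sketch_intr_init(v, simulate_failure);

    if (rc)
        return rc;
    v->online = true;
    return 0;
}

/* Sender side: refuse offline targets, rely on the invariant otherwise. */
int sketch_send_ipi(const struct sketch_vcpu *v)
{
    if (!v->online)
        return -ENODEV;
    /* Invariant established by sketch_bring_up(): online implies ready. */
    assert(v->ipi_vectors_ready);
    return 0;
}
```

With this order there is no window in which a sender can find the vCPU in the online state while its vectors are still uninitialized, which is the window the two call traces above fell into.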
2013-08-20 | x86/xen: during early setup, only 1:1 map the ISA region | David Vrabel | 1 | -5/+11
During early setup, when the reserved regions and MMIO holes are being set up as 1:1 in the p2m, clear any mappings instead of making them 1:1 (except for the ISA region which is expected to be mapped).

This fixes a regression introduced in 3.5 by 83d51ab473dd (xen/setup: update VA mapping when releasing memory during setup) which caused hosts with tboot to fail to boot.

tboot marks a region in the e820 map as unusable and the dom0 kernel would attempt to map this region and Xen does not permit unusable regions to be mapped by guests.

    (XEN)  0000000000000000 - 0000000000060000 (usable)
    (XEN)  0000000000060000 - 0000000000068000 (reserved)
    (XEN)  0000000000068000 - 000000000009e000 (usable)
    (XEN)  0000000000100000 - 0000000000800000 (usable)
    (XEN)  0000000000800000 - 0000000000972000 (unusable)

tboot marked this region as unusable.

    (XEN)  0000000000972000 - 00000000cf200000 (usable)
    (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
    (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
    (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
    (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
    (XEN)  00000000fe000000 - 0000000100000000 (reserved)
    (XEN)  0000000100000000 - 0000000630000000 (usable)

Signed-off-by: David Vrabel <[email protected]>
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2013-08-20 | x86/xen: disable premption when enabling local irqs | David Vrabel | 1 | -13/+12
If CONFIG_PREEMPT is enabled then xen_enable_irq() (and xen_restore_fl()) could be preempted and rescheduled on a different VCPU in between the clear of the mask and the check for pending events. This may result in events being lost as the upcall will check for pending events on the wrong VCPU. Fix this by disabling preemption around the unmask and check for events. Signed-off-by: David Vrabel <[email protected]> Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
2013-08-20 | x86/xen: do not identity map UNUSABLE regions in the machine E820 | David Vrabel | 1 | -0/+22
If there are UNUSABLE regions in the machine memory map, dom0 will attempt to map them 1:1 which is not permitted by Xen and the kernel will crash.

There isn't anything interesting in the UNUSABLE region that the dom0 kernel needs access to so we can avoid making the 1:1 mapping and treat it as RAM.

We only do this for dom0, as that is where tboot case shows up. A PV domU could have an UNUSABLE region in its pseudo-physical map and would need to be handled in another patch.

This fixes a boot failure on hosts with tboot.

tboot marks a region in the e820 map as unusable and the dom0 kernel would attempt to map this region and Xen does not permit unusable regions to be mapped by guests.

    (XEN)  0000000000000000 - 0000000000060000 (usable)
    (XEN)  0000000000060000 - 0000000000068000 (reserved)
    (XEN)  0000000000068000 - 000000000009e000 (usable)
    (XEN)  0000000000100000 - 0000000000800000 (usable)
    (XEN)  0000000000800000 - 0000000000972000 (unusable)

tboot marked this region as unusable.

    (XEN)  0000000000972000 - 00000000cf200000 (usable)
    (XEN)  00000000cf200000 - 00000000cf38f000 (reserved)
    (XEN)  00000000cf38f000 - 00000000cf3ce000 (ACPI data)
    (XEN)  00000000cf3ce000 - 00000000d0000000 (reserved)
    (XEN)  00000000e0000000 - 00000000f0000000 (reserved)
    (XEN)  00000000fe000000 - 0000000100000000 (reserved)
    (XEN)  0000000100000000 - 0000000630000000 (usable)

Signed-off-by: David Vrabel <[email protected]>
[v1: Altered the patch and description with domU's with UNUSABLE regions]
Signed-off-by: Konrad Rzeszutek Wilk <[email protected]>
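The approach described above, treating UNUSABLE regions as RAM so that no 1:1 mapping is attempted for them, amounts to a single pass over the memory map. The sketch below is a user-space illustration with hypothetical names (`struct sketch_entry`, `enum sketch_type`, `sketch_ignore_unusable()`); the numeric type values mirror the usual e820 numbering but are an assumption here, and the kernel's actual function may differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical region types; the values are assumed, not authoritative. */
enum sketch_type {
    SKETCH_RAM = 1,
    SKETCH_RESERVED = 2,
    SKETCH_UNUSABLE = 5
};

struct sketch_entry {
    uint64_t addr;
    uint64_t size;
    enum sketch_type type;
};

/*
 * Rewrite every UNUSABLE entry as RAM so that later setup code does not
 * try to identity-map it. Returns the number of entries converted.
 */
size_t sketch_ignore_unusable(struct sketch_entry *map, size_t nr)
{
    size_t i, converted = 0;

    assert(map != NULL || nr == 0);
    for (i = 0; i < nr; i++) {
        if (map[i].type == SKETCH_UNUSABLE) {
            map[i].type = SKETCH_RAM;
            converted++;
        }
    }
    return converted;
}
```

Applied to the map quoted above, only the 0x800000 - 0x972000 entry would change type; the reserved and usable entries are left alone.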
2013-08-20 | x86/mm: Fix boot crash with DEBUG_PAGE_ALLOC=y and more than 512G RAM | Yinghai Lu | 1 | -2/+2
Dave Hansen reported that systems between 500G and 600G RAM crash early if DEBUG_PAGEALLOC is selected.

> [ 0.000000] init_memory_mapping: [mem 0x00000000-0x000fffff]
> [ 0.000000] [mem 0x00000000-0x000fffff] page 4k
> [ 0.000000] BRK [0x02086000, 0x02086fff] PGTABLE
> [ 0.000000] BRK [0x02087000, 0x02087fff] PGTABLE
> [ 0.000000] BRK [0x02088000, 0x02088fff] PGTABLE
> [ 0.000000] init_memory_mapping: [mem 0xe80ee00000-0xe80effffff]
> [ 0.000000] [mem 0xe80ee00000-0xe80effffff] page 4k
> [ 0.000000] BRK [0x02089000, 0x02089fff] PGTABLE
> [ 0.000000] BRK [0x0208a000, 0x0208afff] PGTABLE
> [ 0.000000] Kernel panic - not syncing: alloc_low_page: ran out of memory

It turns out that we missed increasing the pages needed in BRK to map the initial 2M and [0,1M) when we switched to use the #PF handler to set memory mappings:

> commit 8170e6bed465b4b0c7687f93e9948aca4358a33b
> Author: H. Peter Anvin <[email protected]>
> Date: Thu Jan 24 12:19:52 2013 -0800
>
> x86, 64bit: Use a #PF handler to materialize early mappings on demand

Before that, we had the mapping from [0,512M) in head_64.S, and we could spare two pages for [0-1M). After that change, we can not reuse pages anymore.

When we have more than 512G of RAM, we need an extra page for the pgd page covering [512G, 1024G).

Increase pages in BRK for page table to solve the boot crash.

Reported-by: Dave Hansen <[email protected]>
Bisected-by: Dave Hansen <[email protected]>
Tested-by: Dave Hansen <[email protected]>
Signed-off-by: Yinghai Lu <[email protected]>
Cc: <[email protected]> # v3.9 and later
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
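The arithmetic behind "an extra page for [512G, 1024G)" can be checked with a short sketch. It assumes x86-64 4-level paging, where each top-level entry covers 2^39 bytes (512 GiB), and a direct mapping whose base is aligned to that size, so the 512 GiB window of a physical address is simply the address shifted right by 39. `SKETCH_PGDIR_SHIFT` and `sketch_window_512g()` are hypothetical names used only here.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PGDIR_SHIFT 39 /* assumed: 4-level paging, 512 GiB per top-level entry */

/*
 * Which 512 GiB window of the direct mapping a physical address falls
 * into. Each distinct window needs its own next-level page-table page.
 */
static inline uint64_t sketch_window_512g(uint64_t phys)
{
    return phys >> SKETCH_PGDIR_SHIFT;
}
```

For the two ranges in the log above, 0x000fffff falls into window 0 while 0xe80ee00000 (roughly 928 GiB) falls into window 1, so mapping both early requires page-table pages for two windows rather than one.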
2013-08-20 | x86/ioapic/kcrash: Prevent crash_kexec() from deadlocking on ioapic_lock | Yoshihiro YUNOMAE | 3 | -1/+10
Prevent crash_kexec() from deadlocking on ioapic_lock. When crash_kexec() is executed on a CPU, the CPU will take ioapic_lock in disable_IO_APIC(). So if the cpu gets an NMI while locking ioapic_lock, a deadlock will happen.

In this patch, ioapic_lock is zapped/initialized before disable_IO_APIC().

You can reproduce this deadlock the following way:

1. Add mdelay(1000) after raw_spin_lock_irqsave() in native_ioapic_set_affinity()@arch/x86/kernel/apic/io_apic.c
   Although the deadlock can occur without this modification, the delay makes it more likely.
2. Build and install the kernel
3. Set up the OS which will run panic() and kexec when NMI is injected
     # echo "kernel.unknown_nmi_panic=1" >> /etc/sysctl.conf
     # vim /etc/default/grub
       add "nmi_watchdog=0 crashkernel=256M" in GRUB_CMDLINE_LINUX line
     # grub2-mkconfig
4. Reboot the OS
5. Run following command for each vcpu on the guest
     # while true; do echo <CPU num> > /proc/irq/<IO-APIC-edge or IO-APIC-fasteoi>/smp_affinity; done;
   By running this command, cpus will get ioapic_lock for setting affinity.
6. Inject NMI (push a dump button or execute 'virsh inject-nmi <domain>' if you use VM).
   After injecting NMI, panic() is called in an nmi-handler context. Then, kexec will normally run in panic(), but the operation will be stopped by deadlock on ioapic_lock in crash_kexec()->machine_crash_shutdown()->native_machine_crash_shutdown()->disable_IO_APIC()->clear_IO_APIC()->clear_IO_APIC_pin()->ioapic_read_entry().

Signed-off-by: Yoshihiro YUNOMAE <[email protected]>
Cc: Andi Kleen <[email protected]>
Cc: Gleb Natapov <[email protected]>
Cc: Konrad Rzeszutek Wilk <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Marcelo Tosatti <[email protected]>
Cc: Hidehiro Kawai <[email protected]>
Cc: Sebastian Andrzej Siewior <[email protected]>
Cc: Zhang Yanfei <[email protected]>
Cc: Eric W. Biederman <[email protected]>
Cc: [email protected]
Cc: Masami Hiramatsu <[email protected]>
Cc: Seiji Aguchi <[email protected]>
Link: http://lkml.kernel.org/r/20130820070107.28245.83806.stgit@yunodevel
Signed-off-by: Ingo Molnar <[email protected]>
2013-08-19 | Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 5 | -46/+34
Pull x86 fixes from Ingo Molnar:
"Two AMD microcode loader fixes and an OLPC firmware support fix"

* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86, microcode, AMD: Fix early microcode loading
  x86, microcode, AMD: Make cpu_has_amd_erratum() use the correct struct cpuinfo_x86
  x86: Don't clear olpc_ofw_header when sentinel is detected
2013-08-19 | x86/kvm/guest: Fix sparse warning: "symbol 'klock_waiting' was not declared as static" | Raghavendra K T | 1 | -1/+1
It was not declared as static since it was thought to be used by pv-flushtlb earlier.

Signed-off-by: Raghavendra K T <[email protected]>
Cc: <[email protected]>
Cc: <[email protected]>
Cc: Jiri Kosina <[email protected]>
Link: http://lkml.kernel.org/r/1376645921-8056-1-git-send-email-raghavendra.kt@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <[email protected]>
2013-08-16 | perf/x86/intel/uncore: Enable EV_SEL_EXT bit for PCU | Yan, Zheng | 2 | -1/+2
This patch adds support for the SNB-EP PCU uncore PMU extra_sel_bit (bit 21) which is missing from the documentation in Table-2.75 of Intel Xeon Processor E5-2600 Product Family Uncore Performance Monitoring Guide. It is referred to later in Table-2.81. Without this selection bit explicitly enabled by the kernel, some events such as COREx_TRANSITION_CYCLES do not count correctly. Signed-off-by: Yan, Zheng <[email protected]> Reviewed-by: Stephane Eranian <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2013-08-16 | perf/x86/intel/uncore: Add filter support for QPI boxes | Yan, Zheng | 2 | -18/+123
The QPI uncore boxes have two pairs of MATCH/MASK registers that are used to filter packet traffic serviced by the QPI link layer. These registers are in auxiliary PCI devices. This patch adds the auxiliary PCI devices to snbep_uncore_pci_ids and adds field definitions for the MATCH/MASK registers.

Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
2013-08-16 | perf/x86/intel/uncore: Add auxiliary pci device support | Yan, Zheng | 2 | -52/+68
The QPI uncore boxes have two pairs of MATCH/MASK registers that are used to filter packet traffic serviced by the QPI link layer. These registers are in auxiliary PCI devices.

This patch changes the meaning of (struct pci_device_id)->driver_data. The first 8 bits are the device index of the same uncore type, the second 8 bits are the uncore type index. Auxiliary PCI device's type is defined as UNCORE_EXTRA_PCI_DEV (0xff).

Signed-off-by: Yan, Zheng <[email protected]>
Signed-off-by: Peter Zijlstra <[email protected]>
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Ingo Molnar <[email protected]>
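The driver_data layout described above (low 8 bits for the device index, next 8 bits for the uncore type index, type 0xff for auxiliary devices) can be written as a pack/unpack pair. The helper names below (`sketch_dev_data()`, `sketch_dev_type()`, `sketch_dev_idx()`, `SKETCH_EXTRA_PCI_DEV`) are hypothetical and chosen for this illustration; they are not claimed to match the macros the patch introduces.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_EXTRA_PCI_DEV 0xffu /* type value the text gives for auxiliary devices */

/* Pack: bits 0-7 hold the device index, bits 8-15 the uncore type index. */
static inline uint32_t sketch_dev_data(uint8_t type, uint8_t idx)
{
    return ((uint32_t)type << 8) | idx;
}

/* Unpack the uncore type index from bits 8-15. */
static inline uint8_t sketch_dev_type(uint32_t data)
{
    return (uint8_t)((data >> 8) & 0xffu);
}

/* Unpack the device index from bits 0-7. */
static inline uint8_t sketch_dev_idx(uint32_t data)
{
    return (uint8_t)(data & 0xffu);
}
```

For example, an auxiliary device with index 1 would be encoded as 0xff01, and a probe routine could branch on `sketch_dev_type(data) == SKETCH_EXTRA_PCI_DEV` to treat it differently from a regular uncore box.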
2013-08-16 | Merge tag 'v3.11-rc5' into sched/core | Ingo Molnar | 9 | -806/+22
Merge Linux 3.11-rc5, to pick up the latest fixes. Signed-off-by: Ingo Molnar <[email protected]>
2013-08-15 | Merge tag 'v3.11-rc5' into perf/core | Ingo Molnar | 11 | -807/+29
Merge Linux 3.11-rc5, to sync up with the latest upstream fixes since -rc1. Signed-off-by: Ingo Molnar <[email protected]>
2013-08-14 | ACPI / x86: Print Hot-Pluggable Field in SRAT. | Tang Chen | 1 | -4/+7
The Hot-Pluggable field in SRAT indicates whether the memory can be hot-plugged while the system is running. Printing it as well when parsing SRAT helps users know which memory is hot-pluggable. Signed-off-by: Tang Chen <[email protected]> Reviewed-by: Wanpeng Li <[email protected]> Reviewed-by: Zhang Yanfei <[email protected]> Acked-by: Tejun Heo <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2013-08-14 | Merge back earlier 'pm-cpufreq' material | Rafael J. Wysocki | 1 | -29/+0
2013-08-14 | Merge branch 'akpm' (patches from Andrew Morton) | Linus Torvalds | 6 | -4/+98
Merge a bunch of fixes from Andrew Morton.
* emailed patches from Andrew Morton <[email protected]>:
  fs/proc/task_mmu.c: fix buffer overflow in add_page_map()
  arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig"
  ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id()
  x86 get_unmapped_area(): use proper mmap base for bottom-up direction
  ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page
  ocfs2: Revert 40bd62e to avoid regression in extended allocation
  drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit
  hugetlb: fix lockdep splat caused by pmd sharing
  aoe: adjust ref of head for compound page tails
  microblaze: fix clone syscall
  mm: save soft-dirty bits on file pages
  mm: save soft-dirty bits on swapped pages
  memcg: don't initialize kmem-cache destroying work for root caches
2013-08-14 | kvm: Paravirtual ticketlocks support for linux guests running on KVM hypervisor | Srivatsa Vaddagiri | 2 | -2/+274
During smp_boot_cpus, a paravirtualized KVM guest detects whether the hypervisor has the required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so, support for pv-ticketlocks is registered via pv_lock_ops. Use the KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu. Signed-off-by: Srivatsa Vaddagiri <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Suzuki Poulose <[email protected]> [Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related changes, addition of safe_halt for irq enabled case, bailout spinning in nmi case(Gleb)] Signed-off-by: Raghavendra K T <[email protected]> Acked-by: Gleb Natapov <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-14 | crypto: make tables used from assembler __visible | Andi Kleen | 1 | -8/+8
Tables used from assembler should be marked __visible to let the compiler know. Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
2013-08-14 | Merge tag 'amd_ucode_fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent | Ingo Molnar | 4 | -44/+32
Pull AMD microcode fixes from Borislav Petkov: "Those are basically two fixes which stop the AMD early ucode loader from accessing cpu_data too early, i.e. before smp_store_cpu_info() has copied the boot_cpu_data on top and overwritten an already empty structure (which we shouldn't access that early in the first place anyway). The second patch is kinda largish for that late in the game but it shouldn't be problematic because we're simply switching from using cpu_data to using the CPU family number directly and thus, again, not using the uninitialized cpu_data structure." Signed-off-by: Ingo Molnar <[email protected]>
2013-08-14 | Merge tag 'amd_f15_m30' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/ras | Ingo Molnar | 5 | -10/+30
Pull AMD F15h, model 0x30 and later enablement stuff, more specifically EDAC support, from Borislav Petkov. Signed-off-by: Ingo Molnar <[email protected]>
2013-08-14 | x86/boot: Fix a sanity check in printf.c | Dan Carpenter | 1 | -1/+1
Prior to 9b706aee7d ("x86: trivial printk optimizations") the limit was 36 because the digit table had 26 letters and 10 digits, but now it holds just 16 hex digits, so the sanity check needs to be updated. This function is always called with a valid "base", so the change makes no difference to how the kernel works; it's just a cleanup. Reported-by: Alexey Petrenko <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
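The relationship between the digit table and the base check can be sketched in user-space C as follows. This is an illustrative stand-in, not the code in arch/x86/boot/printf.c: the function name, signature, and return convention are assumptions. The point is only that the upper bound on `base` must match the 16 entries in the digit table.

```c
#include <stddef.h>

/*
 * Hypothetical helper: write num in the given base into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the base is out of range or the buffer is too small.
 */
static int format_unsigned(char *buf, size_t size, unsigned long num, int base)
{
    static const char digits[] = "0123456789abcdef";
    char tmp[sizeof(unsigned long) * 8];   /* enough for base 2 */
    size_t i = 0, n;

    /* The table has 16 entries, so 16 is the largest valid base. */
    if (base < 2 || base > 16)
        return -1;

    /* Produce digits least-significant first. */
    do {
        tmp[i++] = digits[num % (unsigned long)base];
        num /= (unsigned long)base;
    } while (num != 0);

    if (i + 1 > size)
        return -1;

    /* Reverse into the caller's buffer. */
    for (n = 0; n < i; n++)
        buf[n] = tmp[i - 1 - n];
    buf[i] = '\0';
    return (int)i;
}
```

With a bound of 36 and a 16-entry table, a base of 17 or more could index past the end of `digits`; tightening the check removes that possibility even though, as the commit notes, no caller passes such a base.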
2013-08-13 | x86: avoid remapping data in parse_setup_data() | Linn Crosetto | 3 | -13/+13
Type SETUP_PCI, added by setup_efi_pci(), may advertise a ROM size larger than early_memremap() is able to handle, which is currently limited to 256kB. If this occurs it leads to a NULL dereference in parse_setup_data(). To avoid this, remap the setup_data header and allow parsing functions for individual types to handle their own data remapping. Signed-off-by: Linn Crosetto <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Acked-by: Yinghai Lu <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Signed-off-by: H. Peter Anvin <[email protected]>
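The idea of mapping only the header while walking the list can be sketched in user-space C. Everything here is a simplified stand-in: `map_window()` is a hypothetical substitute for early_memremap() that fails above the 256kB limit mentioned in the text, "physical" addresses are ordinary pointers, and the struct keeps only the header fields.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_LIMIT (256u * 1024u)   /* mapping limit quoted in the commit text */

/* Header portion of a setup_data node; the payload follows it in memory. */
struct setup_hdr {
    uint64_t next;   /* address of the next node, 0 terminates the list */
    uint32_t type;
    uint32_t len;    /* payload length, may exceed MAP_LIMIT */
};

/* Hypothetical stand-in for early_memremap(): refuses oversized windows. */
static void *map_window(uint64_t addr, size_t size)
{
    if (size > MAP_LIMIT)
        return NULL;
    return (void *)(uintptr_t)addr;
}

/*
 * Walk the list, mapping only the fixed-size header of each node.
 * Returns the number of nodes visited, or -1 if a header cannot be mapped.
 */
static int walk_setup_data(uint64_t pa)
{
    int count = 0;

    while (pa) {
        struct setup_hdr *h = map_window(pa, sizeof(*h));

        if (!h)
            return -1;
        /* A per-type handler would map its own payload here if it needs it. */
        count++;
        pa = h->next;
    }
    return count;
}
```

Because the walker never maps `len` bytes itself, a node advertising a payload larger than the limit no longer makes the walk fail; only a handler that actually needs the payload has to cope with its size.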
2013-08-13 | x86: Use memblock_set_current_limit() to set limit for memblock. | Tang Chen | 1 | -2/+2
setup_arch() on x86 sets memblock.current_limit directly. It should use memblock_set_current_limit() instead, so that the code stays easy to maintain if the implementation changes. Signed-off-by: Tang Chen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-13 | x86 get_unmapped_area(): use proper mmap base for bottom-up direction | Radu Caragea | 2 | -2/+2
When the stack is set to unlimited, the bottom-up direction is used for mmap-ings, but mmap_base is not used, which effectively renders ASLR for mmap-ings, along with PIE, useless. Cc: Michel Lespinasse <[email protected]> Cc: Oleg Nesterov <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: Adrian Sendroiu <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-08-13 | mm: save soft-dirty bits on file pages | Cyrill Gorcunov | 4 | -2/+68
Andy reported that if a file page gets reclaimed we lose the soft-dirty bit if it was set, so save the _PAGE_BIT_SOFT_DIRTY bit when the page address gets encoded into the pte entry. Thus, when a #pf happens on such a non-present pte, we can restore it. Reported-by: Andy Lutomirski <[email protected]> Signed-off-by: Cyrill Gorcunov <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Matt Mackall <[email protected]> Cc: Xiao Guangrong <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Wanpeng Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2013-08-13 | mm: save soft-dirty bits on swapped pages | Cyrill Gorcunov | 2 | -0/+28
Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY bit set gets swapped out, the bit is lost and is no longer available when the pte is read back. To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is saved in the pte entry for the page being swapped out. When such a page is to be read back from the swap cache, we check for the bit's presence and, if it's there, we clear it and restore the former _PAGE_SOFT_DIRTY bit. One of the problems was to find a place in the pte entry where we can save the _PTE_SWP_SOFT_DIRTY bit while the page is in swap. _PAGE_PSE was chosen for that; it doesn't intersect with the swap entry format stored in the pte. Reported-by: Andy Lutomirski <[email protected]> Signed-off-by: Cyrill Gorcunov <[email protected]> Acked-by: Pavel Emelyanov <[email protected]> Cc: Matt Mackall <[email protected]> Cc: Xiao Guangrong <[email protected]> Cc: Marcelo Tosatti <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: Wanpeng Li <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
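The save-and-restore scheme can be sketched on plain 64-bit values. This is a simplified model, not the kernel's pte helpers: the type, macro, and function names are hypothetical, the soft-dirty bit position (11) is an assumption, and the only facts taken from the text are that the swap-side flag reuses the _PAGE_PSE bit (bit 7 on x86) and that the flag is translated back on swap-in.

```c
#include <stdint.h>

typedef uint64_t pteval_t;

#define PAGE_PRESENT        (UINT64_C(1) << 0)
#define PAGE_PSE            (UINT64_C(1) << 7)
#define PAGE_SOFT_DIRTY     (UINT64_C(1) << 11)   /* assumed bit position */
#define PTE_SWP_SOFT_DIRTY  PAGE_PSE              /* reused while the page is in swap */

/*
 * Swap-out: carry the soft-dirty state of the present pte into the
 * swap entry. The swap entry is assumed not to use bit 7 for anything else.
 */
static pteval_t swap_out(pteval_t present_pte, pteval_t swp_entry)
{
    if (present_pte & PAGE_SOFT_DIRTY)
        swp_entry |= PTE_SWP_SOFT_DIRTY;
    return swp_entry;
}

/*
 * Swap-in: build the new present pte and restore soft-dirty if the
 * swap entry carried it. The swap-side flag is not copied over.
 */
static pteval_t swap_in(pteval_t swp_pte, pteval_t new_pte)
{
    if (swp_pte & PTE_SWP_SOFT_DIRTY)
        new_pte |= PAGE_SOFT_DIRTY;
    return new_pte;
}
```

Reusing a bit that has no meaning in a non-present entry avoids widening the swap entry format, which is the design choice the commit message describes.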
2013-08-13 | Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 2 | -2/+3
Pull perf fixes from Ingo Molnar: "Two small fixlets"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86: Add Haswell ULT model number used in Macbook Air and other systems
  perf/x86: Fix intel QPI uncore event definitions
2013-08-13 | x86, boot: Fix warning due to undeclared strlen() | Fred Chen | 1 | -0/+1
This patch fixes the sparse warning "arch/x86/boot/string.c:119:8: warning: symbol 'strlen' was not declared." by declaring strlen() in arch/x86/boot/boot.h. Signed-off-by: Fred Chen <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: H. Peter Anvin <[email protected]>
2013-08-13 | sched: fix the theoretical signal_wake_up() vs schedule() race | Oleg Nesterov | 1 | -4/+0
This is only theoretical, but after try_to_wake_up(p) was changed to check p->state under p->pi_lock, code like __set_current_state(TASK_INTERRUPTIBLE); schedule(); can miss a signal. This is a special case of wait-for-condition: it relies on the try_to_wake_up/schedule interaction and thus does not need mb() between __set_current_state() and if(signal_pending). However, this __set_current_state() can move into the critical section protected by rq->lock. Now that try_to_wake_up() takes another lock, we need to ensure that it can't be reordered with the "if (signal_pending(current))" check inside that section. The patch is actually a one-liner: it simply adds smp_wmb() before spin_lock_irq(rq->lock). This is what try_to_wake_up() already does for the same reason. We turn this wmb() into a new helper, smp_mb__before_spinlock(), for better documentation and to allow the architectures to change the default implementation. While at it, kill smp_mb__after_lock(); it has no callers. Perhaps we can also add smp_mb__before/after_spinunlock() for prepare_to_wait(). Signed-off-by: Oleg Nesterov <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>