aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-01-15ptr_ring: document usage around __ptr_ring_peekMichael S. Tsirkin1-4/+10
This explains why is the net usage of __ptr_ring_peek actually ok without locks. Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: John Fastabend <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-01-159p: add missing module license for xen transportStephen Hemminger1-0/+4
The 9P of Xen module is missing required license and module information. See https://bugzilla.kernel.org/show_bug.cgi?id=198109 Reported-by: Alan Bartlett <[email protected]> Fixes: 868eb122739a ("xen/9pfs: introduce Xen 9pfs transport driver") Signed-off-by: Stephen Hemminger <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-01-15ring-buffer: Bring back context level recursive checksSteven Rostedt (VMware)1-17/+45
Commit 1a149d7d3f45 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler") replaced the context level recursion checks with a simple counter. This would prevent the ring buffer code from recursively calling itself more than the max number of contexts that exist (Normal, softirq, irq, nmi). But this change caused a lockup in a specific case, which was during suspend and resume using a global clock. Adding a stack dump to see where this occurred, the issue was in the trace global clock itself: trace_buffer_lock_reserve+0x1c/0x50 __trace_graph_entry+0x2d/0x90 trace_graph_entry+0xe8/0x200 prepare_ftrace_return+0x69/0xc0 ftrace_graph_caller+0x78/0xa8 queued_spin_lock_slowpath+0x5/0x1d0 trace_clock_global+0xb0/0xc0 ring_buffer_lock_reserve+0xf9/0x390 The function graph tracer traced queued_spin_lock_slowpath that was called by trace_clock_global. This pointed out that the trace_clock_global() is not reentrant, as it takes a spin lock. It depended on the ring buffer recursive lock from letting that happen. By removing the context detection and adding just a max number of allowable recursions, it allowed the trace_clock_global() to be entered again and try to retake the spinlock it already held, causing a deadlock. Fixes: 1a149d7d3f45 ("ring-buffer: Rewrite trace_recursive_(un)lock() to be simpler") Reported-by: David Weinehall <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2018-01-15cfg80211: check dev_set_name() return valueJohannes Berg1-1/+7
syzbot reported a warning from rfkill_alloc(), and after a while I think that the reason is that it was doing fault injection and the dev_set_name() failed, leaving the name NULL, and we didn't check the return value and got to rfkill_alloc() with a NULL name. Since we really don't want a NULL name, we ought to check the return value. Fixes: fb28ad35906a ("net: struct device - replace bus_id with dev_name(), dev_set_name()") Reported-by: [email protected] Signed-off-by: Johannes Berg <[email protected]>
2018-01-15mac80211_hwsim: validate number of different channelsJohannes Berg3-2/+7
When creating a new radio on the fly, hwsim allows this to be done with an arbitrary number of channels, but cfg80211 only supports a limited number of simultaneous channels, leading to a warning. Fix this by validating the number - this requires moving the define for the maximum out to a visible header file. Reported-by: [email protected] Fixes: b59ec8dd4394 ("mac80211_hwsim: fix number of channels in interface combinations") Signed-off-by: Johannes Berg <[email protected]>
2018-01-15mac80211_hwsim: add workqueue to wait for deferred radio deletion on mod unloadBenjamin Beichler1-2/+10
When closing multiple wmediumd instances with many radios and try to unload the mac80211_hwsim module, it may happen that the work items live longer than the module. To wait especially for this deletion work items, add a work queue, otherwise flush_scheduled_work would be necessary. Signed-off-by: Benjamin Beichler <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2018-01-15nl80211: take RCU read lock when calling ieee80211_bss_get_ie()Dominik Brodowski1-4/+7
As ieee80211_bss_get_ie() derefences an RCU to return ssid_ie, both the call to this function and any operation on this variable need protection by the RCU read lock. Fixes: 44905265bc15 ("nl80211: don't expose wdev->ssid for most interfaces") Signed-off-by: Dominik Brodowski <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2018-01-15cfg80211: fully initialize old channel for eventJohannes Berg1-2/+1
Paul reported that he got a report about undefined behaviour that seems to me to originate in using uninitialized memory when the channel structure here is used in the event code in nl80211 later. He never reported whether this fixed it, and I wasn't able to trigger this so far, but we should do the right thing and fully initialize the on-stack structure anyway. Reported-by: Paul Menzel <[email protected]> Signed-off-by: Johannes Berg <[email protected]>
2018-01-15x86/retpoline: Add LFENCE to the retpoline/RSB filling RSB macrosTom Lendacky1-1/+5
The PAUSE instruction is currently used in the retpoline and RSB filling macros as a speculation trap. The use of PAUSE was originally suggested because it showed a very, very small difference in the amount of cycles/time used to execute the retpoline as compared to LFENCE. On AMD, the PAUSE instruction is not a serializing instruction, so the pause/jmp loop will use excess power as it is speculated over waiting for return to mispredict to the correct target. The RSB filling macro is applicable to AMD, and, if software is unable to verify that LFENCE is serializing on AMD (possible when running under a hypervisor), the generic retpoline support will be used and, so, is also applicable to AMD. Keep the current usage of PAUSE for Intel, but add an LFENCE instruction to the speculation trap for AMD. The same sequence has been adopted by GCC for the GCC generated retpolines. Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Acked-by: David Woodhouse <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Paul Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Tim Chen <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Dan Williams <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Kees Cook <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-15x86/retpoline: Fill RSB on context switch for affected CPUsDavid Woodhouse4-0/+59
On context switch from a shallow call stack to a deeper one, as the CPU does 'ret' up the deeper side it may encounter RSB entries (predictions for where the 'ret' goes to) which were populated in userspace. This is problematic if neither SMEP nor KPTI (the latter of which marks userspace pages as NX for the kernel) are active, as malicious code in userspace may then be executed speculatively. Overwrite the CPU's return prediction stack with calls which are predicted to return to an infinite loop, to "capture" speculation if this happens. This is required both for retpoline, and also in conjunction with IBRS for !SMEP && !KPTI. On Skylake+ the problem is slightly different, and an *underflow* of the RSB may cause errant branch predictions to occur. So there it's not so much overwrite, as *filling* the RSB to attempt to prevent it getting empty. This is only a partial solution for Skylake+ since there are many other conditions which may result in the RSB becoming empty. The full solution on Skylake+ is to use IBRS, which will prevent the problem even when the RSB becomes empty. With IBRS, the RSB-stuffing will not be required on context switch. [ tglx: Added missing vendor check and slighty massaged comments and changelog ] Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Arjan van de Ven <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Paul Turner <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2018-01-15x86/kasan: Panic if there is not enough memory to bootAndrey Ryabinin1-10/+14
Currently KASAN doesn't panic in case it don't have enough memory to boot. Instead, it crashes in some random place: kernel BUG at arch/x86/mm/physaddr.c:27! RIP: 0010:__phys_addr+0x268/0x276 Call Trace: kasan_populate_shadow+0x3f2/0x497 kasan_init+0x12e/0x2b2 setup_arch+0x2825/0x2a2c start_kernel+0xc8/0x15f4 x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0x72/0x75 secondary_startup_64+0xa5/0xb0 Use memblock_virt_alloc_try_nid() for allocations without failure fallback. It will panic with an out of memory message. Reported-by: kernel test robot <[email protected]> Signed-off-by: Andrey Ryabinin <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Dmitry Vyukov <[email protected]> Cc: [email protected] Cc: Alexander Potapenko <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-14Linux 4.15-rc8Linus Torvalds1-1/+1
2018-01-14Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixlet from Thomas Gleixner. Remove a warning about lack of compiler support for retpoline that most people can't do anything about, so it just annoys them needlessly. * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/retpoline: Remove compile time warning
2018-01-14Merge tag 'powerpc-4.15-7' of ↵Linus Torvalds21-36/+561
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "One fix for an oops at boot if we take a hotplug interrupt before we are ready to handle it. The bulk is patches to implement mitigation for Meltdown, see the change logs for more details. Thanks to: Nicholas Piggin, Michael Neuling, Oliver O'Halloran, Jon Masters, Jose Ricardo Ziviani, David Gibson" * tag 'powerpc-4.15-7' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/powernv: Check device-tree for RFI flush settings powerpc/pseries: Query hypervisor for RFI flush settings powerpc/64s: Support disabling RFI flush with no_rfi_flush and nopti powerpc/64s: Add support for RFI flush of L1-D cache powerpc/64s: Convert slb_miss_common to use RFI_TO_USER/KERNEL powerpc/64: Convert fast_exception_return to use RFI_TO_USER/KERNEL powerpc/64: Convert the syscall exit path to use RFI_TO_USER/KERNEL powerpc/64s: Simple RFI macro conversions powerpc/64: Add macros for annotating the destination of rfid/hrfid powerpc/pseries: Add H_GET_CPU_CHARACTERISTICS flags & wrapper powerpc/pseries: Make RAS IRQ explicitly dependent on DLPAR WQ
2018-01-14timers: Unconditionally check deferrable baseThomas Gleixner1-1/+1
When the timer base is checked for expired timers then the deferrable base must be checked as well. This was missed when making the deferrable base independent of base::nohz_active. Fixes: ced6d5c11d3e ("timers: Use deferrable base independent of base::nohz_active") Signed-off-by: Thomas Gleixner <[email protected]> Cc: Anna-Maria Gleixner <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Siewior <[email protected]> Cc: Paul McKenney <[email protected]> Cc: [email protected] Cc: [email protected]
2018-01-14x86/retpoline: Remove compile time warningThomas Gleixner1-2/+0
Remove the compile time warning when CONFIG_RETPOLINE=y and the compiler does not have retpoline support. Linus rationale for this is: It's wrong because it will just make people turn off RETPOLINE, and the asm updates - and return stack clearing - that are independent of the compiler are likely the most important parts because they are likely the ones easiest to target. And it's annoying because most people won't be able to do anything about it. The number of people building their own compiler? Very small. So if their distro hasn't got a compiler yet (and pretty much nobody does), the warning is just annoying crap. It is already properly reported as part of the sysfs interface. The compile-time warning only encourages bad things. Fixes: 76b043848fd2 ("x86/retpoline: Add initial retpoline support") Requested-by: Linus Torvalds <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: [email protected] Cc: Linus Torvalds <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Tim Chen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Link: https://lkml.kernel.org/r/CA+55aFzWgquv4i6Mab6bASqYXg3ErV3XDFEYf=GEcCDQg5uAtw@mail.gmail.com
2018-01-14x86/idt: Mark IDT tables __initconstAndi Kleen1-6/+6
const variables must use __initconst, not __initdata. Fix this up for the IDT tables, which got it consistently wrong. Fixes: 16bc18d895ce ("x86/idt: Move 32-bit idt_descr to C code") Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-14Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds1-0/+1
Pull NVMe fix from Jens Axboe: "Just a single fix for nvme over fabrics that should go into 4.15" * 'for-linus' of git://git.kernel.dk/linux-block: nvme-fabrics: initialize default host->id in nvmf_host_default()
2018-01-14futex: Prevent overflow by strengthen input validationLi Jinyue1-0/+3
UBSAN reports signed integer overflow in kernel/futex.c: UBSAN: Undefined behaviour in kernel/futex.c:2041:18 signed integer overflow: 0 - -2147483648 cannot be represented in type 'int' Add a sanity check to catch negative values of nr_wake and nr_requeue. Signed-off-by: Li Jinyue <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-14Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds44-100/+1525
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 pti updates from Thomas Gleixner: "This contains: - a PTI bugfix to avoid setting reserved CR3 bits when PCID is disabled. This seems to cause issues on a virtual machine at least and is incorrect according to the AMD manual. - a PTI bugfix which disables the perf BTS facility if PTI is enabled. The BTS AUX buffer is not globally visible and causes the CPU to fault when the mapping disappears on switching CR3 to user space. A full fix which restores BTS on PTI is non trivial and will be worked on. - PTI bugfixes for EFI and trusted boot which make sure that the user space visible page table entries have the NX bit cleared - removal of dead code in the PTI pagetable setup functions - add PTI documentation - add a selftest for vsyscall to verify that the kernel actually implements what it advertises. - a sysfs interface to expose vulnerability and mitigation information so there is a coherent way for users to retrieve the status. - the initial spectre_v2 mitigations, aka retpoline: + The necessary ASM thunk and compiler support + The ASM variants of retpoline and the conversion of affected ASM code + Make LFENCE serializing on AMD so it can be used as speculation trap + The RSB fill after vmexit - initial objtool support for retpoline As I said in the status mail this is the most of the set of patches which should go into 4.15 except two straight forward patches still on hold: - the retpoline add on of LFENCE which waits for ACKs - the RSB fill after context switch Both should be ready to go early next week and with that we'll have covered the major holes of spectre_v2 and go back to normality" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (28 commits) x86,perf: Disable intel_bts when PTI security/Kconfig: Correct the Documentation reference for PTI x86/pti: Fix !PCID and sanitize defines selftests/x86: Add test_vsyscall x86/retpoline: Fill return stack buffer on vmexit x86/retpoline/irq32: Convert assembler indirect jumps x86/retpoline/checksum32: Convert assembler indirect jumps x86/retpoline/xen: Convert Xen hypercall indirect jumps x86/retpoline/hyperv: Convert assembler indirect jumps x86/retpoline/ftrace: Convert ftrace assembler indirect jumps x86/retpoline/entry: Convert entry assembler indirect jumps x86/retpoline/crypto: Convert crypto assembler indirect jumps x86/spectre: Add boot time option to select Spectre v2 mitigation x86/retpoline: Add initial retpoline support objtool: Allow alternatives to be ignored objtool: Detect jumps to retpoline thunks x86/pti: Make unpoison of pgd for trusted boot work for real x86/alternatives: Fix optimize_nops() checking sysfs/cpu: Fix typos in vulnerability documentation x86/cpu/AMD: Use LFENCE_RDTSC in preference to MFENCE_RDTSC ...
2018-01-14futex: Avoid violating the 10th rule of futexPeter Zijlstra3-23/+87
Julia reported futex state corruption in the following scenario: waiter waker stealer (prio > waiter) futex(WAIT_REQUEUE_PI, uaddr, uaddr2, timeout=[N ms]) futex_wait_requeue_pi() futex_wait_queue_me() freezable_schedule() <scheduled out> futex(LOCK_PI, uaddr2) futex(CMP_REQUEUE_PI, uaddr, uaddr2, 1, 0) /* requeues waiter to uaddr2 */ futex(UNLOCK_PI, uaddr2) wake_futex_pi() cmp_futex_value_locked(uaddr2, waiter) wake_up_q() <woken by waker> <hrtimer_wakeup() fires, clears sleeper->task> futex(LOCK_PI, uaddr2) __rt_mutex_start_proxy_lock() try_to_take_rt_mutex() /* steals lock */ rt_mutex_set_owner(lock, stealer) <preempted> <scheduled in> rt_mutex_wait_proxy_lock() __rt_mutex_slowlock() try_to_take_rt_mutex() /* fails, lock held by stealer */ if (timeout && !timeout->task) return -ETIMEDOUT; fixup_owner() /* lock wasn't acquired, so, fixup_pi_state_owner skipped */ return -ETIMEDOUT; /* At this point, we've returned -ETIMEDOUT to userspace, but the * futex word shows waiter to be the owner, and the pi_mutex has * stealer as the owner */ futex_lock(LOCK_PI, uaddr2) -> bails with EDEADLK, futex word says we're owner. And suggested that what commit: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state") removes from fixup_owner() looks to be just what is needed. And indeed it is -- I completely missed that requeue_pi could also result in this case. So we need to restore that, except that subsequent patches, like commit: 16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock") changed all the locking rules. Even without that, the sequence: - if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) { - locked = 1; - goto out; - } - raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock); - owner = rt_mutex_owner(&q->pi_state->pi_mutex); - if (!owner) - owner = rt_mutex_next_owner(&q->pi_state->pi_mutex); - raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock); - ret = fixup_pi_state_owner(uaddr, q, owner); already suggests there were races; otherwise we'd never have to look at next_owner. So instead of doing 3 consecutive wait_lock sections with who knows what races, we do it all in a single section. Additionally, the usage of pi_state->owner in fixup_owner() was only safe because only the rt_mutex owner would modify it, which this additional case wrecks. Luckily the values can only change away and not to the value we're testing, this means we can do a speculative test and double check once we have the wait_lock. Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state") Reported-by: Julia Cartwright <[email protected]> Reported-by: Gratian Crisan <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Julia Cartwright <[email protected]> Tested-by: Gratian Crisan <[email protected]> Cc: Darren Hart <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller3-4/+61
Daniel Borkmann says: ==================== pull-request: bpf 2018-01-13 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) Follow-up fix to the recent BPF out-of-bounds speculation fix that prevents max_entries overflows and an undefined behavior on 32 bit archs on index_mask calculation, from Daniel. 2) Reject unsupported BPF_ARSH opcode in 32 bit ALU mode that was otherwise throwing an unknown opcode warning in the interpreter, from Daniel. 3) Typo fix in one of the user facing verbose() messages that was added during the BPF out-of-bounds speculation fix, from Colin. ==================== Signed-off-by: David S. Miller <[email protected]>
2018-01-14Revert "x86/apic: Remove init_bsp_APIC()"Ville Syrjälä3-0/+53
This reverts commit b371ae0d4a194b178817b0edfb6a7395c7aec37a. It causes boot hangs on old P3/P4 systems when the local APIC is enforced in UP mode. Reported-by: Meelis Roos <[email protected]> Signed-off-by: Ville Syrjälä <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Dou Liyang <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-14x86/mm/pkeys: Fix fill_sig_info_pkeyEric W. Biederman1-3/+4
SEGV_PKUERR is a signal specific si_code which happens to have the same numeric value as several others: BUS_MCEERR_AR, ILL_ILLTRP, FPE_FLTOVF, TRAP_HWBKPT, CLD_TRAPPED, POLL_ERR, SEGV_THREAD_ID, as such it is not safe to just test the si_code the signal number must also be tested to prevent a false positive in fill_sig_info_pkey. This error was by inspection, and BUS_MCEERR_AR appears to be a real candidate for confusion. So pass in si_signo and check for SIG_SEGV to verify that it is actually a SEGV_PKUERR Fixes: 019132ff3daf ("x86/mm/pkeys: Fill in pkey field in siginfo") Signed-off-by: "Eric W. Biederman" <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Dave Hansen <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Al Viro <[email protected]> cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-14x86/tsc: Print tsc_khz, when it differs from cpu_khzLen Brown1-0/+6
If CPU and TSC frequency are the same the printout of the CPU frequency is valid for the TSC as well: tsc: Detected 2900.000 MHz processor If the TSC frequency is different there is no information in dmesg. Add a conditional printout: tsc: Detected 2904.000 MHz TSC Signed-off-by: Len Brown <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/537b342debcd8e8aebc8d631015dcdf9f9ba8a26.1513920414.git.len.brown@intel.com
2018-01-14x86/tsc: Fix erroneous TSC rate on Skylake XeonLen Brown1-1/+0
The INTEL_FAM6_SKYLAKE_X hardcoded crystal_khz value of 25MHZ is problematic: - SKX workstations (with same model # as server variants) use a 24 MHz crystal. This results in a -4.0% time drift rate on SKX workstations. - SKX servers subject the crystal to an EMI reduction circuit that reduces its actual frequency by (approximately) -0.25%. This results in -1 second per 10 minute time drift as compared to network time. This issue can also trigger a timer and power problem, on configurations that use the LAPIC timer (versus the TSC deadline timer). Clock ticks scheduled with the LAPIC timer arrive a few usec before the time they are expected (according to the slow TSC). This causes Linux to poll-idle, when it should be in an idle power saving state. The idle and clock code do not graciously recover from this error, sometimes resulting in significant polling and measurable power impact. Stop using native_calibrate_tsc() for INTEL_FAM6_SKYLAKE_X. native_calibrate_tsc() will return 0, boot will run with tsc_khz = cpu_khz, and the TSC refined calibration will update tsc_khz to correct for the difference. [ tglx: Sanitized change log ] Fixes: 6baf3d61821f ("x86/tsc: Add additional Intel CPU models to the crystal quirk list") Signed-off-by: Len Brown <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Prarit Bhargava <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/ff6dcea166e8ff8f2f6a03c17beab2cb436aa779.1513920414.git.len.brown@intel.com
2018-01-14x86/tsc: Future-proof native_calibrate_tsc()Len Brown1-0/+2
If the crystal frequency cannot be determined via CPUID(15).crystal_khz or the built-in table then native_calibrate_tsc() will still set the X86_FEATURE_TSC_KNOWN_FREQ flag which prevents the refined TSC calibration. As a consequence such systems use cpu_khz for the TSC frequency which is incorrect when cpu_khz != tsc_khz resulting in time drift. Return early when the crystal frequency cannot be retrieved without setting the X86_FEATURE_TSC_KNOWN_FREQ flag. This ensures that the refined TSC calibration is invoked. [ tglx: Steam-blastered changelog. Sigh ] Fixes: 4ca4df0b7eb0 ("x86/tsc: Mark TSC frequency determined by CPUID as known") Signed-off-by: Len Brown <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Bin Gao <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/0fe2503aa7d7fc69137141fc705541a78101d2b9.1513920414.git.len.brown@intel.com
2018-01-14x86,perf: Disable intel_bts when PTIPeter Zijlstra1-0/+18
The intel_bts driver does not use the 'normal' BTS buffer which is exposed through the cpu_entry_area but instead uses the memory allocated for the perf AUX buffer. This obviously comes apart when using PTI because then the kernel mapping; which includes that AUX buffer memory; disappears. Fixing this requires to expose a mapping which is visible in all context and that's not trivial. As a quick fix disable this driver when PTI is enabled to prevent malfunction. Fixes: 385ce0ea4c07 ("x86/mm/pti: Add Kconfig") Reported-by: Vince Weaver <[email protected]> Reported-by: Robert Święcki <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2018-01-14security/Kconfig: Correct the Documentation reference for PTIW. Trevor King1-1/+1
When the config option for PTI was added a reference to documentation was added as well. But the documentation did not exist at that point. The final documentation has a different file name. Fix it up to point to the proper file. Fixes: 385ce0ea ("x86/mm/pti: Add Kconfig") Signed-off-by: W. Trevor King <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Dave Hansen <[email protected]> Cc: [email protected] Cc: [email protected] Cc: James Morris <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/3009cc8ccbddcd897ec1e0cb6dda524929de0d14.1515799398.git.wking@tremily.us
2018-01-14x86/pti: Fix !PCID and sanitize definesThomas Gleixner3-21/+23
The switch to the user space page tables in the low level ASM code sets unconditionally bit 12 and bit 11 of CR3. Bit 12 is switching the base address of the page directory to the user part, bit 11 is switching the PCID to the PCID associated with the user page tables. This fails on a machine which lacks PCID support because bit 11 is set in CR3. Bit 11 is reserved when PCID is inactive. While the Intel SDM claims that the reserved bits are ignored when PCID is disabled, the AMD APM states that they should be cleared. This went unnoticed as the AMD APM was not checked when the code was developed and reviewed and test systems with Intel CPUs never failed to boot. The report is against a Centos 6 host where the guest fails to boot, so it's not yet clear whether this is a virt issue or can happen on real hardware too, but thats irrelevant as the AMD APM clearly ask for clearing the reserved bits. Make sure that on non PCID machines bit 11 is not set by the page table switching code. Andy suggested to rename the related bits and masks so they are clearly describing what they should be used for, which is done as well for clarity. That split could have been done with alternatives but the macro hell is horrible and ugly. This can be done on top if someone cares to remove the extra orq. For now it's a straight forward fix. Fixes: 6fd166aae78c ("x86/mm: Use/Fix PCID to optimize user/kernel switches") Reported-by: Laura Abbott <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: stable <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Willy Tarreau <[email protected]> Cc: David Woodhouse <[email protected]> Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801140009150.2371@nanos
2018-01-13Merge tag 'usb-4.15-rc8' of ↵Linus Torvalds9-33/+63
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some small USB fixes and device ids for 4.15-rc8 Nothing major, small fixes for various devices, some resolutions for bugs found by fuzzers, and the usual handful of new device ids. All of these have been in linux-next with no reported issues" * tag 'usb-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: Documentation: usb: fix typo in UVC gadgetfs config command usb: misc: usb3503: make sure reset is low for at least 100us uas: ignore UAS for Norelsys NS1068(X) chips USB: UDC core: fix double-free in usb_add_gadget_udc_release USB: fix usbmon BUG trigger usbip: vudc_tx: fix v_send_ret_submit() vulnerability to null xfer buffer usbip: remove kernel addresses from usb device and urb debug msgs usbip: fix vudc_rx: harden CMD_SUBMIT path to handle malicious input USB: serial: cp210x: add new device ID ELV ALC 8xxx USB: serial: cp210x: add IDs for LifeScan OneTouch Verio IQ
2018-01-13Merge tag 'staging-4.15-rc8' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fix from Greg KH: "Here is a single android ashmem bugfix that resolves a reported issue in that interface. It's been in linux-next this week with no reported issues" * tag 'staging-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: android: ashmem: fix a race condition in ASHMEM_SET_SIZE ioctl
2018-01-13Merge tag 'char-misc-4.15-rc8' of ↵Linus Torvalds2-10/+14
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc fixes from Greg KH: "Here are two bugfixes for some driver bugs for 4.15-rc8 The first is a bluetooth security bug that has been ignored by the Bluetooth developers for months for no obvious reason at all, so I've taken it through my tree. The second is a simple double-free bug in the mux subsystem. Both have been in linux-next for a while with no reported issues" * tag 'char-misc-4.15-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: mux: core: fix double get_device() Bluetooth: Prevent stack info leak from the EFS element.
2018-01-13Merge tag 'kbuild-fixes-v4.15' of ↵Linus Torvalds4-30/+42
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild fixes from Masahiro Yamada: - fix cross-compilation for architectures that setup CROSS_COMPILE in their arch Makefile - fix Kconfig rational operators for bool / tristate - drop a gperf-generated file from .gitignore * tag 'kbuild-fixes-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: genksyms: drop *.hash.c from .gitignore kconfig: fix relational operators for bool and tristate symbols kbuild: move cc-option and cc-disable-warning after incl. arch Makefile
2018-01-13Merge tag 'apparmor-pr-2018-01-12' of ↵Linus Torvalds3-25/+40
git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor Pull apparmor regression fixes from John Johansen: "This fixes a couple bugs I have been working with Matthew Garrett on this week. Specifically a regression in the handling of a conflicting profile attachment and label match restrictions for ptrace when profiles are stacked. Summary: - fix ptrace label match when matching stacked labels - fix regression in profile conflict logic" * tag 'apparmor-pr-2018-01-12' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: apparmor: Fix regression in profile conflict logic apparmor: fix ptrace label match when matching stacked labels
2018-01-13Merge tag 'pci-v4.15-fixes-2' of ↵Linus Torvalds4-11/+30
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fixes from Bjorn Helgaas: "Fix AMD boot regression due to 64-bit window conflicting with system memory (Christian König)" * tag 'pci-v4.15-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: x86/PCI: Move and shrink AMD 64-bit window to avoid conflict x86/PCI: Add "pci=big_root_window" option for AMD 64-bit windows
2018-01-13Merge branch 'akpm' (patches from Andrew)Linus Torvalds6-7/+9
Merge misc fixlets from Andrew Morton: "4 fixes" * emailed patches from Andrew Morton <[email protected]>: tools/objtool/Makefile: don't assume sync-check.sh is executable kdump: write correct address of mem_section into vmcoreinfo kmemleak: allow to coexist with fault injection MAINTAINERS, nilfs2: change project home URLs
2018-01-13tools/objtool/Makefile: don't assume sync-check.sh is executableAndrew Morton1-1/+1
patch(1) loses the x bit. So if a user follows our patching instructions in Documentation/admin-guide/README.rst, their kernel will not compile. Fixes: 3bd51c5a371de ("objtool: Move kernel headers/code sync check to a script") Reported-by: Nicolas Bock <[email protected]> Reported-by Joakim Tjernlund <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Josh Poimboeuf <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-13kdump: write correct address of mem_section into vmcoreinfoKirill A. Shutemov2-1/+3
Depending on configuration mem_section can now be an array or a pointer to an array allocated dynamically. In most cases, we can continue to refer to it as 'mem_section' regardless of what it is. But there's one exception: '&mem_section' means "address of the array" if mem_section is an array, but if mem_section is a pointer, it would mean "address of the pointer". We've stepped onto this in kdump code. VMCOREINFO_SYMBOL(mem_section) writes down address of pointer into vmcoreinfo, not array as we wanted. Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the situation correctly for both cases. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Acked-by: Baoquan He <[email protected]> Acked-by: Dave Young <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Dave Young <[email protected]> Cc: Baoquan He <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-13kmemleak: allow to coexist with fault injectionDmitry Vyukov1-1/+1
kmemleak does one slab allocation per user allocation. So if slab fault injection is enabled to any degree, kmemleak instantly fails to allocate and turns itself off. However, it's useful to use kmemleak with fault injection to find leaks on error paths. On the other hand, checking kmemleak itself is not so useful because (1) it's a debugging tool and (2) it has a very regular allocation pattern (basically a single allocation site, so it either works or not). Turn off fault injection for kmemleak allocations. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dmitry Vyukov <[email protected]> Cc: Catalin Marinas <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-13MAINTAINERS, nilfs2: change project home URLsRyusuke Konishi2-4/+4
The domain of NILFS project home was changed to "nilfs.sourceforge.io" to enable https access (the previous domain "nilfs.sourceforge.net" is redirected to the new one). Modify URLs of the project home to reflect this change and to replace their protocol from http to https. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-13genksyms: drop *.hash.c from .gitignoreMasahiro Yamada1-1/+0
This is a left-over of commit bb3290d91695 ("Remove gperf usage from toolchain"). We do not generate a hash function any more. Signed-off-by: Masahiro Yamada <[email protected]>
2018-01-13kdump: Write the correct address of mem_section into vmcoreinfoKirill A. Shutemov2-1/+3
Depending on configuration mem_section can now be an array or a pointer to an array allocated dynamically. In most cases, we can continue to refer to it as 'mem_section' regardless of what it is. But there's one exception: '&mem_section' means "address of the array" if mem_section is an array, but if mem_section is a pointer, it would mean "address of the pointer". We've stepped onto this in the kdump code: VMCOREINFO_SYMBOL(mem_section) writes down the address of pointer into vmcoreinfo, not the array as we wanted, breaking kdump. Let's introduce VMCOREINFO_SYMBOL_ARRAY() that would handle the situation correctly for both cases. Mike Galbraith <[email protected]> Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Baoquan He <[email protected]> Acked-by: Dave Young <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-13selftests/x86: Add test_vsyscallAndy Lutomirski2-1/+501
This tests that the vsyscall entries do what they're expected to do. It also confirms that attempts to read the vsyscall page behave as expected. If changes are made to the vsyscall code or its memory map handling, running this test in all three of vsyscall=none, vsyscall=emulate, and vsyscall=native are helpful. (Because it's easy, this also compares the vsyscall results to their vDSO equivalents.) Note to KAISER backporters: please test this under all three vsyscall modes. Also, in the emulate and native modes, make sure that test_vsyscall_64 agrees with the command line or config option as to which mode you're in. It's quite easy to mess up the kernel such that native mode accidentally emulates or vice versa. Greg, etc: please backport this to all your Meltdown-patched kernels. It'll help make sure the patches didn't regress vsyscalls. CSigned-off-by: Andy Lutomirski <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/2b9c5a174c1d60fd7774461d518aa75598b1d8fd.1515719552.git.luto@kernel.org Signed-off-by: Ingo Molnar <[email protected]>
2018-01-12apparmor: Fix regression in profile conflict logicMatthew Garrett1-4/+5
The intended behaviour in apparmor profile matching is to flag a conflict if two profiles match equally well. However, right now a conflict is generated if another profile has the same match length even if that profile doesn't actually match. Fix the logic so we only generate a conflict if the profiles match. Fixes: 844b8292b631 ("apparmor: ensure that undecidable profile attachments fail") Cc: Stable <[email protected]> Signed-off-by: Matthew Garrett <[email protected]> Signed-off-by: John Johansen <[email protected]>
2018-01-12apparmor: fix ptrace label match when matching stacked labelsJohn Johansen2-21/+35
Given a label with a profile stack of A//&B or A//&C ... A ptrace rule should be able to specify a generic trace pattern with a rule like ptrace trace A//&**, however this is failing because while the correct label match routine is called, it is being done post label decomposition so it is always being done against a profile instead of the stacked label. To fix this refactor the cross check to pass the full peer label in to the label_match. Fixes: 290f458a4f16 ("apparmor: allow ptrace checks to be finer grained than just capability") Cc: Stable <[email protected]> Reported-by: Matthew Garrett <[email protected]> Tested-by: Matthew Garrett <[email protected]> Signed-off-by: John Johansen <[email protected]>
2018-01-12Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-3/+12
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Two pending (non-PTI) x86 fixes: - an Intel-MID crash fix - and an Intel microcode loader blacklist quirk to avoid a problematic revision" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/intel-mid: Revert "Make 'bt_sfi_data' const" x86/microcode/intel: Extend BDW late-loading with a revision check
2018-01-12Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds3-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "A Kconfig fix, a build fix and a membarrier bug fix" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: membarrier: Disable preemption when calling smp_call_function_many() sched/isolation: Make CONFIG_CPU_ISOLATION=y depend on SMP or COMPILE_TEST ia64, sched/cputime: Fix build error if CONFIG_VIRT_CPU_ACCOUNTING_NATIVE=y
2018-01-12Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds6-16/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "No functional effects intended: removes leftovers from recent lockdep and refcounts work" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/refcounts: Remove stale comment from the ARCH_HAS_REFCOUNT Kconfig entry locking/lockdep: Remove cross-release leftovers locking/Documentation: Remove stale crossrelease_fullstack parameter
2018-01-12Merge tag 'for-linus-4.15-rc8-tag' of ↵Linus Torvalds3-10/+8
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "This contains two build fixes for clang and two fixes for rather unlikely situations in the Xen gntdev driver" * tag 'for-linus-4.15-rc8-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/gntdev: Fix partial gntdev_mmap() cleanup xen/gntdev: Fix off-by-one error when unmapping with holes x86: xen: remove the use of VLAIS x86/xen/time: fix section mismatch for xen_init_time_ops()