aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2014-06-23DMA, CMA: fix possible memory leakJoonsoo Kim1-1/+11
We should free memory for bitmap when we find zone mismatch, otherwise this memory will leak. Additionally, I copy code comment from PPC KVM's CMA code to inform why we need to check zone mis-match. Signed-off-by: Joonsoo Kim <[email protected]> Acked-by: Zhang Yanfei <[email protected]> Reviewed-by: Michal Nazarewicz <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Acked-by: Minchan Kim <[email protected]> Cc: "Aneesh Kumar K.V" <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Gleb Natapov <[email protected]> Cc: Alexander Graf <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23slab: fix oops when reading /proc/slab_allocatorsJoonsoo Kim1-19/+71
Commit b1cb0982bdd6 ("change the management method of free objects of the slab") introduced a bug on slab leak detector ('/proc/slab_allocators'). This detector works like as following decription. 1. traverse all objects on all the slabs. 2. determine whether it is active or not. 3. if active, print who allocate this object. but that commit changed the way how to manage free objects, so the logic determining whether it is active or not is also changed. In before, we regard object in cpu caches as inactive one, but, with this commit, we mistakenly regard object in cpu caches as active one. This intoduces kernel oops if DEBUG_PAGEALLOC is enabled. If DEBUG_PAGEALLOC is enabled, kernel_map_pages() is used to detect who corrupt free memory in the slab. It unmaps page table mapping if object is free and map it if object is active. When slab leak detector check object in cpu caches, it mistakenly think this object active so try to access object memory to retrieve caller of allocation. At this point, page table mapping to this object doesn't exist, so oops occurs. Following is oops message reported from Dave. It blew up when something tried to read /proc/slab_allocators (Just cat it, and you should see the oops below) Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC Modules linked in: [snip...] CPU: 1 PID: 9386 Comm: trinity-c33 Not tainted 3.14.0-rc5+ #131 task: ffff8801aa46e890 ti: ffff880076924000 task.ti: ffff880076924000 RIP: 0010:[<ffffffffaa1a8f4a>] [<ffffffffaa1a8f4a>] handle_slab+0x8a/0x180 RSP: 0018:ffff880076925de0 EFLAGS: 00010002 RAX: 0000000000001000 RBX: 0000000000000000 RCX: 000000005ce85ce7 RDX: ffffea00079be100 RSI: 0000000000001000 RDI: ffff880107458000 RBP: ffff880076925e18 R08: 0000000000000001 R09: 0000000000000000 R10: 0000000000000000 R11: 000000000000000f R12: ffff8801e6f84000 R13: ffffea00079be100 R14: ffff880107458000 R15: ffff88022bb8d2c0 FS: 00007fb769e45740(0000) GS:ffff88024d040000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8801e6f84ff8 CR3: 00000000a22db000 CR4: 00000000001407e0 DR0: 0000000002695000 DR1: 0000000002695000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000070602 Call Trace: leaks_show+0xce/0x240 seq_read+0x28e/0x490 proc_reg_read+0x3d/0x80 vfs_read+0x9b/0x160 SyS_read+0x58/0xb0 tracesys+0xd4/0xd9 Code: f5 00 00 00 0f 1f 44 00 00 48 63 c8 44 3b 0c 8a 0f 84 e3 00 00 00 83 c0 01 44 39 c0 72 eb 41 f6 47 1a 01 0f 84 e9 00 00 00 89 f0 <4d> 8b 4c 04 f8 4d 85 c9 0f 84 88 00 00 00 49 8b 7e 08 4d 8d 46 RIP handle_slab+0x8a/0x180 To fix the problem, I introduce an object status buffer on each slab. With this, we can track object status precisely, so slab leak detector would not access active object and no kernel oops would occur. Memory overhead caused by this fix is only imposed to CONFIG_DEBUG_SLAB_LEAK which is mainly used for debugging, so memory overhead isn't big problem. Signed-off-by: Joonsoo Kim <[email protected]> Reported-by: Dave Jones <[email protected]> Reported-by: Tetsuo Handa <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23shmem: fix faulting into a hole while it's punchedHugh Dickins1-4/+52
Trinity finds that mmap access to a hole while it's punched from shmem can prevent the madvise(MADV_REMOVE) or fallocate(FALLOC_FL_PUNCH_HOLE) from completing, until the reader chooses to stop; with the puncher's hold on i_mutex locking out all other writers until it can complete. It appears that the tmpfs fault path is too light in comparison with its hole-punching path, lacking an i_data_sem to obstruct it; but we don't want to slow down the common case. Extend shmem_fallocate()'s existing range notification mechanism, so shmem_fault() can refrain from faulting pages into the hole while it's punched, waiting instead on i_mutex (when safe to sleep; or repeatedly faulting when not). [[email protected]: coding-style fixes] Signed-off-by: Hugh Dickins <[email protected]> Reported-by: Sasha Levin <[email protected]> Tested-by: Sasha Levin <[email protected]> Cc: Dave Jones <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23mm: let mm_find_pmd fix buggy race with THP faultHugh Dickins4-13/+20
Trinity has reported: BUG: unable to handle kernel NULL pointer dereference at 0000000000000018 IP: __lock_acquire (kernel/locking/lockdep.c:3070 (discriminator 1)) CPU: 6 PID: 16173 Comm: trinity-c364 Tainted: G W 3.15.0-rc1-next-20140415-sasha-00020-gaa90d09 #398 lock_acquire (arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602) _raw_spin_lock (include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:151) remove_migration_pte (mm/migrate.c:137) rmap_walk (mm/rmap.c:1628 mm/rmap.c:1699) remove_migration_ptes (mm/migrate.c:224) migrate_pages (mm/migrate.c:922 mm/migrate.c:960 mm/migrate.c:1126) migrate_misplaced_page (mm/migrate.c:1733) __handle_mm_fault (mm/memory.c:3762 mm/memory.c:3812 mm/memory.c:3925) handle_mm_fault (mm/memory.c:3948) __get_user_pages (mm/memory.c:1851) __mlock_vma_pages_range (mm/mlock.c:255) __mm_populate (mm/mlock.c:711) SyS_mlockall (include/linux/mm.h:1799 mm/mlock.c:817 mm/mlock.c:791) I believe this comes about because, whereas collapsing and splitting THP functions take anon_vma lock in write mode (which excludes concurrent rmap walks), faulting THP functions (write protection and misplaced NUMA) do not - and mostly they do not need to. But they do use a pmdp_clear_flush(), set_pmd_at() sequence which, for an instant (indeed, for a long instant, given the inter-CPU TLB flush in there), leaves *pmd neither present not trans_huge. Which can confuse a concurrent rmap walk, as when removing migration ptes, seen in the dumped trace. Although that rmap walk has a 4k page to insert, anon_vmas containing THPs are in no way segregated from 4k-page anon_vmas, so the 4k-intent mm_find_pmd() does need to cope with that instant when a trans_huge pmd is temporarily absent. I don't think we need strengthen the locking at the THP end: it's easily handled with an ACCESS_ONCE() before testing both conditions. And since mm_find_pmd() had only one caller who wanted a THP rather than a pmd, let's slightly repurpose it to fail when it hits a THP or non-present pmd, and open code split_huge_page_address() again. Signed-off-by: Hugh Dickins <[email protected]> Reported-by: Sasha Levin <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Bob Liu <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dave Jones <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23mm: thp: fix DEBUG_PAGEALLOC oops in copy_page_rep()Hugh Dickins1-4/+35
Trinity has for over a year been reporting a CONFIG_DEBUG_PAGEALLOC oops in copy_page_rep() called from copy_user_huge_page() called from do_huge_pmd_wp_page(). I believe this is a DEBUG_PAGEALLOC false positive, due to the source page being split, and a tail page freed, while copy is in progress; and not a problem without DEBUG_PAGEALLOC, since the pmd_same() check will prevent a miscopy from being made visible. Fix by adding get_user_huge_page() and put_user_huge_page(): reducing to the usual get_page() and put_page() on head page in the usual config; but get and put references to all of the tail pages when DEBUG_PAGEALLOC. [[email protected]: coding-style fixes] Signed-off-by: Hugh Dickins <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Reported-by: Sasha Levin <[email protected]> Tested-by: Sasha Levin <[email protected]> Cc: Dave Jones <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23kernel/watchdog.c: print traces for all cpus on lockup detectionAaron Tomlin5-0/+73
A 'softlockup' is defined as a bug that causes the kernel to loop in kernel mode for more than a predefined period to time, without giving other tasks a chance to run. Currently, upon detection of this condition by the per-cpu watchdog task, debug information (including a stack trace) is sent to the system log. On some occasions, we have observed that the "victim" rather than the actual "culprit" (i.e. the owner/holder of the contended resource) is reported to the user. Often this information has proven to be insufficient to assist debugging efforts. To avoid loss of useful debug information, for architectures which support NMI, this patch makes it possible to improve soft lockup reporting. This is accomplished by issuing an NMI to each cpu to obtain a stack trace. If NMI is not supported we just revert back to the old method. A sysctl and boot-time parameter is available to toggle this feature. [[email protected]: add CONFIG_SMP in certain areas] [[email protected]: additional CONFIG_SMP=n optimisations] [[email protected]: fix warning] Signed-off-by: Aaron Tomlin <[email protected]> Signed-off-by: Don Zickus <[email protected]> Cc: David S. Miller <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Jan Moskyto Matejka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23nmi: provide the option to issue an NMI back trace to every cpu but currentAaron Tomlin5-13/+38
Sometimes it is preferred not to use the trigger_all_cpu_backtrace() routine when one wants to avoid capturing a back trace for current. For instance if one was previously captured recently. This patch provides a new routine namely trigger_allbutself_cpu_backtrace() which offers the flexibility to issue an NMI to every cpu but current and capture a back trace accordingly. Patch x86 and sparc to support new routine. [[email protected]: add stub in #else clause] [[email protected]: don't print message in single processor case, wrap with get/put_cpu based on Oleg's suggestion] [[email protected]: undo C99ism] Signed-off-by: Aaron Tomlin <[email protected]> Signed-off-by: Don Zickus <[email protected]> Acked-by: David S. Miller <[email protected]> Cc: Mateusz Guzik <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23Documentation/accounting/getdelays.c: add missing null-terminate after ↵Rickard Strandqvist1-0/+1
strncpy call Added a guaranteed null-terminate after call to strncpy. This was partly found using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <[email protected]> Acked-by: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23drivers/memstick/host/rtsx_pci_ms.c: add cancel_work when remove driverMicky Ching1-0/+1
Add cancel_work_sync() in rtsx_pci_ms_drv_remove() to cancel pending request work when removing the driver. Signed-off-by: Micky Ching <[email protected]> Cc: Samuel Ortiz <[email protected]> says: Cc: Maxim Levitsky <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Alex Dubov <[email protected]> Cc: Roger Tseng <[email protected]> Cc: Wei WANG <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23mm, pcp: allow restoring percpu_pagelist_fraction defaultDavid Rientjes3-15/+31
Oleg reports a division by zero error on zero-length write() to the percpu_pagelist_fraction sysctl: divide error: 0000 [#1] SMP DEBUG_PAGEALLOC CPU: 1 PID: 9142 Comm: badarea_io Not tainted 3.15.0-rc2-vm-nfs+ #19 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 task: ffff8800d5aeb6e0 ti: ffff8800d87a2000 task.ti: ffff8800d87a2000 RIP: 0010: percpu_pagelist_fraction_sysctl_handler+0x84/0x120 RSP: 0018:ffff8800d87a3e78 EFLAGS: 00010246 RAX: 0000000000000f89 RBX: ffff88011f7fd000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000000000000010 RBP: ffff8800d87a3e98 R08: ffffffff81d002c8 R09: ffff8800d87a3f50 R10: 000000000000000b R11: 0000000000000246 R12: 0000000000000060 R13: ffffffff81c3c3e0 R14: ffffffff81cfddf8 R15: ffff8801193b0800 FS: 00007f614f1e9740(0000) GS:ffff88011f440000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00007f614f1fa000 CR3: 00000000d9291000 CR4: 00000000000006e0 Call Trace: proc_sys_call_handler+0xb3/0xc0 proc_sys_write+0x14/0x20 vfs_write+0xba/0x1e0 SyS_write+0x46/0xb0 tracesys+0xe1/0xe6 However, if the percpu_pagelist_fraction sysctl is set by the user, it is also impossible to restore it to the kernel default since the user cannot write 0 to the sysctl. This patch allows the user to write 0 to restore the default behavior. It still requires a fraction equal to or larger than 8, however, as stated by the documentation for sanity. If a value in the range [1, 7] is written, the sysctl will return EINVAL. This successfully solves the divide by zero issue at the same time. Signed-off-by: David Rientjes <[email protected]> Reported-by: Oleg Drokin <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23lib/Kconfig.debug: let FRAME_POINTER exclude SCORE, just like exclude most ↵Chen Gang1-2/+2
of other architectures The related warning: scripts/kconfig/conf --allmodconfig Kconfig warning: (FAULT_INJECTION_STACKTRACE_FILTER && LATENCYTOP && KMEMCHECK && LOCKDEP) selects FRAME_POINTER which has unmet direct dependencies (DEBUG_KERNEL && (CRIS || M68K || FRV || UML || AVR32 || SUPERH || BLACKFIN || MN10300 || METAG) || ARCH_WANT_FRAME_POINTERS) Signed-off-by: Chen Gang <[email protected]> Cc: Chen Liqin <[email protected]> Cc: Lennox Wu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23kernel/watchdog.c: remove preemption restrictions when restarting lockup ↵Don Zickus1-2/+0
detector Peter Wu noticed the following splat on his machine when updating /proc/sys/kernel/watchdog_thresh: BUG: sleeping function called from invalid context at mm/slub.c:965 in_atomic(): 1, irqs_disabled(): 0, pid: 1, name: init 3 locks held by init/1: #0: (sb_writers#3){.+.+.+}, at: [<ffffffff8117b663>] vfs_write+0x143/0x180 #1: (watchdog_proc_mutex){+.+.+.}, at: [<ffffffff810e02d3>] proc_dowatchdog+0x33/0x110 #2: (cpu_hotplug.lock){.+.+.+}, at: [<ffffffff810589c2>] get_online_cpus+0x32/0x80 Preemption disabled at:[<ffffffff810e0384>] proc_dowatchdog+0xe4/0x110 CPU: 0 PID: 1 Comm: init Not tainted 3.16.0-rc1-testing #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: dump_stack+0x4e/0x7a __might_sleep+0x11d/0x190 kmem_cache_alloc_trace+0x4e/0x1e0 perf_event_alloc+0x55/0x440 perf_event_create_kernel_counter+0x26/0xe0 watchdog_nmi_enable+0x75/0x140 update_timers_all_cpus+0x53/0xa0 proc_dowatchdog+0xe4/0x110 proc_sys_call_handler+0xb3/0xc0 proc_sys_write+0x14/0x20 vfs_write+0xad/0x180 SyS_write+0x49/0xb0 system_call_fastpath+0x16/0x1b NMI watchdog: disabled (cpu0): hardware events not enabled What happened is after updating the watchdog_thresh, the lockup detector is restarted to utilize the new value. Part of this process involved disabling preemption. Once preemption was disabled, perf tried to allocate a new event (as part of the restart). This caused the above BUG_ON as you can't sleep with preemption disabled. The preemption restriction seemed agressive as we are not doing anything on that particular cpu, but with all the online cpus (which are protected by the get_online_cpus lock). Remove the restriction and the BUG_ON goes away. Signed-off-by: Don Zickus <[email protected]> Acked-by: Michal Hocko <[email protected]> Reported-by: Peter Wu <[email protected]> Tested-by: Peter Wu <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: <[email protected]> [3.13+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23MAINTAINERS: SLAB maintainer updateChristoph Lameter2-3/+9
As discussed in various threads on the side: Remove one inactive maintainer, add two new ones and update my email address. Plus add Andrew. And fix the glob to include files like mm/slab_common.c Signed-off-by: Christoph Lameter <[email protected]> Acked-by: David Rientjes <[email protected]> Acked-by: Joonsoo Kim <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Matt Mackall <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23hugetlb: fix copy_hugetlb_page_range() to handle migration/hwpoisoned entryNaoya Horiguchi1-28/+43
There's a race between fork() and hugepage migration, as a result we try to "dereference" a swap entry as a normal pte, causing kernel panic. The cause of the problem is that copy_hugetlb_page_range() can't handle "swap entry" family (migration entry and hwpoisoned entry) so let's fix it. [[email protected]: coding-style fixes] Signed-off-by: Naoya Horiguchi <[email protected]> Acked-by: Hugh Dickins <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: <[email protected]> [2.6.37+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23tmpfs: ZERO_RANGE and COLLAPSE_RANGE not currently supportedHugh Dickins1-0/+3
I was well aware of FALLOC_FL_ZERO_RANGE and FALLOC_FL_COLLAPSE_RANGE support being added to fallocate(); but didn't realize until now that I had been too stupid to future-proof shmem_fallocate() against new additions. -EOPNOTSUPP instead of going on to ordinary fallocation. Signed-off-by: Hugh Dickins <[email protected]> Reviewed-by: Lukas Czerner <[email protected]> Cc: <[email protected]> [3.15] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23mm, hotplug: probe interface is available on several platformsDavid Rientjes1-9/+6
Documentation/memory-hotplug.txt incorrectly states that the memory driver "probe" interface is only supported on powerpc and is vague about its application on x86. Clarify the platforms that make this interface available if memory hotplug is enabled. Signed-off-by: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23kexec: save PG_head_mask in VMCOREINFOPetr Tesarik2-0/+4
To allow filtering of huge pages, makedumpfile must be able to identify them in the dump. This can be done by checking the appropriate page flag, so communicate its value to makedumpfile through the VMCOREINFO interface. There's only one small catch. Depending on how many page flags are available on a given architecture, this bit can be called PG_head or PG_compound. I sent a similar patch back in 2012, but Eric Biederman did not like using an #ifdef. So, this time I'm adding a common symbol (PG_head_mask) instead. See https://lkml.org/lkml/2012/11/28/91 for the previous version. Signed-off-by: Petr Tesarik <[email protected]> Acked-by: Vivek Goyal <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Fengguang Wu <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Alexey Kardashevskiy <[email protected]> Cc: Sasha Levin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23CPU hotplug, smp: flush any pending IPI callbacks before CPU offlineSrivatsa S. Bhat1-8/+49
There is a race between the CPU offline code (within stop-machine) and the smp-call-function code, which can lead to getting IPIs on the outgoing CPU, *after* it has gone offline. Specifically, this can happen when using smp_call_function_single_async() to send the IPI, since this API allows sending asynchronous IPIs from IRQ disabled contexts. The exact race condition is described below. During CPU offline, in stop-machine, we don't enforce any rule in the _DISABLE_IRQ stage, regarding the order in which the outgoing CPU and the other CPUs disable their local interrupts. Due to this, we can encounter a situation in which an IPI is sent by one of the other CPUs to the outgoing CPU (while it is *still* online), but the outgoing CPU ends up noticing it only *after* it has gone offline. CPU 1 CPU 2 (Online CPU) (CPU going offline) Enter _PREPARE stage Enter _PREPARE stage Enter _DISABLE_IRQ stage = Got a device interrupt, and | Didn't notice the IPI the interrupt handler sent an | since interrupts were IPI to CPU 2 using | disabled on this CPU. smp_call_function_single_async() | = Enter _DISABLE_IRQ stage Enter _RUN stage Enter _RUN stage = Busy loop with interrupts | Invoke take_cpu_down() disabled. | and take CPU 2 offline = Enter _EXIT stage Enter _EXIT stage Re-enable interrupts Re-enable interrupts The pending IPI is noted immediately, but alas, the CPU is offline at this point. This of course, makes the smp-call-function IPI handler code running on CPU 2 unhappy and it complains about "receiving an IPI on an offline CPU". One real example of the scenario on CPU 1 is the block layer's complete-request call-path: __blk_complete_request() [interrupt-handler] raise_blk_irq() smp_call_function_single_async() However, if we look closely, the block layer does check that the target CPU is online before firing the IPI. So in this case, it is actually the unfortunate ordering/timing of events in the stop-machine phase that leads to receiving IPIs after the target CPU has gone offline. In reality, getting a late IPI on an offline CPU is not too bad by itself (this can happen even due to hardware latencies in IPI send-receive). It is a bug only if the target CPU really went offline without executing all the callbacks queued on its list. (Note that a CPU is free to execute its pending smp-call-function callbacks in a batch, without waiting for the corresponding IPIs to arrive for each one of those callbacks). So, fixing this issue can be broken up into two parts: 1. Ensure that a CPU goes offline only after executing all the callbacks queued on it. 2. Modify the warning condition in the smp-call-function IPI handler code such that it warns only if an offline CPU got an IPI *and* that CPU had gone offline with callbacks still pending in its queue. Achieving part 1 is straight-forward - just flush (execute) all the queued callbacks on the outgoing CPU in the CPU_DYING stage[1], including those callbacks for which the source CPU's IPIs might not have been received on the outgoing CPU yet. Once we do this, an IPI that arrives late on the CPU going offline (either due to the race mentioned above, or due to hardware latencies) will be completely harmless, since the outgoing CPU would have executed all the queued callbacks before going offline. Overall, this fix (parts 1 and 2 put together) additionally guarantees that we will see a warning only when the *IPI-sender code* is buggy - that is, if it queues the callback _after_ the target CPU has gone offline. [1]. The CPU_DYING part needs a little more explanation: by the time we execute the CPU_DYING notifier callbacks, the CPU would have already been marked offline. But we want to flush out the pending callbacks at this stage, ignoring the fact that the CPU is offline. So restructure the IPI handler code so that we can by-pass the "is-cpu-offline?" check in this particular case. (Of course, the right solution here is to fix CPU hotplug to mark the CPU offline _after_ invoking the CPU_DYING notifiers, but this requires a lot of audit to ensure that this change doesn't break any existing code; hence lets go with the solution proposed above until that is done). [[email protected]: coding-style fixes] Signed-off-by: Srivatsa S. Bhat <[email protected]> Suggested-by: Frederic Weisbecker <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Gautham R Shenoy <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rafael J. Wysocki <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Thomas Gleixner <[email protected]> Tested-by: Sachin Kamat <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23mm: nommu: per-thread vma cache fixSteven Miao1-1/+1
mm could be removed from current task struct, using previous vma->vm_mm It will crash on blackfin after updated to Linux 3.15. The commit "mm: per-thread vma caching" caused the crash. mm could be removed from current task struct before mmput()-> exit_mmap()-> delete_vma_from_mm() the detailed fault information: NULL pointer access Kernel OOPS in progress Deferred Exception context CURRENT PROCESS: COMM=modprobe PID=278 CPU=0 invalid mm return address: [0x000531de]; contents of: 0x000531b0: c727 acea 0c42 181d 0000 0000 0000 a0a8 0x000531c0: b090 acaa 0c42 1806 0000 0000 0000 a0e8 0x000531d0: b0d0 e801 0000 05b3 0010 e522 0046 [a090] 0x000531e0: 6408 b090 0c00 17cc 3042 e3ff f37b 2fc8 CPU: 0 PID: 278 Comm: modprobe Not tainted 3.15.0-ADI-2014R1-pre-00345-gea9f446 #25 task: 0572b720 ti: 0569e000 task.ti: 0569e000 Compiled for cpu family 0x27fe (Rev 0), but running on:0x0000 (Rev 0) ADSP-BF609-0.0 500(MHz CCLK) 125(MHz SCLK) (mpu off) Linux version 3.15.0-ADI-2014R1-pre-00345-gea9f446 (steven@steven-OptiPlex-390) (gcc version 4.3.5 (ADI-trunk/svn-5962) ) #25 Tue Jun 10 17:47:46 CST 2014 SEQUENCER STATUS: Not tainted SEQSTAT: 00000027 IPEND: 8008 IMASK: ffff SYSCFG: 2806 EXCAUSE : 0x27 physical IVG3 asserted : <0xffa00744> { _trap + 0x0 } physical IVG15 asserted : <0xffa00d68> { _evt_system_call + 0x0 } logical irq 6 mapped : <0xffa003bc> { _bfin_coretmr_interrupt + 0x0 } logical irq 7 mapped : <0x00008828> { _bfin_fault_routine + 0x0 } logical irq 11 mapped : <0x00007724> { _l2_ecc_err + 0x0 } logical irq 13 mapped : <0x00008828> { _bfin_fault_routine + 0x0 } logical irq 39 mapped : <0x00150788> { _bfin_twi_interrupt_entry + 0x0 } logical irq 40 mapped : <0x00150788> { _bfin_twi_interrupt_entry + 0x0 } RETE: <0x00000000> /* Maybe null pointer? */ RETN: <0x0569fe50> /* kernel dynamic memory (maybe user-space) */ RETX: <0x00000480> /* Maybe fixed code section */ RETS: <0x00053384> { _exit_mmap + 0x28 } PC : <0x000531de> { _delete_vma_from_mm + 0x92 } DCPLB_FAULT_ADDR: <0x00000008> /* Maybe null pointer? */ ICPLB_FAULT_ADDR: <0x000531de> { _delete_vma_from_mm + 0x92 } PROCESSOR STATE: R0 : 00000004 R1 : 0569e000 R2 : 00bf3db4 R3 : 00000000 R4 : 057f9800 R5 : 00000001 R6 : 0569ddd0 R7 : 0572b720 P0 : 0572b854 P1 : 00000004 P2 : 00000000 P3 : 0569dda0 P4 : 0572b720 P5 : 0566c368 FP : 0569fe5c SP : 0569fd74 LB0: 057f523f LT0: 057f523e LC0: 00000000 LB1: 0005317c LT1: 00053172 LC1: 00000002 B0 : 00000000 L0 : 00000000 M0 : 0566f5bc I0 : 00000000 B1 : 00000000 L1 : 00000000 M1 : 00000000 I1 : ffffffff B2 : 00000001 L2 : 00000000 M2 : 00000000 I2 : 00000000 B3 : 00000000 L3 : 00000000 M3 : 00000000 I3 : 057f8000 A0.w: 00000000 A0.x: 00000000 A1.w: 00000000 A1.x: 00000000 USP : 056ffcf8 ASTAT: 02003024 Hardware Trace: 0 Target : <0x00003fb8> { _trap_c + 0x0 } Source : <0xffa006d8> { _exception_to_level5 + 0xa0 } JUMP.L 1 Target : <0xffa00638> { _exception_to_level5 + 0x0 } Source : <0xffa004f2> { _bfin_return_from_exception + 0x6 } RTX 2 Target : <0xffa004ec> { _bfin_return_from_exception + 0x0 } Source : <0xffa00590> { _ex_trap_c + 0x70 } JUMP.S 3 Target : <0xffa00520> { _ex_trap_c + 0x0 } Source : <0xffa0076e> { _trap + 0x2a } JUMP (P4) 4 Target : <0xffa00744> { _trap + 0x0 } FAULT : <0x000531de> { _delete_vma_from_mm + 0x92 } P0 = W[P2 + 2] Source : <0x000531da> { _delete_vma_from_mm + 0x8e } P2 = [P4 + 0x18] 5 Target : <0x000531da> { _delete_vma_from_mm + 0x8e } Source : <0x00053176> { _delete_vma_from_mm + 0x2a } IF CC JUMP pcrel 6 Target : <0x0005314c> { _delete_vma_from_mm + 0x0 } Source : <0x00053380> { _exit_mmap + 0x24 } JUMP.L 7 Target : <0x00053378> { _exit_mmap + 0x1c } Source : <0x00053394> { _exit_mmap + 0x38 } IF !CC JUMP pcrel (BP) 8 Target : <0x00053390> { _exit_mmap + 0x34 } Source : <0xffa020e0> { __cond_resched + 0x20 } RTS 9 Target : <0xffa020c0> { __cond_resched + 0x0 } Source : <0x0005338c> { _exit_mmap + 0x30 } JUMP.L 10 Target : <0x0005338c> { _exit_mmap + 0x30 } Source : <0x0005333a> { _delete_vma + 0xb2 } RTS 11 Target : <0x00053334> { _delete_vma + 0xac } Source : <0x0005507a> { _kmem_cache_free + 0xba } RTS 12 Target : <0x00055068> { _kmem_cache_free + 0xa8 } Source : <0x0005505e> { _kmem_cache_free + 0x9e } IF !CC JUMP pcrel (BP) 13 Target : <0x00055052> { _kmem_cache_free + 0x92 } Source : <0x0005501a> { _kmem_cache_free + 0x5a } IF CC JUMP pcrel 14 Target : <0x00054ff4> { _kmem_cache_free + 0x34 } Source : <0x00054fce> { _kmem_cache_free + 0xe } IF CC JUMP pcrel (BP) 15 Target : <0x00054fc0> { _kmem_cache_free + 0x0 } Source : <0x00053330> { _delete_vma + 0xa8 } JUMP.L Kernel Stack Stack info: SP: [0x0569ff24] <0x0569ff24> /* kernel dynamic memory (maybe user-space) */ Memory from 0x0569ff20 to 056a0000 0569ff20: 00000001 [04e8da5a] 00008000 00000000 00000000 056a0000 04e8da5a 04e8da5a 0569ff40: 04eb9eea ffa00dce 02003025 04ea09c5 057f523f 04ea09c4 057f523e 00000000 0569ff60: 00000000 00000000 00000000 00000000 00000000 00000000 00000001 00000000 0569ff80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0569ffa0: 0566f5bc 057f8000 057f8000 00000001 04ec0170 056ffcf8 056ffd04 057f9800 0569ffc0: 04d1d498 057f9800 057f8fe4 057f8ef0 00000001 057f928c 00000001 00000001 0569ffe0: 057f9800 00000000 00000008 00000007 00000001 00000001 00000001 <00002806> Return addresses in stack: address : <0x00002806> { _show_cpuinfo + 0x2d2 } Modules linked in: Kernel panic - not syncing: Kernel exception [ end Kernel panic - not syncing: Kernel exception Signed-off-by: Steven Miao <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: <[email protected]> [3.15.x] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2014-06-23x86_32, signal: Fix vdso rt_sigreturnAndy Lutomirski1-1/+1
This commit: commit 6f121e548f83674ab4920a4e60afb58d4f61b829 Author: Andy Lutomirski <[email protected]> Date: Mon May 5 12:19:34 2014 -0700 x86, vdso: Reimplement vdso.so preparation in build-time C Contained this obvious typo: - restorer = VDSO32_SYMBOL(current->mm->context.vdso, rt_sigreturn); + restorer = current->mm->context.vdso + + selected_vdso32->sym___kernel_sigreturn; Note the missing 'rt_' in the new code. Fix it. Signed-off-by: Andy Lutomirski <[email protected]> Link: http://lkml.kernel.org/r/1eb40ad923acde2e18357ef2832867432e70ac42.1403361010.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <[email protected]>
2014-06-23x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)Andy Lutomirski1-2/+8
The bad syscall nr paths are their own incomprehensible route through the entry control flow. Rearrange them to work just like syscalls that return -ENOSYS. This fixes an OOPS in the audit code when fast-path auditing is enabled and sysenter gets a bad syscall nr (CVE-2014-4508). This has probably been broken since Linux 2.6.27: af0575bba0 i386 syscall audit fast-path Cc: [email protected] Cc: Roland McGrath <[email protected]> Reported-by: Toralf Förster <[email protected]> Signed-off-by: Andy Lutomirski <[email protected]> Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <[email protected]>
2014-06-23ARM: dts: kirkwood: fix phy-connection-type for GuruplugSebastian Hesselbarth1-2/+2
Commit eeb845459a72e792a959278b858f9c417e9995bd ("ARM: dts: kirkwood: set Guruplug phy-connection-type to rgmii-id") added phy-connection-type properties to ethernet PHY nodes. Actually, the property has to be set for the ethernet port node instead. Fix it by moving the corresponding properties to the correct nodes. Signed-off-by: Sebastian Hesselbarth <[email protected]> Link: https://lkml.kernel.org/r/1403555115-13111-1-git-send-email-sebastian.hesselbarth@gmail.com Fixes: eeb845459a72: ('ARM: dts: kirkwood: set Guruplug phy-connection-type to rgmii-id') Cc: <[email protected]> # v3.16+ Signed-off-by: Jason Cooper <[email protected]>
2014-06-23be2net: fix qnq mode detection on VFsSuresh Reddy3-6/+4
The driver (on PF or VF) needs to detect if the function is in qnq mode for a HW hack in be_rx_compl_get() to work. The driver queries this information using the GET_PROFILE_CONFIG cmd (since the commit below can caused this regression.) But this cmd is not available on VFs and so the VFs fail to detect qnq mode. This causes vlan traffic to not work. The fix is to use the the adapter->function_mode value queried via QUERY_FIRMWARE_CONFIG cmd on both PFs and VFs to detect the qnq mode. Also QNQ_MODE was incorrectly named FLEX10_MODE; correcting that too as the fix reads much better with the name change. Fixes: f93f160b5 ("refactor multi-channel config code for Skyhawk-R chip") Signed-off-by: Suresh Reddy <[email protected]> Signed-off-by: Sathya Perla <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-23lz4: ensure length does not wrapGreg Kroah-Hartman1-0/+2
Given some pathologically compressed data, lz4 could possibly decide to wrap a few internal variables, causing unknown things to happen. Catch this before the wrapping happens and abort the decompression. Reported-by: "Don A. Bailey" <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2014-06-23lzo: properly check for overrunsGreg Kroah-Hartman1-21/+41
The lzo decompressor can, if given some really crazy data, possibly overrun some variable types. Modify the checking logic to properly detect overruns before they happen. Reported-by: "Don A. Bailey" <[email protected]> Tested-by: "Don A. Bailey" <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2014-06-23drm/i915: default to having backlight if VBT not availableJani Nikula1-3/+3
Apparently there are Apple laptops with magic smoke for a VBIOS, which we fail to find and use. Default to having and setting up backlight in this case. This fixes a regression introduced by commit c675949ec58ca50d5a3ae3c757892f1560f6e896 Author: Jani Nikula <[email protected]> Date: Wed Apr 9 11:31:37 2014 +0300 drm/i915: do not setup backlight if not available according to VBT Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=77831 Reported-and-tested-by: Matteo Cypriani <[email protected]> Cc: [email protected] # 3.15+ Reviewed-by: Imre Deak <[email protected]> Signed-off-by: Jani Nikula <[email protected]>
2014-06-23Merge tag 'imx-fixes-3.16' of ↵Arnd Bergmann14-48/+73
git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into fixes Pull "i.MX fixes for 3.16" from Shawn Guo: - Use GPIO for card CD/WP on imx51-babbage and eukrea-mbimxsd51, because controller base CD/WP is not working in esdhc driver due to runtime PM support - A couple of random ventana gw5xxx board fixes - Add IMX_IPUV3_CORE back to defconfig, which gets lost when moving IPUv3 driver out of staging tree - Fix enet/fec clock selection on imx6sl - Fix display node on imx53-m53evk board - A couple of Cubox-i updates from Russell, which were omitted from the merge window due to dependency * tag 'imx-fixes-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux: ARM: dts: imx51-eukrea-mbimxsd51-baseboard: unbreak esdhc. ARM: dts: imx51-babbage: Fix esdhc setup ARM: dts: mx5: Move the display out of soc {} node ARM: dts: mx5: Fix IPU port node placement ARM: imx_v6_v7_defconfig: Enable CONFIG_IMX_IPUV3_CORE ARM: dts: hummingboard/cubox-i: move usb otg configuration to platform level ARM: dts: cubox-i: add support for PWM-driven front panel LED ARM: dts: imx6: ventana: correct gw52xx sgtl5000 clock source ARM: dts: imx6qdl-gw5xxx: Fix Linear Technology vendor prefix ARM: dts: imx6: ventana: fix include typo ARM: dts: imx6sl: correct the fec ipg clock source ARM: imx6sl: add missing enet clock for imx6sl
2014-06-23ALSA: hda - hdmi: call overridden init on resumePierre Ossman1-1/+1
We need to call the proper init function in case it has been overridden, as it might restore things that the generic routing doesn't know anything about. E.g. AMD cards have special verbs that need resetting. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=77901 Fixes: 5a61358433b1 ('ALSA: hda - hdmi: Add ATI/AMD multi-channel audio support') Signed-off-by: Pierre Ossman <[email protected]> Cc: <[email protected]> [v3.13+] Signed-off-by: Takashi Iwai <[email protected]>
2014-06-23ALSA: hda - Fix usage of "model" module parameterDavid Henningsson1-0/+1
A recent refactoring broke the possibility to manually specify model name as a module parameter. This patch restores the desired functionality. Fixes: c21c8cf77f47 ('ALSA: hda - Add fixup_forced flag') Reported-by: Kent Baxley <[email protected]> Signed-off-by: David Henningsson <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2014-06-23ALSA: compress: fix the struct alignment to 4 bytesVinod Koul2-14/+14
In 64bit systems the compiler can default align to 8bytes causing mis-match with 32bit usermode. Avoid this is future by ensuring all the structures shared with usermode are packed and aligned to 4 bytes irrespective of arch used [coding style fixes by tiwai] Signed-off-by: Vinod Koul <[email protected]> Signed-off-by: Takashi Iwai <[email protected]>
2014-06-23rbd: handle parent_overlap on writes correctlyIlya Dryomov1-1/+9
The following check in rbd_img_obj_request_submit() rbd_dev->parent_overlap <= obj_request->img_offset allows the fall through to the non-layered write case even if both parent_overlap and obj_request->img_offset belong to the same RADOS object. This leads to data corruption, because the area to the left of parent_overlap ends up unconditionally zero-filled instead of being populated with parent data. Suppose we want to write 1M to offset 6M of image bar, which is a clone of foo@snap; object_size is 4M, parent_overlap is 5M: rbd_data.<id>.0000000000000001 ---------------------|----------------------|------------ | should be copyup'ed | should be zeroed out | write ... ---------------------|----------------------|------------ 4M 5M 6M parent_overlap obj_request->img_offset 4..5M should be copyup'ed from foo, yet it is zero-filled, just like 5..6M is. Given that the only striping mode kernel client currently supports is chunking (i.e. stripe_unit == object_size, stripe_count == 1), round parent_overlap up to the next object boundary for the purposes of the overlap check. Cc: [email protected] # 3.10+ Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Josh Durgin <[email protected]>
2014-06-23drm/i915: cache hw power well enabled stateImre Deak4-27/+22
Jesse noticed that the punit communication needed to query the VLV power well status can cause substantial delays. Since we can query the state frequently, for example during I2C transfers, maintain a cached version of the HW state to get rid of this delay. This fixes at least one reported regression where boot time increased by ~4 seconds due to frequent power well state queries on VLV during eDP EDID read. This regression has been introduced in commit bb4932c4f17b68f34645ffbcf845e4c29d17290b Author: Imre Deak <[email protected]> Date: Mon Apr 14 20:24:33 2014 +0300 drm/i915: vlv: check port power domain instead of only D0 for eDP VDD on Reported-by: Jesse Barnes <[email protected]> Signed-off-by: Imre Deak <[email protected]> Reviewed-by: Jesse Barnes <[email protected]> Signed-off-by: Daniel Vetter <[email protected]>
2014-06-23vhost-scsi: don't open-code kvfreeMichael S. Tsirkin1-10/+2
Now that we have kvfree, use it in vhost-scsi instead of the open-coded version. Cc: Nicholas Bellinger <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2014-06-23vhost-net: don't open-code kvfreeRomain Francoise1-10/+2
Commit 23cc5a991c ("vhost-net: extend device allocation to vmalloc") added another open-coded version of kvfree (which is available since v3.15-rc5), nuke it. Signed-off-by: Romain Francoise <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2014-06-23ARC: fix build warning in devtreeVineet Gupta1-1/+1
Signed-off-by: Vineet Gupta <[email protected]>
2014-06-22of: mdio: fixup of_phy_register_fixed_link parsing of new bindingsRichard Retanubun1-3/+5
Fixes commit 3be2a49e5c08 ("of: provide a binding for fixed link PHYs") Fix the parsing of the new fixed link dts bindings for duplex, pause, and asym_pause by using the correct device node pointer. Signed-off-by: Richard Retanubun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-22at86rf230: fix irq setupPhoebe Buckheister1-1/+4
Commit 8eba0eefae24953962067 ("at86rf230: remove irq_type in request_irq") removed the trigger configuration when requesting an irq, and instead relied on the interrupt trigger to be properly configured already. This does not seem to be an assumption that can be safely made, since boards disable all interrupt triggers on boot. On these boards, force the irq to trigger on rising edge, which is also the default for the chip. Signed-off-by: Phoebe Buckheister <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-22net: phy: at803x: fix coccinelle warningsFengguang Wu1-1/+1
drivers/net/phy/at803x.c:196:26-32: ERROR: application of sizeof to pointer sizeof when applied to a pointer typed expression gives the size of the pointer Generated by: scripts/coccinelle/misc/noderef.cocci Signed-off-by: Fengguang Wu <[email protected]> Acked-by: Daniel Mack <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-22net/mlx4_core: Fix the error flow when probing with invalid VF configurationOr Gerlitz1-1/+2
Single ported VF are currently not supported on configurations where one or both ports are IB. When we hit this case, the relevant flow in the driver didn't return error and jumped to the wrong label. Fix that. Fixes: dd41cc3 ('net/mlx4: Adapt num_vfs/probed_vf params for single port VF') Reported-by: Shirley Ma <[email protected]> Signed-off-by: Or Gerlitz <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-22tulip: Poll link status more frequently for Comet chipsOndrej Zary1-1/+1
It now takes up to 60 seconds to detect cable (un)plug on ADMtek Comet chips. That's too slow and might cause people to think that it doesn't work at all. Poll link status every 2 seconds instead of 60 for ADMtek Comet chips. That should be fast enough while not stressing the system too much. Tested with ADMtek AN983B. Signed-off-by: Ondrej Zary <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-06-22Revert "block: add __init to elv_register"Jens Axboe2-2/+2
This reverts commit b5097e956a4d2919ee248d6481e4204c5568ed5c. The original commit is buggy, we do use the registration functions at runtime, for instance when loading IO schedulers through sysfs. Reported-by: Damien Wyart <[email protected]>
2014-06-22Revert "block: add __init to blkcg_policy_register"Jens Axboe2-3/+3
This reverts commit a2d445d440003f2d70ee4cd4970ea82ace616fee. The original commit is buggy, we do use the registration functions at runtime for modular builds.
2014-06-22blkcg: fix use-after-free in __blkg_release_rcu() by making blkcg_gq refcnt ↵Tejun Heo2-15/+9
an atomic_t Hello, So, this patch should do. Joe, Vivek, can one of you guys please verify that the oops goes away with this patch? Jens, the original thread can be read at http://thread.gmane.org/gmane.linux.kernel/1720729 The fix converts blkg->refcnt from int to atomic_t. It does some overhead but it should be minute compared to everything else which is going on and the involved cacheline bouncing, so I think it's highly unlikely to cause any noticeable difference. Also, the refcnt in question should be converted to a perpcu_ref for blk-mq anyway, so the atomic_t is likely to go away pretty soon anyway. Thanks. ------- 8< ------- __blkg_release_rcu() may be invoked after the associated request_queue is released with a RCU grace period inbetween. As such, the function and callbacks invoked from it must not dereference the associated request_queue. This is clearly indicated in the comment above the function. Unfortunately, while trying to fix a different issue, 2a4fd070ee85 ("blkcg: move bulk of blkcg_gq release operations to the RCU callback") ignored this and added [un]locking of @blkg->q->queue_lock to __blkg_release_rcu(). This of course can cause oops as the request_queue may be long gone by the time this code gets executed. general protection fault: 0000 [#1] SMP CPU: 21 PID: 30 Comm: rcuos/21 Not tainted 3.15.0 #1 Hardware name: Stratus ftServer 6400/G7LAZ, BIOS BIOS Version 6.3:57 12/25/2013 task: ffff880854021de0 ti: ffff88085403c000 task.ti: ffff88085403c000 RIP: 0010:[<ffffffff8162e9e5>] [<ffffffff8162e9e5>] _raw_spin_lock_irq+0x15/0x60 RSP: 0018:ffff88085403fdf0 EFLAGS: 00010086 RAX: 0000000000020000 RBX: 0000000000000010 RCX: 0000000000000000 RDX: 000060ef80008248 RSI: 0000000000000286 RDI: 6b6b6b6b6b6b6b6b RBP: ffff88085403fdf0 R08: 0000000000000286 R09: 0000000000009f39 R10: 0000000000020001 R11: 0000000000020001 R12: ffff88103c17a130 R13: ffff88103c17a080 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88107fca0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000006e5ab8 CR3: 000000000193d000 CR4: 00000000000407e0 Stack: ffff88085403fe18 ffffffff812cbfc2 ffff88103c17a130 0000000000000000 ffff88103c17a130 ffff88085403fec0 ffffffff810d1d28 ffff880854021de0 ffff880854021de0 ffff88107fcaec58 ffff88085403fe80 ffff88107fcaec30 Call Trace: [<ffffffff812cbfc2>] __blkg_release_rcu+0x72/0x150 [<ffffffff810d1d28>] rcu_nocb_kthread+0x1e8/0x300 [<ffffffff81091d81>] kthread+0xe1/0x100 [<ffffffff8163813c>] ret_from_fork+0x7c/0xb0 Code: ff 47 04 48 8b 7d 08 be 00 02 00 00 e8 55 48 a4 ff 5d c3 0f 1f 00 66 66 66 66 90 55 48 89 e5 +fa 66 66 90 66 66 90 b8 00 00 02 00 <f0> 0f c1 07 89 c2 c1 ea 10 66 39 c2 75 02 5d c3 83 e2 fe 0f +b7 RIP [<ffffffff8162e9e5>] _raw_spin_lock_irq+0x15/0x60 RSP <ffff88085403fdf0> The request_queue locking was added because blkcg_gq->refcnt is an int protected with the queue lock and __blkg_release_rcu() needs to put the parent. Let's fix it by making blkcg_gq->refcnt an atomic_t and dropping queue locking in the function. Given the general heavy weight of the current request_queue and blkcg operations, this is unlikely to cause any noticeable overhead. Moreover, blkcg_gq->refcnt is likely to be converted to percpu_ref in the near future, so whatever (most likely negligible) overhead it may add is temporary. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Joe Lawrence <[email protected]> Acked-by: Vivek Goyal <[email protected]> Link: http://lkml.kernel.org/g/[email protected] Cc: [email protected] Signed-off-by: Jens Axboe <[email protected]>
2014-06-22Merge tag 'samsung-fixes-1' of ↵Arnd Bergmann5-25/+20
git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung into fixes Merge Samsung fixes for 3.16 from Kukjin Kim: - use WFI macro in platform_do_lowpower because exynos cpuhotplug includes a hardcoded WFI instruction and it causes compile error in Thumb-2 mode. - fix GIC reg sizes for exynos4 SoCs - remove reset timer counter value during boot and resume for mct to fix a big jump in printk timestamps - fix pm code to check cortex-A9 for another exynos SoCs - don't rely on firmware's secondary_cpu_start for mcpm * tag 'samsung-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kgene/linux-samsung: ARM: EXYNOS: Don't rely on firmware's secondary_cpu_start for mcpm ARM: EXYNOS: fix pm code to check for cortex A9 rather than the SoC clocksource: exynos_mct: Don't reset the counter during boot and resume ARM: dts: fix reg sizes of GIC for exynos4 ARM: EXYNOS: Use wfi macro in platform_do_lowpower Signed-off-by: Arnd Bergmann <[email protected]>
2014-06-22drm/msm: fix IOMMU cleanup for -EPROBE_DEFERStephane Viau5-7/+44
If probe fails after IOMMU is attached, we need to detach in order to clean up properly. Before this change, IOMMU faults would occur if the probe failed (-EPROBE_DEFER). Signed-off-by: Stephane Viau <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2014-06-22drm/msm: use PAGE_ALIGNED instead of IS_ALIGNED(PAGE_SIZE)Fabian Frederick1-1/+1
use mm.h definition Cc: David Airlie <[email protected]> Cc: Rob Clark <[email protected]> Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2014-06-22drm/msm/hdmi: set hdp clock rate before prepare_enableStephane Viau3-0/+11
The clock driver usually complains when a clock is being prepared before setting its rate. It is the case here for "core_clk" which needs to be set at 19.2 MHz before we attempt a prepare_enable(). Signed-off-by: Stephane Viau <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2014-06-22drm/msm: storage class should be before const qualifierPeter Griffin1-1/+1
The C99 specification states in section 6.11.5: The placement of a storage-class specifier other than at the beginning of the declaration specifiers in a declaration is an obsolescent feature. Signed-off-by: Peter Griffin <[email protected]> Cc: David Airlie <[email protected]> Cc: [email protected] Cc: [email protected] Signed-off-by: Rob Clark <[email protected]>
2014-06-22drm/msm: Replace type of paddr to uint32_t.Matwey V. Kornilov1-1/+1
This patch helps to avoid the following build issue: drivers/gpu/drm/msm/msm_fbdev.c:108:2: error: passing argument 3 of 'msm_gem_get_iova_locked' from incompatible pointer type [-Werror] msm_gem_get_iova_locked(fbdev->bo, 0, &paddr); ^ In file included from drivers/gpu/drm/msm/msm_fbdev.c:18:0: drivers/gpu/drm/msm/msm_drv.h:153:5: note: expected 'uint32_t *' but argument is of type 'dma_addr_t *' int msm_gem_get_iova_locked(struct drm_gem_object *obj, int id, ^ Signed-off-by: Matwey V. Kornilov <[email protected]> Signed-off-by: Rob Clark <[email protected]>
2014-06-22regulator: bcm590xx: fix vbus nameGraham Williams1-0/+5
The vbus regulator was not getting its name set. This results in the sysfs entry being empty. The lack of a bcm590xx_regs[] table entry also upsets Coverity runs. Add the table entry so the name gets set properly. Signed-off-by: Graham Williams <[email protected]> Acked-by: Matt Porter <[email protected]> Signed-off-by: Mark Brown <[email protected]>