Age         Commit message (Author, files changed, -deleted/+added)
2014-08-13  Input: atmel_mxt_ts - simplify mxt_initialize a bit (Dmitry Torokhov, 1 file, -39/+42)
I think having control flow with 2 gotos/labels/flags is quite hard to read; this version is a bit more readable IMO. Signed-off-by: Dmitry Torokhov <[email protected]> Signed-off-by: Nick Dyer <[email protected]>
2014-08-13  powerpc/thp: Add tracepoints to track hugepage invalidate (Aneesh Kumar K.V, 3 files, -0/+98)
Add a tracepoint to track hugepage invalidation. This helps us in debugging difficult-to-track bugs. Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/mm: Use read barrier when creating real_pte (Aneesh Kumar K.V, 1 file, -5/+25)
On ppc64 we support 4K hash ptes with a 64K page size. That requires us to track the hash pte slot information on a per-4K basis. We do that by storing the slot details in the second half of the pte page. The pte bit _PAGE_COMBO is used to indicate whether the second half needs to be looked at while building the real_pte. We need to use a read memory barrier while doing that so that the load of hidx is not reordered w.r.t. the _PAGE_COMBO check. On the store side we already do an lwsync in __hash_page_4K. CC: <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
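The fix amounts to a read barrier between the _PAGE_COMBO test and the load of the hidx half-page. A minimal sketch, assuming the helper shape below (not the exact kernel diff):

    /* Sketch only: field and helper names are illustrative. */
    static inline real_pte_t __real_pte(pte_t pte, pte_t *ptep)
    {
        real_pte_t rpte;

        rpte.pte = pte;
        rpte.hidx = 0;
        if (pte_val(pte) & _PAGE_COMBO) {
            /*
             * Order the hidx load against the _PAGE_COMBO check;
             * pairs with the lwsync on the store side in
             * __hash_page_4K().
             */
            smp_rmb();
            rpte.hidx = pte_val(*(ptep + PTRS_PER_PTE));
        }
        return rpte;
    }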
2014-08-13  powerpc/thp: Use ACCESS_ONCE when loading pmdp (Aneesh Kumar K.V, 1 file, -1/+3)
We could get wrong results if the compiler recomputes old_pmd. Avoid that by using ACCESS_ONCE. CC: <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
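Conceptually the change has this shape (a sketch, not the literal diff):

    /* Snapshot the PMD exactly once; without ACCESS_ONCE the compiler
     * may legally re-load *pmdp later and observe a different value. */
    pmd_t old_pmd = ACCESS_ONCE(*pmdp);

    if (!pmd_trans_huge(old_pmd))
        return;
    /* ... all subsequent checks use the stable old_pmd copy ... */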
2014-08-13  powerpc/thp: Invalidate with vpn in loop (Aneesh Kumar K.V, 1 file, -16/+7)
As per the ISA, for a 4K base page size we compare bits 14..65 of the VA specified with the entry VA in the TLB. That implies we need to make sure we do a tlbie with all the possible 4K VAs used to access the 16MB hugepage. With a 64K base page size we compare bits 14..57 of the VA. Hence we cannot ignore the lower 24 bits of the VA while doing a tlbie. We also cannot TLB-invalidate a 16MB entry with just one tlbie instruction because we don't track which VA was used to instantiate the TLB entry. CC: <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
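The resulting invalidation therefore loops over the hugepage in 4K steps, along these lines (a sketch; addr_to_vpn() and the flush helper arguments are illustrative):

    /* One invalidation per possible 4K VA in the 16MB page, since we
     * don't know which VA instantiated the TLB entry. */
    unsigned long addr;

    for (addr = start; addr < start + SZ_16M; addr += SZ_4K)
        flush_hash_page(addr_to_vpn(addr), rpte, MMU_PAGE_4K, ssize, local);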
2014-08-13  powerpc/thp: Handle combo pages in invalidate (Aneesh Kumar K.V, 3 files, -5/+13)
If we changed the base page size of the segment, either via sub_page_protect or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash table entries. We do a lazy hash page table flush for all mapped pages in the demoted segment. This happens when we handle a hash page fault for these pages. We use the _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a pte is backed by a 4K hash pte. If we find _PAGE_COMBO not set on the pte, that implies that we could possibly have older 64K hash pte entries in the hash page table and we need to invalidate those entries. Use _PAGE_COMBO to determine the page size with which we should invalidate the hash table entries on unmap. CC: <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/thp: Invalidate old 64K based hash page mapping before insert of 4k pte (Aneesh Kumar K.V, 1 file, -9/+70)
If we changed the base page size of the segment, either via sub_page_protect or via remap_4k_pfn, we do a demote_segment which doesn't flush the hash table entries. We do a lazy hash page table flush for all mapped pages in the demoted segment. This happens when we handle a hash page fault for these pages. We use the _PAGE_COMBO bit along with _PAGE_HASHPTE to indicate whether a pte is backed by a 4K hash pte. If we find _PAGE_COMBO not set on the pte, that implies that we could possibly have older 64K hash pte entries in the hash page table and we need to invalidate those entries. Handle this correctly for 16M pages. CC: <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/thp: Don't recompute vsid and ssize in loop on invalidate (Aneesh Kumar K.V, 4 files, -43/+26)
The segment identifier and segment size will remain the same in the loop, so we can compute them outside. We also change the hugepage_invalidate interface so that we can use it in a later patch. CC: <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/thp: Add write barrier after updating the valid bit (Aneesh Kumar K.V, 1 file, -1/+4)
With hugepages, we store the hpte valid information in the pte page whose address is stored in the second half of the PMD. Use a write barrier to make sure that clearing the pmd busy bit and updating the hpte valid info are ordered properly. CC: <[email protected]> Signed-off-by: Aneesh Kumar K.V <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
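A sketch of the required ordering (names illustrative):

    /* Publish the hpte valid info before dropping _PAGE_BUSY. */
    mark_hpte_slot_valid(hpte_slot_array, index, slot);    /* illustrative */
    /*
     * The write barrier keeps the valid update above from being
     * reordered after the busy-bit clear below.
     */
    smp_wmb();
    *pmdp = __pmd(pmd_val(pmd) & ~_PAGE_BUSY);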
2014-08-13  powerpc: reorder per-cpu NUMA information's initialization (Nishanth Aravamudan, 2 files, -9/+15)
There is an issue currently where NUMA information is used on powerpc (and possibly ia64) before it has been read from the device-tree, which leads to large slab consumption with CONFIG_SLUB and memoryless nodes.

NUMA powerpc non-boot CPUs' cpu_to_node/cpu_to_mem is only accurate after start_secondary(), similar to ia64, which is invoked via smp_init().

Commit 6ee0578b4daae ("workqueue: mark init_workqueues() as early_initcall()") made init_workqueues() be invoked via do_pre_smp_initcalls(), which is obviously before the secondary processors are online. Additionally, the following commits changed init_workqueues() to use cpu_to_node to determine the node to use for kthread_create_on_node:

bce903809ab3f ("workqueue: add wq_numa_tbl_len and wq_numa_possible_cpumask[]")
f3f90ad469342 ("workqueue: determine NUMA node of workers accourding to the allowed cpumask")

Therefore, when init_workqueues() runs, it sees all CPUs as being on Node 0. On LPARs or KVM guests where Node 0 is memoryless, this leads to a high number of slab deactivations (http://www.spinics.net/lists/linux-mm/msg67489.html).

Fix this by initializing the powerpc-specific CPU<->node/local memory node mapping as early as possible, which on powerpc is do_init_bootmem(). Currently that function initializes the mapping for the boot CPU, but we extend it to set up the mapping for all possible CPUs. Then, in smp_prepare_cpus(), we can correspondingly set the per-cpu values for all possible CPUs. That ensures that before the early_initcalls run (and really as early as possible), the per-cpu NUMA mapping is accurate.

While testing memoryless nodes on PowerKVM guests with a fix to the workqueue logic to use cpu_to_mem() instead of cpu_to_node(), with a guest topology of:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
node 1 size: 16336 MB
node 1 free: 15329 MB
node distances:
node   0   1
  0:  10  40
  1:  40  10

the slab consumption decreases from

Slab:             932416 kB
SUnreclaim:       902336 kB

to

Slab:             395264 kB
SUnreclaim:       359424 kB

And we see a corresponding increase in slab efficiency from

slab                    mem     objs    slabs
                        used   active  active
------------------------------------------------------------
kmalloc-16384         337 MB   11.28%  100.00%
task_struct           288 MB    9.93%  100.00%

to

slab                    mem     objs    slabs
                        used   active  active
------------------------------------------------------------
kmalloc-16384          37 MB  100.00%  100.00%
task_struct            31 MB  100.00%  100.00%

Powerpc didn't support memoryless nodes until recently (64bb80d87f01 "powerpc/numa: Enable CONFIG_HAVE_MEMORYLESS_NODES" and 8c272261194d "powerpc/numa: Enable USE_PERCPU_NUMA_NODE_ID"). Those commits also helped improve memory consumption with these kinds of environments.

Signed-off-by: Nishanth Aravamudan <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/perf/hv-24x7: Use kmem_cache_free (Himangi Saraogi, 1 file, -1/+1)
Free memory allocated with kmem_cache_zalloc using kmem_cache_free rather than kfree. The Coccinelle semantic patch that makes this change is as follows:

// <smpl>
@@
expression x,E,c;
@@

x = \(kmem_cache_alloc\|kmem_cache_zalloc\|kmem_cache_alloc_node\)(c,...)
... when != x = E
    when != &x
?-kfree(x)
+kmem_cache_free(c,x)
// </smpl>

Signed-off-by: Himangi Saraogi <[email protected]>
Acked-by: Julia Lawall <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/pseries/hvcserver: Fix endian issue in hvcs_get_partner_info (Thomas Falcon, 1 file, -2/+2)
A buffer returned by H_VTERM_PARTNER_INFO contains device information in big endian format, causing problems for little endian architectures. This patch ensures that they are in cpu endian. Signed-off-by: Thomas Falcon <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
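The conversion amounts to byte-swapping the returned fields, roughly as follows (field names assumed from the hvcs partner-info structure):

    /* Sketch: the hypervisor buffer is big endian; convert to CPU endian. */
    next->unit_address = (unsigned int)be64_to_cpu(pi_buff[0]);
    next->partition_ID = (unsigned int)be64_to_cpu(pi_buff[1]);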
2014-08-13  powerpc: Hard disable interrupts in xmon (Anton Blanchard, 1 file, -0/+3)
xmon only soft-disables interrupts. This seems like a bad idea - we certainly don't want decrementer and PMU exceptions going off when we are debugging something inside xmon. This issue was uncovered when the hard lockup detector went off inside xmon. To ensure we won't get a spurious hard lockup warning, I also call touch_nmi_watchdog() when exiting xmon. Signed-off-by: Anton Blanchard <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
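A sketch of the entry/exit change (surrounding xmon logic omitted):

    local_irq_save(flags);
    hard_irq_disable();    /* soft-disable alone still lets decrementer/PMU exceptions through */

    /* ... interactive xmon session ... */

    touch_nmi_watchdog();  /* avoid a spurious hard lockup warning on exit */
    local_irq_restore(flags);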
2014-08-13  powerpc: remove duplicate definition of TEXASR_FS (Nishanth Aravamudan, 1 file, -2/+1)
It appears that commits 7f06f21d40a6 ("powerpc/tm: Add checking to treclaim/trechkpt") and e4e38121507a ("KVM: PPC: Book3S HV: Add transactional memory support") both added definitions of TEXASR_FS. Remove one of them. At the same time, fix the alignment of the remaining definition (should be tab-separated like the rest of the #defines). Signed-off-by: Nishanth Aravamudan <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/pseries: Avoid deadlock on removing ddw (Gavin Shan, 1 file, -6/+14)
Function remove_ddw() could be called in of_reconfig_notifier and we potentially remove the dynamic DMA window property, which invokes of_reconfig_notifier again. Eventually, it leads to the deadlock the following backtrace shows. The patch fixes the issue by deferring the release of the dynamic DMA window property until the device node itself is released.

=============================================
[ INFO: possible recursive locking detected ]
3.16.0+ #428 Tainted: G        W
---------------------------------------------
drmgr/2273 is trying to acquire lock:
 ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] .__blocking_notifier_call_chain+0x40/0x78

but task is already holding lock:
 ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] .__blocking_notifier_call_chain+0x40/0x78

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock((of_reconfig_chain).rwsem);
  lock((of_reconfig_chain).rwsem);

 *** DEADLOCK ***

 May be due to missing lock nesting notation

2 locks held by drmgr/2273:
 #0: (sb_writers#4){.+.+.+}, at: [<c0000000001cbe70>] .vfs_write+0xb0/0x1f8
 #1: ((of_reconfig_chain).rwsem){.+.+..}, at: [<c000000000091890>] .__blocking_notifier_call_chain+0x40/0x78

stack backtrace:
CPU: 17 PID: 2273 Comm: drmgr Tainted: G        W    3.16.0+ #428
Call Trace:
[c0000000137e7000] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[c0000000137e70b0] [c00000000083cd34] .dump_stack+0x7c/0x9c
[c0000000137e7130] [c0000000000b8afc] .__lock_acquire+0x128c/0x1c68
[c0000000137e7280] [c0000000000b9a4c] .lock_acquire+0xe8/0x104
[c0000000137e7350] [c00000000083588c] .down_read+0x4c/0x90
[c0000000137e73e0] [c000000000091890] .__blocking_notifier_call_chain+0x40/0x78
[c0000000137e7490] [c000000000091900] .blocking_notifier_call_chain+0x38/0x48
[c0000000137e7520] [c000000000682a28] .of_reconfig_notify+0x34/0x5c
[c0000000137e75b0] [c000000000682a9c] .of_property_notify+0x4c/0x54
[c0000000137e7650] [c000000000682bf0] .of_remove_property+0x30/0xd4
[c0000000137e76f0] [c000000000052a44] .remove_ddw+0x144/0x168
[c0000000137e7790] [c000000000053204] .iommu_reconfig_notifier+0x30/0xe0
[c0000000137e7820] [c00000000009137c] .notifier_call_chain+0x6c/0xb4
[c0000000137e78c0] [c0000000000918ac] .__blocking_notifier_call_chain+0x5c/0x78
[c0000000137e7970] [c000000000091900] .blocking_notifier_call_chain+0x38/0x48
[c0000000137e7a00] [c000000000682a28] .of_reconfig_notify+0x34/0x5c
[c0000000137e7a90] [c000000000682e14] .of_detach_node+0x44/0x1fc
[c0000000137e7b40] [c0000000000518e4] .ofdt_write+0x3ac/0x688
[c0000000137e7c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
[c0000000137e7cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
[c0000000137e7d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
[c0000000137e7e30] [c00000000000a064] syscall_exit+0x0/0x98

Cc: [email protected]
Signed-off-by: Gavin Shan <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/pseries: Failure on removing device node (Gavin Shan, 1 file, -1/+1)
While running the command "drmgr -c phb -r -s 'PHB 528'", the following backtrace jumped out because the target device node isn't marked with OF_DETACHED by of_detach_node(). That in turn is caused by an error returned from the memory hotplug related reconfig notifier when CONFIG_MEMORY_HOTREMOVE is disabled. The patch fixes it.

ERROR: Bad of_node_put() on /pci@800000020000210/ethernet@0
CPU: 14 PID: 2252 Comm: drmgr Tainted: G        W    3.16.0+ #427
Call Trace:
[c000000012a776a0] [c000000000013d9c] .show_stack+0x88/0x148 (unreliable)
[c000000012a77750] [c00000000083cd34] .dump_stack+0x7c/0x9c
[c000000012a777d0] [c0000000006807c4] .of_node_release+0x58/0xe0
[c000000012a77860] [c00000000038a7d0] .kobject_release+0x174/0x1b8
[c000000012a77900] [c00000000038a884] .kobject_put+0x70/0x78
[c000000012a77980] [c000000000681680] .of_node_put+0x28/0x34
[c000000012a77a00] [c000000000681ea8] .__of_get_next_child+0x64/0x70
[c000000012a77a90] [c000000000682138] .of_find_node_by_path+0x1b8/0x20c
[c000000012a77b40] [c000000000051840] .ofdt_write+0x308/0x688
[c000000012a77c20] [c000000000238430] .proc_reg_write+0xb8/0xd4
[c000000012a77cd0] [c0000000001cbeac] .vfs_write+0xec/0x1f8
[c000000012a77d70] [c0000000001cc3b0] .SyS_write+0x58/0xa0
[c000000012a77e30] [c00000000000a064] syscall_exit+0x0/0x98

Cc: [email protected]
Signed-off-by: Gavin Shan <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/boot: Use correct zlib types for comparison (Benjamin Herrenschmidt, 1 file, -2/+2)
Avoids this warning:

arch/powerpc/boot/gunzip_util.c:118:9: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/powernv: Interface to register/unregister opal dump region (Vasant Hegde, 3 files, -0/+36)
The PowerNV platform is capable of capturing a host memory region when the system crashes (because of the host or firmware). We have a new OPAL API to register/unregister the memory region to be captured when the system crashes. This patch adds support for the new API. Also, during boot time we register the kernel log buffer and unregister it before doing kexec. Signed-off-by: Vasant Hegde <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  printk: Add function to return log buffer address and size (Vasant Hegde, 2 files, -0/+15)
Platforms like IBM Power Systems support a service-processor-assisted dump, which provides an interface to add a memory region to be captured when the system crashes. During initialization/runtime we can add the kernel memory region to be collected. Presently we don't have a way to get the log buffer base address and size. This patch adds support to return the log buffer address and size. Signed-off-by: Vasant Hegde <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]> Acked-by: Andrew Morton <[email protected]>
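A usage sketch: the new helpers hand the log buffer and its length to a platform dump-registration call (the OPAL call name below is illustrative):

    char *addr = log_buf_addr_get();    /* new helper */
    u32 len = log_buf_len_get();        /* new helper */

    /* opal_register_dump_region() stands in for the platform interface. */
    opal_register_dump_region(OPAL_DUMP_REGION_LOG_BUF, __pa(addr), len);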
2014-08-13  powerpc: Add POWER8 features to CPU_FTRS_POSSIBLE/ALWAYS (Michael Ellerman, 1 file, -2/+4)
We have been a bit slack about updating the CPU_FTRS_POSSIBLE and CPU_FTRS_ALWAYS masks. When we added POWER8, and also POWER8E we forgot to update the ALWAYS mask. And when we added POWER8_DD1 we forgot to update both the POSSIBLE and ALWAYS masks. Luckily this hasn't caused any actual bugs AFAICS. Failing to update the ALWAYS mask just forgoes a potential optimisation opportunity. Failing to update the POSSIBLE mask for POWER8_DD1 is also OK because it only removes a bit rather than adding any. Regardless they should all be in both masks so as to avoid any future bugs when the set of ALWAYS/POSSIBLE bits changes, or the masks themselves change. Signed-off-by: Michael Ellerman <[email protected]> Acked-by: Michael Neuling <[email protected]> Acked-by: Joel Stanley <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/ppc476: Disable BTAC (Alistair Popple, 1 file, -1/+3)
This patch disables the branch target address CAM which under specific circumstances may cause the processor to skip execution of 1-4 instructions. This fixes IBM Erratum #47. Signed-off-by: Alistair Popple <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/powernv: Fix IOMMU group lost (Gavin Shan, 2 files, -18/+22)
When we take full hotplug to recover from EEH errors, PCI buses could be involved. In that case, the child devices of the involved PCI buses can't be attached to their IOMMU group properly, which is caused by commit 3f28c5a ("powerpc/powernv: Reduce multi-hit of iommu_add_device()").

When adding the PCI devices of the newly created PCI buses to the system, the IOMMU group is expected to be added in (C). (A) fails to bind the IOMMU group because bus->is_added is false. (B) fails because the device doesn't have a binding IOMMU table yet. bus->is_added is set to true at the end of (C) and pdev->is_added is set to true at (D).

pcibios_add_pci_devices()
  pci_scan_bridge()
    pci_scan_child_bus()
      pci_scan_slot()
        pci_scan_single_device()
          pci_scan_device()
            pci_device_add()
              pcibios_add_device()        A: Ignore
            device_add()                  B: Ignore
      pcibios_fixup_bus()
        pcibios_setup_bus_devices()
          pcibios_setup_device()          C: Hit
  pcibios_finish_adding_to_bus()
    pci_bus_add_devices()
      pci_bus_add_device()                D: Add device

If the parent PCI bus isn't involved in hotplug, the IOMMU group is expected to be bound in (B). (A) should fail as the sysfs entries aren't populated.

The patch fixes the issue by reverting commit 3f28c5a and removing the WARN_ON() in iommu_add_device() to allow calling the function even if the specified device already has an associated IOMMU group.

Cc: <[email protected]> # 3.16+
Reported-by: Thadeu Lima de Souza Cascardo <[email protected]>
Signed-off-by: Gavin Shan <[email protected]>
Acked-by: Wei Yang <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc: Add smp_mb()s to arch_spin_unlock_wait() (Michael Ellerman, 1 file, -0/+4)
Similar to the previous commit which described why we need to add a barrier to arch_spin_is_locked(), we have a similar problem with spin_unlock_wait(). We need a barrier on entry to ensure any spinlock we have previously taken is visibly locked prior to the load of lock->slock. It's also not clear if spin_unlock_wait() is intended to have ACQUIRE semantics. For now be conservative and add a barrier on exit to give it ACQUIRE semantics. Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc: Add smp_mb() to arch_spin_is_locked() (Michael Ellerman, 1 file, -0/+1)
The kernel defines the function spin_is_locked(), which can be used to check if a spinlock is currently locked. Using spin_is_locked() on a lock you don't hold is obviously racy. That is, even though you may observe that the lock is unlocked, it may become locked at any time.

There is (at least) one exception to that, which is if two locks are used as a pair, and the holder of each checks the status of the other before doing any update. Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic value:

The first CPU does:

	spin_lock(*A)

	if spin_is_locked(*B)
		# nothing
	else
		smp_mb()
		LOAD r = *COUNTER
		r++
		STORE *COUNTER = r

	spin_unlock(*A)

And the second CPU does:

	spin_lock(*B)

	if spin_is_locked(*A)
		# nothing
	else
		smp_mb()
		LOAD r = *COUNTER
		r++
		STORE *COUNTER = r

	spin_unlock(*B)

Although this is a strange locking construct, it should work.

It seems to be understood, but not documented, that spin_is_locked() is not a memory barrier, so in the examples above and below the caller inserts its own memory barrier before acting on the result of spin_is_locked().

For now we assume spin_is_locked() is implemented as below, and we break it out in our examples:

	bool spin_is_locked(*LOCK) {
		LOAD l = *LOCK
		return l.locked
	}

Our intuition is that there should be no problem even if the two code sequences run simultaneously such as:

	CPU 0			CPU 1
	==================================================
	spin_lock(*A)		spin_lock(*B)
	LOAD b = *B		LOAD a = *A
	if b.locked # true	if a.locked # true
	# nothing		# nothing
	spin_unlock(*A)		spin_unlock(*B)

If one CPU gets the lock before the other then it will do the update and the other CPU will back off:

	CPU 0			CPU 1
	==================================================
	spin_lock(*A)
	LOAD b = *B
				spin_lock(*B)
	if b.locked # false
				LOAD a = *A
	else
				if a.locked # true
	smp_mb()
				# nothing
	LOAD r1 = *COUNTER
				spin_unlock(*B)
	r1++
	STORE *COUNTER = r1
	spin_unlock(*A)

However in reality spin_lock() itself is not indivisible. On powerpc we implement it as a load-and-reserve and store-conditional. Ignoring the retry logic for the lost reservation case, it boils down to:

	spin_lock(*LOCK) {
		LOAD l = *LOCK
		l.locked = true
		STORE *LOCK = l
		ACQUIRE_BARRIER
	}

The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as defined in memory-barriers.txt:

	This acts as a one-way permeable barrier. It guarantees that all
	memory operations after the ACQUIRE operation will appear to happen
	after the ACQUIRE operation with respect to the other components of
	the system.

On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is also known as "lightweight sync", or "sync 1".

As described in Power ISA v2.07 section B.2.1.1, in this scenario the lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to act as the barrier, preventing any loads or stores in the locked region from occurring prior to the load of *LOCK.

Whether this behaviour is in accordance with the definition of ACQUIRE semantics in memory-barriers.txt is open to discussion; we may switch to a different barrier in future.

What this means in practice is that the following can occur:

	CPU 0			CPU 1
	==================================================
	LOAD a = *A		LOAD b = *B
	a.locked = true		b.locked = true
	LOAD b = *B		LOAD a = *A
	STORE *A = a		STORE *B = b
	if b.locked # false	if a.locked # false
	else			else
	smp_mb()		smp_mb()
	LOAD r1 = *COUNTER	LOAD r2 = *COUNTER
	r1++			r2++
	STORE *COUNTER = r1	STORE *COUNTER = r2	# Lost update
	spin_unlock(*A)		spin_unlock(*B)

That is, the load of *B can occur prior to the store that makes *A visibly locked. And similarly for CPU 1. The result is both CPUs hold their lock and believe the other lock is unlocked.

The easiest fix for this is to add a full memory barrier to the start of spin_is_locked(), so adding to our previous definition would give us:

	bool spin_is_locked(*LOCK) {
		smp_mb()
		LOAD l = *LOCK
		return l.locked
	}

The new barrier orders the store to the lock we are locking vs the load of the other lock:

	CPU 0			CPU 1
	==================================================
	LOAD a = *A		LOAD b = *B
	a.locked = true		b.locked = true
	STORE *A = a		STORE *B = b
	smp_mb()		smp_mb()
	LOAD b = *B		LOAD a = *A
	if b.locked # true	if a.locked # true
	# nothing		# nothing
	spin_unlock(*A)		spin_unlock(*B)

Although the above example is theoretical, there is code similar to this example in sem_lock() in ipc/sem.c. This commit in addition to the next commit appears to be a fix for crashes we are seeing in that code where we believe this race happens in practice.

Signed-off-by: Michael Ellerman <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
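On powerpc the fix boils down to a full barrier at the top of the arch hook; a minimal sketch, assuming the usual arch_spinlock helpers:

    static inline int arch_spin_is_locked(arch_spinlock_t *lock)
    {
        smp_mb();    /* order our own lock's store vs the load below */
        return !arch_spin_value_unlocked(*lock);
    }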
2014-08-13  powerpc: Fix "attempt to move .org backwards" error (Guenter Roeck, 1 file, -55/+55)
Once again, we see

arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:865: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:866: Error: attempt to move .org backwards
arch/powerpc/kernel/exceptions-64s.S:890: Error: attempt to move .org backwards

when compiling ppc:allmodconfig. This time the problem has been caused by commit 0869b6fd209bda ("powerpc/book3s: Add basic infrastructure to handle HMI in Linux"), which adds the functions hmi_exception_early and hmi_exception_after_realmode into a critical (size-limited) code area, even though that does not appear to be necessary. Move those functions to a non-critical area of the file.

Signed-off-by: Guenter Roeck <[email protected]>
Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-13  powerpc/nohash: Split __early_init_mmu() into boot and secondary (Scott Wood, 1 file, -45/+66)
__early_init_mmu() does some things that are really only needed by the boot cpu. On FSL booke, this includes calling memblock_enforce_memory_limit(), which is labelled __init. Secondary cpu init code can't be __init as that would break CPU hotplug. While it's probably a bug that memblock_enforce_memory_limit() isn't __init_memblock instead, there's no reason why we should be doing this stuff for secondary cpus in the first place. Signed-off-by: Scott Wood <[email protected]> Signed-off-by: Benjamin Herrenschmidt <[email protected]>
2014-08-12  RDMA/ocrdma: report asic-id in query device (Mitesh Ahuja, 1 file, -1/+1)
Ocrdma does not report hw_ver when query_device is issued. This patch adds a meaningful value to this field. Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Mitesh Ahuja <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-08-12  RDMA/ocrdma: Update sli data structure for endianness (Devesh Sharma, 2 files, -50/+129)
Update the SLI-specific mailbox command request/response data structures to fix endianness issues. Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-08-12  RDMA/ocrdma: Obtain SL from device structure (Devesh Sharma, 2 files, -4/+4)
Currently, the driver obtains the service level value from the ah_attr->sl field. However, this field is set to zero all the time by rdma-cm. This patch allows create_ah to obtain the service level from dev->sl. Signed-off-by: Devesh Sharma <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-08-12  RDMA/uapi: Include socket.h in rdma_user_cm.h (Doug Ledford, 1 file, -0/+1)
Commit ee7aed4528f ("RDMA/ucma: Support querying for AF_IB addresses") added struct sockaddr_storage to rdma_user_cm.h without also adding an include for linux/socket.h to make sure it is defined. Systemtap needs the header files to build standalone and cannot rely on other files to pre-include other headers, so add linux/socket.h to the list of includes in this file. Fixes: ee7aed4528f ("RDMA/ucma: Support querying for AF_IB addresses") Signed-off-by: Doug Ledford <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-08-12  IB/srpt: Handle GID change events (Doug Ledford, 1 file, -0/+1)
GID change events need a refresh just like LID change events and several others. Handle this the same as the others. Signed-off-by: Doug Ledford <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-08-12  IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0] (Fabian Frederick, 1 file, -1/+1)
Acked-by: Eli Cohen <[email protected]> Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Doug Ledford <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-08-12  IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0] (Fabian Frederick, 1 file, -4/+2)
Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Doug Ledford <[email protected]> Signed-off-by: Roland Dreier <[email protected]>
2014-08-12  RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf() (Dan Carpenter, 1 file, -2/+5)
This is a static checker fix. The static checker says that q_size comes from the user and can be any 32 bit value. The call tree is:

--> ib_uverbs_create_cq()
    --> c2_create_cq()
        --> c2_init_cq()

Signed-off-by: Dan Carpenter <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
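The defensive check is of this shape (a sketch; the exact bound used by the driver may differ):

    /* q_size is user-controlled, so bound it before the
     * q_size * msg_size allocation can overflow. */
    if (q_size > SIZE_MAX / msg_size)
        return -EINVAL;
    buf = vmalloc(q_size * msg_size);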
2014-08-12  IPoIB: Remove unnecessary test for NULL before debugfs_remove() (Fabian Frederick, 1 file, -4/+2)
Fix checkpatch warning:

WARNING: debugfs_remove(NULL) is safe this check is probably not required

Signed-off-by: Fabian Frederick <[email protected]>
Signed-off-by: Doug Ledford <[email protected]>
Signed-off-by: Roland Dreier <[email protected]>
2014-08-12  PCI: Remove DEFINE_PCI_DEVICE_TABLE macro use (Benoit Taine, 230 files, -240/+242)
We should prefer `struct pci_device_id` over `DEFINE_PCI_DEVICE_TABLE` to meet kernel coding style guidelines. This issue was reported by checkpatch. A simplified version of the semantic patch that makes this change is as follows (http://coccinelle.lip6.fr/):

// <smpl>
@@
identifier i;
declarer name DEFINE_PCI_DEVICE_TABLE;
initializer z;
@@

- DEFINE_PCI_DEVICE_TABLE(i)
+ const struct pci_device_id i[]
= z;
// </smpl>

[bhelgaas: add semantic patch]
Signed-off-by: Benoit Taine <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
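For a driver the change looks like this (hypothetical table name and device IDs):

    /* Before: checkpatch flags the macro. */
    DEFINE_PCI_DEVICE_TABLE(foo_pci_tbl) = {
        { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x1234) },
        { 0, }
    };

    /* After: the plain struct, same initializer. */
    const struct pci_device_id foo_pci_tbl[] = {
        { PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x1234) },
        { 0, }
    };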
2014-08-12  Input: joystick - use get_cycles on ARMv8 (Mark Brown, 1 file, -1/+1)
As with ARM the ARMv8 architecture provides a cycle counter which can be used to provide a high resolution time for the joystick driver and silence the build warning that results from not having a precise timer on ARMv8, making allmodconfig and allyesconfig quieter. Signed-off-by: Mark Brown <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]>
2014-08-12  Input: wacom - fix compiler warning if !CONFIG_PM (Geert Uytterhoeven, 1 file, -0/+2)
If CONFIG_PM is not set:

drivers/hid/wacom_sys.c:1436: warning: ‘wacom_reset_resume’ defined but not used

Protect the unused functions by #ifdef CONFIG_PM to fix this.

Signed-off-by: Geert Uytterhoeven <[email protected]>
Reviewed-by: Benjamin Tissoires <[email protected]>
Signed-off-by: Dmitry Torokhov <[email protected]>
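The guard is simply (a sketch; the resume body is elided):

    #ifdef CONFIG_PM
    static int wacom_reset_resume(struct hid_device *hdev)
    {
        /* ... re-initialize the tablet after resume ... */
        return 0;
    }
    #endif /* CONFIG_PM */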
2014-08-12  reiserfs: Fix use after free in journal teardown (Jan Kara, 2 files, -7/+21)
If do_journal_release() races with do_journal_end() which requeues delayed works for transaction flushing, we can leave work items for flushing outstanding transactions queued while freeing them. That results in use after free and possible crash in run_timers_softirq(). Fix the problem by not requeueing works if superblock is being shut down (MS_ACTIVE not set) and using cancel_delayed_work_sync() in do_journal_release(). CC: [email protected] Signed-off-by: Jan Kara <[email protected]>
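A sketch of the two-part fix (workqueue and field names assumed from the reiserfs journal code):

    /* In do_journal_end(): only requeue while the sb is still active. */
    if (sb->s_flags & MS_ACTIVE)
        queue_delayed_work(commit_wq, &journal->j_work, HZ / 10);

    /* In do_journal_release(): wait for any queued flush to finish. */
    cancel_delayed_work_sync(&journal->j_work);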
2014-08-12  e1000e: delete excessive space character in debug message (Jean Sacren, 1 file, -1/+1)
There is an excessive space character between the word and the period in the debug message. So delete it. Signed-off-by: Jean Sacren <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2014-08-12  e1000e: fix trivial kernel doc typos (Jean Sacren, 1 file, -1/+1)
The macro E1000_success is meant to be E1000_SUCCESS. As the return statement in the function is good as is, let's simply correct the comment for this trivial matter. Additionally E1000_ERR_HOST_INTERFACE_COMMAND is supposed to be -E1000_ERR_HOST_INTERFACE_COMMAND. Signed-off-by: Jean Sacren <[email protected]> Tested-by: Aaron Brown <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2014-08-12  i40e: Cleaning up missing null-terminate in conjunction with strncpy (Rickard Strandqvist, 1 file, -8/+8)
Replace strncpy with strlcpy to avoid strings that lack NUL termination. Signed-off-by: Rickard Strandqvist <[email protected]> Tested-By: Jim Young <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
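The pattern being replaced, with an illustrative destination and source:

    /* Before: strncpy() does not guarantee NUL termination. */
    strncpy(drvinfo->fw_version, fw_ver, sizeof(drvinfo->fw_version));

    /* After: strlcpy() always terminates, truncating if needed. */
    strlcpy(drvinfo->fw_version, fw_ver, sizeof(drvinfo->fw_version));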
2014-08-12  i40e: use correct structure type name in sizeof (Julia Lawall, 1 file, -1/+1)
Correct typo in the name of the type given to sizeof. Because it is the size of a pointer that is wanted, the typo has no impact on compilation or execution. This problem was found using Coccinelle (http://coccinelle.lip6.fr/). The semantic patch used can be found in message 0 of this patch series. Signed-off-by: Julia Lawall <[email protected]> Tested-By: Jim Young <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
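Illustratively (hypothetical type names), which is why the typo was invisible to the compiler:

    /* Both expressions are the size of a pointer, so this compiled and
     * ran correctly even with the wrong struct named. */
    ptrs = kzalloc(n * sizeof(struct i40e_wrong_type *), GFP_KERNEL);  /* typo */
    ptrs = kzalloc(n * sizeof(struct i40e_right_type *), GFP_KERNEL);  /* fixed */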
2014-08-12  i40e: fix sparse non static symbol warning (Wei Yongjun, 1 file, -3/+3)
Fixes the following sparse warning:

drivers/net/ethernet/intel/i40e/i40e_nvm.c:254:13: warning: symbol 'i40e_write_nvm_aq' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <[email protected]>
Tested-By: Jim Young <[email protected]>
Signed-off-by: Jeff Kirsher <[email protected]>
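The fix is to give the file-local symbol internal linkage, roughly (parameter list abbreviated):

    -i40e_status i40e_write_nvm_aq(struct i40e_hw *hw, /* ... */)
    +static i40e_status i40e_write_nvm_aq(struct i40e_hw *hw, /* ... */)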
2014-08-11  i40e: Fix missing uapi/linux/dcbnl.h include in i40e_fcoe.c (Lucas Tanure, 1 file, -0/+1)
Fix a missing include in the Intel i40e driver. Without this include the linux-next tree won't compile. Signed-off-by: Lucas Tanure <[email protected]> Tested-by: Jim Young <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]>
2014-08-11  sparc64: Fix pcr_ops initialization and usage bugs. (David S. Miller, 3 files, -3/+8)
Christopher reports that perf_event_print_debug() can crash in uniprocessor builds. The crash is due to pcr_ops being NULL.

This happens because pcr_arch_init() is only invoked by smp_cpus_done(), which only executes in SMP builds.

init_hw_perf_events() is closely intertwined with pcr_ops being set up properly, therefore:

1) Call pcr_arch_init() early on from init_hw_perf_events(), instead of from smp_cpus_done().

2) Do not hook up a PMU type if pcr_ops is NULL after pcr_arch_init().

3) Move init_hw_perf_events to a later initcall so that we will be sure to invoke pcr_arch_init() after all cpus are brought up.

Finally, guard the one naked sequence of pcr_ops dereferences in __global_pmu_self() with an appropriate NULL check.

Reported-by: Christopher Alexander Tobias Schulze <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
2014-08-11  sparc64: Do not disable interrupts in nmi_cpu_busy() (David S. Miller, 1 file, -1/+0)
nmi_cpu_busy() is an SMP function call that just makes sure that all of the cpus are spinning, using cpu cycles, while the NMI test runs. It does not need to disable IRQs because we just care about NMIs executing, and they will execute even with 'normal' IRQs disabled.

It is not legal to enable hard IRQs in an SMP cross call; in fact this bug triggers the BUG check in irq_work_run_list():

	BUG_ON(!irqs_disabled());

because irq_work_run() is now invoked from the tail of generic_smp_call_function_single_interrupt().

Signed-off-by: David S. Miller <[email protected]>
2014-08-11  Merge branch 'bcmgenet' (David S. Miller, 2 files, -15/+30)
Florian Fainelli says:

====================
net: bcmgenet: Wake-on-LAN and suspend fixes

This patch series fixes some mistakes that were introduced during the driver changes adding support for suspend/resume and Wake-on-LAN.
====================

Signed-off-by: David S. Miller <[email protected]>
2014-08-11  net: bcmgenet: correctly resume adapter from Wake-on-LAN (Florian Fainelli, 1 file, -4/+6)
In case we configured the adapter to be a wake up source from Wake-on-LAN, but we never actually woke up using Wake-on-LAN, we will leave the adapter in MagicPacket matching mode, which prevents any other type of packets from reaching the RX engine. Fix this by calling bcmgenet_power_up() with GENET_POWER_WOL_MAGIC to restore the adapter configuration in bcmgenet_resume(). The second problem we had was an imbalanced clock disabling in bcmgenet_wol_resume(), the Wake-on-LAN slow clock is only enabled in bcmgenet_suspend() if we configured Wake-on-LAN, yet we unconditionally disabled the clock in bcmgenet_wol_resume(). Fixes: 8c90db72f926 ("net: bcmgenet: suspend and resume from Wake-on-LAN") Signed-off-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2014-08-11  net: bcmgenet: update UMAC_CMD only when link is detected (Florian Fainelli, 1 file, -2/+6)
When we bring the interface down, phy_stop() will schedule the PHY state machine to call our link adjustment callback. By the time we do so, we may have clock-gated the GENET hardware block, and this will cause bus errors in bcmgenet_mii_setup().

Make sure that we only touch the UMAC_CMD register when there is an actual link. This is safe to do for two reasons:

- updating the Ethernet MAC registers only makes sense when a physical link is present
- the PHY library state machine first sets phydev->link = 0 before invoking phydev->adjust_link in the PHY_HALTED case

Fixes: 240524089d7a ("net: bcmgenet: only update UMAC_CMD if something changed")
Signed-off-by: Florian Fainelli <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
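The guard amounts to the following (a sketch of bcmgenet_mii_setup(); the register update details are omitted):

    static void bcmgenet_mii_setup(struct net_device *dev)
    {
        struct bcmgenet_priv *priv = netdev_priv(dev);
        struct phy_device *phydev = priv->phydev;

        if (!phydev->link)
            return;    /* block may be clock-gated; don't touch UMAC_CMD */

        /* ... program speed/duplex/pause into UMAC_CMD ... */
    }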