path: root/arch/arc/include/asm
2015-10-17  ARC: [arcompact] entry.S: Improve early return from exception  (Vineet Gupta, 2 files, -7/+8)
The requirement is to - Re-enable Exceptions (AE cleared) - Re-enable Interrupts (E1/E2 set) We need to wiggle these bits into ERSTATUS and issue RTIE. The previous version used the pre-exception STATUS32 as the starting point for what goes into ERSTATUS, which required explicit fixups of the U/DE/L bits. Instead, use the current (in-exception) STATUS32 as the starting point: being in the exception handler, U/DE/L can safely be assumed to be correct, and only AE/E1/E2 need to be fixed. So the new implementation is slightly better - Avoids a read from memory - Is 4 bytes smaller for the typical 1 level of interrupt configuration - Depicts the semantics more clearly Signed-off-by: Vineet Gupta <[email protected]>
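As a rough C rendition of that bit fixup (the real code is ARC assembly in entry.S; the STATUS_* masks below are written out here for illustration and follow the ARCompact STATUS32 layout as commonly documented - treat the exact bit positions as an assumption):

    #define STATUS_E1_MASK  (1U << 1)   /* level-1 interrupt enable (illustrative) */
    #define STATUS_E2_MASK  (1U << 2)   /* level-2 interrupt enable (illustrative) */
    #define STATUS_AE_MASK  (1U << 5)   /* "in exception" flag      (illustrative) */

    static inline unsigned int make_erstatus(unsigned int cur_status32)
    {
        /* U/DE/L are already correct while running inside the handler */
        cur_status32 &= ~STATUS_AE_MASK;                    /* leave exception state */
        cur_status32 |= STATUS_E1_MASK | STATUS_E2_MASK;    /* re-enable interrupts  */
        return cur_status32;                                /* -> ERSTATUS, then RTIE */
    }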
2015-10-17  ARC: [arcompact] don't check for hard isr calling local_irq_enable()  (Vineet Gupta, 1 file, -1/+13)
Historically this was done by the ARC IDE driver, which is long gone. The IRQ core is pretty robust now and already checks whether IRQs are enabled in hard ISRs, so there is no point in checking this in arch code on every call to local_irq_enable(). Further, if some driver does do that - let it bring down the system so we notice/fix it sooner, rather than covering up for the offender. This makes local_irq_enable() - for the L1-only case at least - simple enough that we can inline it. Signed-off-by: Vineet Gupta <[email protected]>
2015-10-17  ARCv2: mm: THP: flush_pmd_tlb_range make SMP safe  (Vineet Gupta, 1 file, -0/+5)
Signed-off-by: Vineet Gupta <[email protected]>
2015-10-17  ARCv2: mm: THP: Implement flush_pmd_tlb_range() optimization  (Vineet Gupta, 1 file, -0/+4)
Implement the TLB flush routine to evict a specific Super TLB entry, vs. moving to a new ASID on every such flush. Signed-off-by: Vineet Gupta <[email protected]>
2015-10-17  ARCv2: mm: THP support  (Vineet Gupta, 3 files, -2/+92)
MMUv4 in HS38x cores supports Super Pages which are the basis for Linux THP support. Normal and Super pages can co-exist (of course not overlap) in the TLB, with a new bit "SZ" in the TLB page descriptor to distinguish between them. Super Page size is configurable in hardware (4K to 16M), but fixed once the RTL is built. The exact THP size a Linux configuration will support is a function of: - MMU page size (typically 8K, RTL fixed) - software page walker address split between PGD:PTE:PFN (typically 11:8:13, but can be changed with 1 line) So for the above defaults, the THP size supported is 8K * 256 = 2M. The default page walker is 2 levels, PGD:PTE:PFN, which in the THP regime reduces to 1 level (as the PTE is folded into the PGD and canonically referred to as PMD). Thus the THP PMD accessors are implemented in terms of PTE (just like sparc). Signed-off-by: Vineet Gupta <[email protected]>
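The 2M figure is simply the base page size shifted by the number of PTE-index bits; a standalone sketch of the arithmetic for the default 8K / 11:8:13 split quoted above:

    #include <stdio.h>

    int main(void)
    {
        unsigned int page_shift = 13;   /* 8K base pages                     */
        unsigned int pte_bits   = 8;    /* the 11:8:13 PGD:PTE:PFN split     */
        unsigned long thp_bytes = 1UL << (page_shift + pte_bits);

        printf("THP size = %lu MB\n", thp_bytes >> 20);   /* prints "THP size = 2 MB" */
        return 0;
    }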
2015-10-09  ARC: mm: Introduce PTE_SPECIAL  (Vineet Gupta, 1 file, -2/+5)
Needed for THP, but will also come in handy for fast GUP later Signed-off-by: Vineet Gupta <[email protected]>
2015-10-09  ARC: mm: pte flags cosmetic cleanups, comments  (Vineet Gupta, 1 file, -21/+16)
No semantic changes. Signed-off-by: Vineet Gupta <[email protected]>
2015-10-09  ARC: mm: switch pgtable_t to pte_t *  (Vineet Gupta, 2 files, -5/+5)
ARC is the only arch defining pgtable_t as unsigned long (vs. struct page *). Historically this was done to avoid the page_address() calls in various arch hooks which need to get the virtual/logical address of the table. Some arches instead define it as pte_t *, which is just as efficient as unsigned long (the generated code doesn't change). Suggested-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-10-06  Merge tag 'v4.3-rc4' into locking/core, to pick up fixes before applying new changes  (Ingo Molnar, 1 file, -0/+1)
Signed-off-by: Ingo Molnar <[email protected]>
2015-10-04  Merge branch 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile  (Linus Torvalds, 1 file, -0/+1)
Pull strscpy string copy function implementation from Chris Metcalf. Chris sent this during the merge window, but I waffled back and forth on the pull request, which is why it's going in only now. The new "strscpy()" function is definitely easier to use and more secure than either strncpy() or strlcpy(), both of which are horrible nasty interfaces that have serious and irredeemable problems. strncpy() has a useless return value, and doesn't NUL-terminate an overlong result. To make matters worse, it pads a short result with zeroes, which is a performance disaster if you have big buffers. strlcpy(), by contrast, is a mis-designed "fix" for strncpy(), lacking the insane NUL padding, but having a differently broken return value which returns the original length of the source string. Which means that it will read characters past the count from the source buffer, and you have to trust the source to be properly terminated. It also makes error handling fragile, since the test for overflow is unnecessarily subtle. strscpy() avoids both these problems, guaranteeing the NUL termination (but not excessive padding) if the destination size wasn't zero, and making the overflow condition very obvious by returning -E2BIG. It also doesn't read past the size of the source, and can thus be used for untrusted source data too. So why did I waffle about this for so long? Every time we introduce a new-and-improved interface, people start doing these interminable series of trivial conversion patches. And every time that happens, somebody makes some silly mistake, and the conversion patch to the improved interface actually makes things worse. Because the patch is mind-numbing and trivial, nobody has the attention span to look at it carefully, and it's usually done over large swaths of source code which means that not every conversion gets tested. So I'm pulling the strscpy() support because it *is* a better interface. But I will refuse to pull mindless conversion patches. Use this in places where it makes sense, but don't do trivial patches to fix things that aren't actually known to be broken. * 'strscpy' of git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile: tile: use global strscpy() rather than private copy string: provide strscpy() Make asm/word-at-a-time.h available on all architectures
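For illustration, a minimal sketch of the calling pattern strscpy() enables (the wrapper name and error policy are made up for this example, not part of the pull):

    #include <linux/string.h>
    #include <linux/errno.h>

    /* hypothetical helper showing the strscpy() contract */
    static int copy_name(char *dst, size_t dst_size, const char *src)
    {
        ssize_t n = strscpy(dst, src, dst_size);

        if (n == -E2BIG)        /* overflow is explicit; dst is still NUL-terminated */
            return -ENAMETOOLONG;

        return 0;               /* n = number of characters copied, excluding the NUL */
    }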
2015-09-23  atomic, arch: Audit atomic_{read,set}()  (Peter Zijlstra, 1 file, -4/+4)
This patch makes sure that atomic_{read,set}() are at least {READ,WRITE}_ONCE(). We already had the 'requirement' that atomic_read() should use ACCESS_ONCE(), and most archs had this, but a few were lacking. All are now converted to use READ_ONCE(). And, by a symmetry and general paranoia argument, upgrade atomic_set() to use WRITE_ONCE(). Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Paul E. McKenney <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
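The resulting definitions follow the standard pattern below (a sketch of the generic idiom, not a quote of the per-arch diffs):

    /* atomic_{read,set}() expressed via {READ,WRITE}_ONCE() */
    #define atomic_read(v)      READ_ONCE((v)->counter)
    #define atomic_set(v, i)    WRITE_ONCE((v)->counter, (i))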
2015-09-03  Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds, 1 file, -2/+6)
Pull locking and atomic updates from Ingo Molnar: "Main changes in this cycle are: - Extend atomic primitives with coherent logic op primitives (atomic_{or,and,xor}()) and deprecate the old partial APIs (atomic_{set,clear}_mask()). The old ops were incoherent with incompatible signatures across architectures and with incomplete support. Now every architecture supports the primitives consistently (by Peter Zijlstra) - Generic support for 'relaxed atomics': - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return() - atomic_read_acquire() - atomic_set_release() This came out of porting qrwlock code to arm64 (by Will Deacon) - Clean up the fragile static_key APIs that were causing repeat bugs, by introducing a new one: DEFINE_STATIC_KEY_TRUE(name); DEFINE_STATIC_KEY_FALSE(name); which define a key of different types with an initial true/false value. Then allow: static_branch_likely() static_branch_unlikely() to take a key of either type and emit the right instruction for the case. To be able to know the 'type' of the static key we encode it in the jump entry (by Peter Zijlstra) - Static key self-tests (by Jason Baron) - qrwlock optimizations (by Waiman Long) - small futex enhancements (by Davidlohr Bueso) - ... and misc other changes" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits) jump_label/x86: Work around asm build bug on older/backported GCCs locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics locking/qrwlock: Implement queue_write_unlock() using smp_store_release() locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t' locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic locking/static_keys: Make verify_keys() static jump label, locking/static_keys: Update docs locking/static_keys: Provide a selftest jump_label: Provide a self-test s390/uaccess, locking/static_keys: employ static_branch_likely() x86, tsc, locking/static_keys: Employ static_branch_likely() locking/static_keys: Add selftest locking/static_keys: Add a new static_key interface locking/static_keys: Rework update logic locking/static_keys: Add static_key_{en,dis}able() helpers ...
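A minimal sketch of the new static-key interface mentioned above (the key name, helper and call sites are illustrative):

    #include <linux/jump_label.h>

    static DEFINE_STATIC_KEY_FALSE(my_feature_key);     /* hypothetical key */

    static void do_feature_work(void) { /* ... */ }     /* hypothetical helper */

    static void hot_path(void)
    {
        if (static_branch_unlikely(&my_feature_key))    /* emits the right insn per key type */
            do_feature_work();
    }

    static void enable_feature(void)                    /* e.g. from a sysfs toggle */
    {
        static_branch_enable(&my_feature_key);
    }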
2015-08-27  ARCv2: perf: Finally introduce HS perf unit  (Vineet Gupta, 1 file, -1/+4)
With all features in place, the ARC HS PCT block can now be allowed to be probed and used. Acked-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Alexey Brodkin <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-27  ARCv2: perf: implement exclusion of event counting in user or kernel mode  (Alexey Brodkin, 1 file, -0/+3)
Acked-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Alexey Brodkin <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-27  ARCv2: perf: Support sampling events using overflow interrupts  (Alexey Brodkin, 1 file, -2/+6)
In ARC 700 days the performance counters didn't support interrupts, and so for ARC we only had support for non-sampling events; put simply, only "perf stat" was functional. Now ARC HS has support for interrupts in the performance counters, which this change introduces support for. ARC performance counters behave as follows with regard to interrupt generation: [1] A counter counts starting from the value set in the PCT_COUNT register pair [2] Once the counter reaches the value set in PCT_INT_CNT, an interrupt is raised. The basic setup looks like this: [1] PCT_COUNT = 0; [2] PCT_INT_CNT = __limit_value__; [3] Enable interrupts for that counter and let it run [4] Let the counter reach its limit [5] Handle the interrupt when it happens. Note that the PCT HW block is built into the CPU core, and so its interrupt line (which is basically the OR of all counters' IRQs) is wired directly to the top-level IRQC. That means that to de-assert the PCT interrupt it is required to reset the IRQs from all counters that have reached their limit values. Acked-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Alexey Brodkin <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
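A rough C sketch of steps [1]-[5]; the pct_write()/pct_enable_irq() accessors and PCT_* names are stand-ins for illustration (the real implementation lives in the ARC perf driver):

    /* Arm one counter so it raises an interrupt after 'limit' events */
    static void pct_arm_counter(unsigned int idx, unsigned long long limit)
    {
        pct_write(idx, PCT_COUNT, 0);           /* [1] count from 0            */
        pct_write(idx, PCT_INT_CNT, limit);     /* [2] interrupt threshold     */
        pct_enable_irq(idx);                    /* [3] enable and let it run   */
        /* [4] counter runs until it hits 'limit'                              */
        /* [5] the IRQ handler must ack *every* counter that hit its limit,
         *     since the PCT line is the OR of all counter IRQs                */
    }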
2015-08-27  ARC: perf: cap the number of counters to hardware max of 32  (Vineet Gupta, 1 file, -2/+3)
The number of counters in the PCT can never be more than 32 (while countable conditions could be 100+), for both ARCompact and ARCv2. And while at it, update the copyright dates. Acked-by: Peter Zijlstra <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-20  ARC: add/fix some comments in code - no functional change  (Vineet Gupta, 2 files, -12/+12)
Signed-off-by: Vineet Gupta <[email protected]>
2015-08-20  ARC: change some branches to jumps to resolve linkage errors  (Yuriy Kolerov, 1 file, -3/+3)
When the kernel binary becomes large enough (32M and more) errors may occur during the final linkage stage. It happens because the build system uses short relocations for ARC by default. This problem can easily be resolved by passing the -mlong-calls option to GCC to use long absolute jumps (j) instead of short relative branches (b). But there are fragments of pure assembler code which use branches in inappropriate places and cause a linkage error because of relocation overflow. The first of these fragments is the .fixup insertion in futex.h and unaligned.c: it inserts code in a separate section (.fixup) with a branch instruction, which leads to a linkage error when the kernel becomes large. The second is calling the scheduler's functions (common kernel code) from ARC's entry.S: when the kernel binary becomes large this may lead to a linkage error because the scheduler may end up far enough from ARC's code in the final binary. Signed-off-by: Yuriy Kolerov <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-20  ARC: ensure futex ops are atomic in !LLSC config  (Vineet Gupta, 1 file, -0/+12)
W/o hardware-assisted atomic r-m-w, the best we can do is to disable preemption. Cc: David Hildenbrand <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Michel Lespinasse <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
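The shape of that fallback, as a hedged sketch (the helper name and the FUTEX_OP_ADD-style body are illustrative; uaccess details are elided):

    /* !LLSC config: no atomic r-m-w instruction, so keep the read-modify-write
     * of the user futex word from being preempted on this CPU */
    static int futex_add_nonatomic(u32 __user *uaddr, int oparg, int *oldval)
    {
        u32 val;
        int ret = -EFAULT;

        preempt_disable();
        if (get_user(val, uaddr))
            goto out;
        if (put_user(val + oparg, uaddr))
            goto out;
        *oldval = val;
        ret = 0;
    out:
        preempt_enable();
        return ret;
    }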
2015-08-20  ARC: make futex_atomic_cmpxchg_inatomic() return bimodal  (Vineet Gupta, 1 file, -9/+11)
Callers of cmpxchg_futex_value_locked() in the futex code expect a bimodal return value: !0 (essentially -EFAULT) on failure, 0 on success. Before this patch, the success return value was the old value of the futex, which could very well be non-zero, causing the caller to possibly take the failure path erroneously. Fix that by returning 0 for success. (This fix was done back in 2011 for all upstream arches, which ARC obviously missed.) Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Michel Lespinasse <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
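For context, a sketch of what generic futex code expects from the arch hook (the wrapper is hypothetical; the hook signature is the standard one):

    /* Callers treat the return as bimodal: 0 = success, -EFAULT = fault.
     * The old value is reported separately via *uval. */
    static int futex_try_update(u32 __user *uaddr, u32 expected, u32 newval)
    {
        u32 curval;
        int ret = futex_atomic_cmpxchg_inatomic(&curval, uaddr, expected, newval);

        if (ret)                    /* fault, not "old value" */
            return ret;
        if (curval != expected)     /* lost the race; old value is in curval */
            return -EAGAIN;
        return 0;
    }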
2015-08-20  ARC: futex cosmetics  (Vineet Gupta, 1 file, -8/+9)
Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Michel Lespinasse <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-20  ARC: add barriers to futex code  (Vineet Gupta, 1 file, -11/+10)
The atomic ops on futex need to provide the full barrier just like regular atomics in the kernel. Also remove pagefault_enable()/disable() in futex_atomic_cmpxchg_inatomic() as core code already does that. Cc: David Hildenbrand <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Michel Lespinasse <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-20  ARCv2: Support IO Coherency and permutations involving L1 and L2 caches  (Alexey Brodkin, 2 files, -0/+9)
In case of an ARCv2 CPU there could be the following configurations that affect cache handling for data exchanged with peripherals via DMA: [1] Only L1 cache exists [2] Both L1 and L2 exist, but no IO coherency unit [3] L1, L2 caches and IO coherency unit exist. The current implementation takes care of [1] and [2]; moreover support for [2] is implemented with a run-time check for SLC existence, which is not optimal. This patch introduces support for [3] and reworks DMA ops usage: instead of doing a run-time check every time a particular DMA op is executed, we'll have 3 different implementations of the DMA ops and select the appropriate one during init. As for IOC support, for it we need to: [a] Implement empty DMA ops because the IOC takes care of cache coherency with DMAed data [b] Route dma_alloc_coherent() via dma_alloc_noncoherent(). This is required to make IOC work in the first place and also serves as an optimization, as LD/ST to coherent buffers can be serviced from caches w/o going all the way to memory. Signed-off-by: Alexey Brodkin <[email protected]> [vgupta: -Added some comments about IOC gains -Marked dma ops as static, -Massaged changelog a bit] Signed-off-by: Vineet Gupta <[email protected]>
2015-08-07  ARCv2: spinlock/rwlock/atomics: reduce 1 instruction in exponential backoff  (Vineet Gupta, 2 files, -4/+2)
The increment of the delay counter was 2 instructions: Arithmetic Shift Left (ASL) + set to 1 on overflow. This can be done in 1 instruction using Rotate Left (ROL). Suggested-by: Nigel Topham <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: [email protected] Signed-off-by: Vineet Gupta <[email protected]>
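Why a single ROL works: the delay is always a lone set bit, so rotating left either doubles it or wraps it back to 1. A C sketch of the arithmetic (the actual code is ARC assembly):

    static unsigned int next_backoff(unsigned int delay)
    {
        return (delay << 1) | (delay >> 31);    /* 32-bit rotate-left by 1 */
    }
    /* 1 -> 2 -> 4 -> ... -> 0x80000000 -> 1 -> ... */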
2015-08-05  ARC: Make pt_regs regs unsigned  (Vineet Gupta, 1 file, -27/+27)
KGDB fails to build after f51e2f191112 ("ARC: make sure instruction_pointer() returns unsigned value") The hack to force one specific reg to unsigned backfired. There's no reason to keep the regs signed after all. | CC arch/arc/kernel/kgdb.o |../arch/arc/kernel/kgdb.c: In function 'kgdb_trap': | ../arch/arc/kernel/kgdb.c:180:29: error: lvalue required as left operand of assignment | instruction_pointer(regs) -= BREAK_INSTR_SIZE; Reported-by: Yuriy Kolerov <[email protected]> Fixes: f51e2f191112 ("ARC: make sure instruction_pointer() returns unsigned value") Cc: Alexey Brodkin <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-04  ARCv2: spinlock/rwlock: Reset retry delay when starting a new spin-wait cycle  (Vineet Gupta, 1 file, -3/+3)
The previous commit for delayed retry of SCOND needs some fine tuning for spin locks. The backoff from delayed retry, in conjunction with the spin looping of the lock itself, can potentially cause the delay counter to reach high values. So to provide fairness to any lock operation, after a lock "seems" available (i.e. just before the first SCOND try), reset the delay counter back to its starting value of 1. Essentially, reset the delay to 1 for a new spin-wait-loop-acquire cycle. Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-04  ARCv2: spinlock/rwlock/atomics: Delayed retry of failed SCOND with exponential backoff  (Vineet Gupta, 2 files, -4/+338)
This is to work around the llock/scond livelock. HS38x4 could get into a LLOCK/SCOND livelock in case of multiple overlapping coherency transactions in the SCU. The exclusive line state keeps rotating among the contending cores, leading to a never-ending cycle. So break the cycle by deferring the retry of a failed exclusive access (SCOND). The actual delay needed is a function of the number of contending cores as well as the unrelated coherency traffic from other cores. To keep the code simple, start off with a small delay of 1, which suffices in most cases, and in case of contention double the delay. Eventually the delay is sufficient such that the coherency pipeline is drained, and thus a subsequent exclusive access will succeed. Link: http://lkml.kernel.org/r/[email protected] Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
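A C rendition of the delayed-retry idea, using C11 atomics as a stand-in for the LLOCK/SCOND inline assembly (the busy-wait helper and names are illustrative):

    #include <stdatomic.h>

    static void atomic_add_backoff(int i, _Atomic int *v)
    {
        unsigned int delay = 1;

        for (;;) {
            int old = atomic_load_explicit(v, memory_order_relaxed);   /* ~LLOCK */
            if (atomic_compare_exchange_weak(v, &old, old + i))        /* ~SCOND */
                break;

            /* exclusive access failed: back off so the coherency pipeline drains */
            for (volatile unsigned int n = delay; n; n--)
                ;
            delay = (delay << 1) | (delay >> 31);   /* double; wrap back to 1 */
        }
    }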
2015-08-04  ARC: LLOCK/SCOND based rwlock  (Vineet Gupta, 2 files, -10/+166)
With LLOCK/SCOND, the rwlock counter can be atomically updated w/o the need for a guarding spin lock. This in turn elides the EXchange-instruction-based spinning, which forces the cacheline into exclusive state; concurrent spinning across cores would cause the line to keep bouncing around. The LLOCK/SCOND based implementation is superior as spinning on LLOCK keeps the cacheline in shared state. Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-04  ARC: LLOCK/SCOND based spin_lock  (Vineet Gupta, 1 file, -7/+69)
The current spin_lock uses the EXchange instruction to implement the atomic test-and-set of the lock location (it reads the orig value and stores 1). This however forces the cacheline into exclusive state (because of the ST), and concurrent loops in multiple cores will bounce the line around between cores. Instead, use LLOCK/SCOND to implement the atomic test-and-set, which is better as the line stays in shared state while the lock is spinning on LLOCK. The real motivation of this change however is to make way for future changes in atomics to implement delayed retry (with backoff). An initial experiment with delayed retry in atomics combined with the orig EX based spinlock was a total disaster (it broke even LMBench), as struct sock has a cache line sharing an atomic_t and a spinlock. The tight spinning on the lock caused the atomic retry to keep backing off such that it would never finish. Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-04  ARC: refactor atomic inline asm operands with symbolic names  (Vineet Gupta, 1 file, -15/+17)
This reduces the diff in forthcoming patches and also helps better understand the incremental changes to the inline asm. Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-08-04Revert "ARCv2: STAR 9000837815 workaround hardware exclusive transactions ↵Vineet Gupta1-12/+2
livelock" Extended testing of quad core configuration revealed that this fix was insufficient. Specifically LTP open posix shm_op/23-1 would cause the hardware livelock in llock/scond loop in update_cpu_load_active() So remove this and make way for a proper workaround This reverts commit a5c8b52abe677977883655166796f167ef1e0084. Signed-off-by: Vineet Gupta <[email protected]>
2015-08-03  ARCv2: Fix the peripheral address space detection  (Vineet Gupta, 1 file, -4/+3)
With HS 2.1 release, the peripheral space register no longer contains the uncached space specifics, causing the kernel to panic early on. So read the newer NON VOLATILE AUX register to get that info. Signed-off-by: Vineet Gupta <[email protected]>
2015-07-27  atomic: Collapse all atomic_{set,clear}_mask definitions  (Peter Zijlstra, 1 file, -10/+0)
Move the now generic definitions of atomic_{set,clear}_mask() into linux/atomic.h to avoid endless and pointless repetition. Also, provide an atomic_andnot() wrapper for those few archs that can implement that. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2015-07-27  atomic: Provide atomic_{or,xor,and}  (Peter Zijlstra, 1 file, -1/+0)
Implement atomic logic ops -- atomic_{or,xor,and}. These will replace the atomic_{set,clear}_mask functions that are available on some archs. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2015-07-27  arc: Provide atomic_{or,xor,and}  (Peter Zijlstra, 1 file, -2/+17)
Implement atomic logic ops -- atomic_{or,xor,and}. These will replace the atomic_{set,clear}_mask functions that are available on some archs. Acked-by: Vineet Gupta <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]>
2015-07-17  mm: clean up per architecture MM hook header files  (Laurent Dufour, 2 files, -15/+1)
Commit 2ae416b142b6 ("mm: new mm hook framework") introduced an empty header file (mm-arch-hooks.h) for every architecture, even those which don't need to define mm hooks. As suggested by Geert Uytterhoeven, this can be cleaned up through the use of a generic header file included via each per-architecture asm/include/Kbuild file. The PowerPC architecture is not impacted here since this architecture has to define the arch_remap MM hook. Signed-off-by: Laurent Dufour <[email protected]> Suggested-by: Geert Uytterhoeven <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: Vineet Gupta <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-07-13  ARC: make sure instruction_pointer() returns unsigned value  (Alexey Brodkin, 1 file, -1/+1)
Currently instruction_pointer() returns pt_regs->ret and so the return value is of type "long", which implicitly stands for "signed long". While that's perfectly fine when dealing with 32-bit values, if the return value of instruction_pointer() gets assigned to a 64-bit variable, sign extension may happen. And in at least one real use-case it happens already. In perf_prepare_sample() the return value of perf_instruction_pointer() (which is an alias of instruction_pointer() in case of ARC) is assigned to (struct perf_sample_data)->ip (whose type is "u64"). And what we see is: if the instruction pointer points to a user-space application, which in case of ARC lies below 0x8000_0000, "ip" gets set properly with 32 leading zeros. But if the instruction pointer points to kernel address space, which starts at 0x8000_0000, then "ip" is set with 32 leading "f"-s. I.e. if instruction_pointer() returns 0x8100_0000, "ip" will be assigned 0xffff_ffff__8100_0000. Which is obviously wrong. In particular, that issue broke the output of perf, because perf was unable to associate addresses like 0xffff_ffff__8100_0000 with anything from /proc/kallsyms. That's what we used to see: ----------->8---------- 6.27% ls [unknown] [k] 0xffffffff8046c5cc 2.96% ls libuClibc-0.9.34-git.so [.] memcpy 2.25% ls libuClibc-0.9.34-git.so [.] memset 1.66% ls [unknown] [k] 0xffffffff80666536 1.54% ls libuClibc-0.9.34-git.so [.] 0x000224d6 1.18% ls libuClibc-0.9.34-git.so [.] 0x00022472 ----------->8---------- With that change perf output looks much better now: ----------->8---------- 8.21% ls [kernel.kallsyms] [k] memset 3.52% ls libuClibc-0.9.34-git.so [.] memcpy 2.11% ls libuClibc-0.9.34-git.so [.] malloc 1.88% ls libuClibc-0.9.34-git.so [.] memset 1.64% ls [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore 1.41% ls [kernel.kallsyms] [k] __d_lookup_rcu ----------->8---------- Signed-off-by: Alexey Brodkin <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Vineet Gupta <[email protected]>
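A standalone illustration of the sign extension (int32_t/uint32_t stand in for ARC's 32-bit long/unsigned long; the address is the example from above):

    #include <stdio.h>
    #include <stdint.h>
    #include <inttypes.h>

    int main(void)
    {
        int32_t ret = (int32_t)0x81000000;              /* kernel address, negative as signed   */
        uint64_t ip_signed   = (uint64_t)ret;           /* 0xffffffff81000000 (sign-extended)   */
        uint64_t ip_unsigned = (uint64_t)(uint32_t)ret; /* 0x0000000081000000 (zero-extended)   */

        printf("%" PRIx64 " vs %" PRIx64 "\n", ip_signed, ip_unsigned);
        return 0;
    }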
2015-07-09  ARC: Add llock/scond to futex backend  (Vineet Gupta, 1 file, -6/+42)
Signed-off-by: Vineet Gupta <[email protected]>
2015-07-09  ARC: Make ARC bitops "safer" (add anti-optimization)  (Vineet Gupta, 1 file, -26/+9)
The ARCompact/ARCv2 ISAs provide that any instruction which deals with a bitpos/count operand - ASL, LSL, BSET, BCLR, BMSK ... - will only consider the lower 5 bits, i.e. auto-clamp the pos to 0-31. ARC Linux bitops exploited this fact by NOT explicitly masking out the upper bits of the @nr operand in general, saving a bunch of AND/BMSK instructions in the code generated around bitops. While this micro-optimization has worked well over the years it is NOT safe, as shifting a number by a value greater than the native size is "undefined" per the "C" spec. So, as it turns out, EZChip ran into this eventually, in their massive multi-core SMP build with 64 cpus. There was a test_bit() inside a loop from 63 to 0 and gcc was weirdly optimizing away the first iterations (so it was really adhering to the standard by exploiting the undefined behaviour) rather than removing only the iterations which were phony, i.e. (1 << [63..32]): | for i = 63 to 0 | X = ( 1 << i ) | if X == 0 | continue So fix the code to do the explicit masking, at the expense of generating additional instructions. Fortunately, this can be mitigated to a large extent as gcc has SHIFT_COUNT_TRUNCATED which allows the combiner to fold the masking into the shift operation itself. It is currently not enabled in the ARC gcc backend, but could be done after a bit of testing. Fixes STAR 9000866918 ("unsafe "undefined behavior" code in kernel") Reported-by: Noam Camus <[email protected]> Cc: <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
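In C terms the fix is simply to mask the bit position before shifting; a minimal sketch mirroring the generic test_bit() idiom (helper names are illustrative):

    #include <stdbool.h>

    #define BIT_WORD(nr)        ((nr) / 32)
    #define BIT_POS(nr)         ((nr) & 0x1f)   /* explicit clamp: shift count stays 0..31 */

    static inline bool test_bit_safe(unsigned int nr, const unsigned long *addr)
    {
        return (addr[BIT_WORD(nr)] >> BIT_POS(nr)) & 1UL;
    }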
2015-07-08  Make asm/word-at-a-time.h available on all architectures  (Chris Metcalf, 1 file, -0/+1)
Added the x86 implementation of word-at-a-time to the generic version, which previously only supported big-endian. Omitted the x86-specific load_unaligned_zeropad(), which in any case is also not present for the existing BE-only implementation of a word-at-a-time, and is only used under CONFIG_DCACHE_WORD_ACCESS. Added as a "generic-y" to the Kbuilds of all architectures that didn't previously have it. Signed-off-by: Chris Metcalf <[email protected]>
2015-07-01  Merge branch 'akpm' (patches from Andrew)  (Linus Torvalds, 1 file, -5/+7)
Merge third patchbomb from Andrew Morton: - the rest of MM - scripts/gdb updates - ipc/ updates - lib/ updates - MAINTAINERS updates - various other misc things * emailed patches from Andrew Morton <[email protected]>: (67 commits) genalloc: rename of_get_named_gen_pool() to of_gen_pool_get() genalloc: rename dev_get_gen_pool() to gen_pool_get() x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit MAINTAINERS: add zpool MAINTAINERS: BCACHE: Kent Overstreet has changed email address MAINTAINERS: move Jens Osterkamp to CREDITS MAINTAINERS: remove unused nbd.h pattern MAINTAINERS: update brcm gpio filename pattern MAINTAINERS: update brcm dts pattern MAINTAINERS: update sound soc intel patterns MAINTAINERS: remove website for paride MAINTAINERS: update Emulex ocrdma email addresses bcache: use kvfree() in various places libcxgbi: use kvfree() in cxgbi_free_big_mem() target: use kvfree() in session alloc and free IB/ehca: use kvfree() in ipz_queue_{cd}tor() drm/nouveau/gem: use kvfree() in u_free() drm: use kvfree() in drm_free_large() cxgb4: use kvfree() in t4_free_mem() cxgb3: use kvfree() in cxgb_free_mem() ...
2015-07-01  Merge tag 'arc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc  (Linus Torvalds, 27 files, -927/+1501)
Pull ARC architecture updates from Vineet Gupta: - support for HS38 cores based on ARCv2 ISA. ARCv2 is the next generation ISA from Synopsys and the basis for the HS3{4,6,8} families of processors which retain the traditional ARC mantra of low power and configurability and are now more performant and feature rich. HS38x is a 10 stage pipeline core which supports MMU (with huge pages) and SMP (up to 4 cores) among other features. + www.synopsys.com/dw/ipdir.php?ds=arc-hs38-processor + http://news.synopsys.com/2014-10-14-New-DesignWare-ARC-HS38-Processor-Doubles-Performance-for-Embedded-Linux-Applications + http://www.embedded.com/electronics-news/4435975/Synopsys-ARC-HS38-core-gives-2X-boost-to-Linux-based-apps - support for ARC SDP (Software Development Platform): Main Board + CPU Cards = AXS101: CPU Card with ARC700 in silicon @ 700 MHz = AXS103: CPU Card with HS38x in FPGA - refactoring of the ARCompact port to accommodate the new ARCv2 ISA - misc updates/cleanups * tag 'arc-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: (72 commits) ARC: Fix build failures for ARCompact in linux-next after ARCv2 support ARCv2: Allow older gcc to cope with new regime of ARCv2/ARCompact support ARCv2: [vdk] dts files and defconfig for HS38 VDK ARCv2: [axs103] Support ARC SDP FPGA platform for HS38x cores ARC: [axs101] Prepare for AXS103 ARCv2: [nsim*hs*] Support simulation platforms for HS38x cores ARCv2: All bits in place, allow ARCv2 builds ARCv2: SLC: Handle explcit flush for DMA ops (w/o IO-coherency) ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock ARC: Reduce bitops lines of code using macros ARCv2: barriers arch: conditionally define smp_{mb,rmb,wmb} ARC: add smp barriers around atomics per Documentation/atomic_ops.txt ARC: add compiler barrier to LLSC based cmpxchg ARCv2: SMP: intc: IDU 2nd level intc for dynamic IRQ distribution ARCv2: SMP: clocksource: Enable Global Real Time counter ARCv2: SMP: ARConnect debug/robustness ARCv2: SMP: Support ARConnect (MCIP) for Inter-Core-Interrupts et al ARC: make plat_smp_ops weak to allow over-rides ARCv2: clocksource: Introduce 64bit local RTC counter ...
2015-06-30  arc: use for_each_sg()  (Akinobu Mita, 1 file, -5/+7)
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since arc doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita <[email protected]> Acked-by: Vineet Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-25  Merge branch 'for-4.2/sg' of git://git.kernel.dk/linux-block  (Linus Torvalds, 1 file, -1/+0)
Pull asm/scatterlist.h removal from Jens Axboe: "We don't have any specific arch scatterlist anymore, since parisc finally switched over. Kill the include" * 'for-4.2/sg' of git://git.kernel.dk/linux-block: remove scatterlist.h generation from arch Kbuild files remove <asm/scatterlist.h>
2015-06-24  mm: new mm hook framework  (Laurent Dufour, 1 file, -0/+15)
CRIU recreates the process memory layout by remapping the checkpointee memory areas on top of the current process (criu). This includes remapping the vDSO to the place it had at checkpoint time. However some architectures, like powerpc, keep a reference to the vDSO base address to build the signal return stack frame by calling the vDSO sigreturn service. So once the vDSO has been moved, this reference is no longer valid and the signal frames built later are not usable. This patch series introduces a new mm hook framework, and a new arch_remap hook which is called when mremap is done and the mm lock is still held. The next patch adds the vDSO remap and unmap tracking to the powerpc architecture. This patch (of 3): This patch introduces a new set of header files to manage mm hooks: - a per-architecture empty header file (arch/x/include/asm/mm-arch-hooks.h) - a generic header (include/linux/mm-arch-hooks.h). An architecture which needs to overwrite a hook has to redefine it in its header file, while architectures which don't need to have nothing to do. The default hooks are defined in the generic header and are used in case the architecture does not define them. In a next step, the mm hooks defined in include/asm-generic/mm_hooks.h should be moved here. Signed-off-by: Laurent Dufour <[email protected]> Suggested-by: Andrew Morton <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
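The pattern the generic header relies on looks roughly like this (a minimal sketch of the default-hook idiom, not a quote of include/linux/mm-arch-hooks.h):

    #ifndef arch_remap
    static inline void arch_remap(struct mm_struct *mm,
                                  unsigned long old_start, unsigned long old_end,
                                  unsigned long new_start, unsigned long new_end)
    {
        /* default: no-op; an arch overrides this in its asm/mm-arch-hooks.h */
    }
    #define arch_remap arch_remap
    #endif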
2015-06-25  ARCv2: SLC: Handle explicit flush for DMA ops (w/o IO-coherency)  (Vineet Gupta, 1 file, -0/+11)
The L2 cache on ARC HS processors is called SLC (System Level Cache). For working DMA (in the absence of hardware-assisted IO Coherency) we need to manage the SLC explicitly when buffers transition between the cpu and controllers. Signed-off-by: Vineet Gupta <[email protected]>
2015-06-25  ARCv2: STAR 9000837815 workaround hardware exclusive transactions livelock  (Vineet Gupta, 1 file, -2/+12)
A quad core SMP build could get into a hardware livelock with concurrent LLOCK/SCOND. Work around that by adding a PREFETCHW which is serialized by the SCU (System Coherency Unit). It brings the cache line into Exclusive state and makes others invalidate their lines. This gives enough time for the winner to complete the LLOCK/SCOND before others can get the line back. The prefetchw in the ll/sc loop is not nice, but this is the only software workaround for the current version of the RTL. Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-06-25  ARC: Reduce bitops lines of code using macros  (Vineet Gupta, 1 file, -333/+144)
No semantic changes! Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-06-25  ARCv2: barriers  (Vineet Gupta, 3 files, -4/+87)
ARCv2 based HS38 cores are weakly ordered and thus need explicit barriers in the kernel proper. The SMP barrier is provided by the DMB instruction, which also guarantees a local barrier, hence it is used as the backend of the smp_*mb() as well as the *mb() APIs. Also hook barriers into the MMIO accessors to avoid ordering issues in IO. Cc: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Will Deacon <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
2015-06-25  ARC: add smp barriers around atomics per Documentation/atomic_ops.txt  (Vineet Gupta, 4 files, -0/+89)
- arch_spin_lock/unlock were lacking the ACQUIRE/RELEASE barriers. Since ARCv2 only provides load/load, store/store and all/all, we need the full barrier. - LLOCK/SCOND based atomics, bitops and cmpxchg, which return modified values, were lacking the explicit smp barriers. - Non LLOCK/SCOND variants don't need the explicit barriers since that is implicitly provided by the spin locks used to implement the critical section (the spin lock barriers in turn are also fixed in this commit as explained above). Cc: Paul E. McKenney <[email protected]> Cc: [email protected] Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>