aboutsummaryrefslogtreecommitdiff
path: root/arch/alpha/include
AgeCommit message (Collapse)AuthorFilesLines
2020-12-01net: Introduce preferred busy-pollingBjörn Töpel1-0/+2
The existing busy-polling mode, enabled by the SO_BUSY_POLL socket option or system-wide using the /proc/sys/net/core/busy_read knob, is an opportunistic. That means that if the NAPI context is not scheduled, it will poll it. If, after busy-polling, the budget is exceeded the busy-polling logic will schedule the NAPI onto the regular softirq handling. One implication of the behavior above is that a busy/heavy loaded NAPI context will never enter/allow for busy-polling. Some applications prefer that most NAPI processing would be done by busy-polling. This series adds a new socket option, SO_PREFER_BUSY_POLL, that works in concert with the napi_defer_hard_irqs and gro_flush_timeout knobs. The napi_defer_hard_irqs and gro_flush_timeout knobs were introduced in commit 6f8b12d661d0 ("net: napi: add hard irqs deferral feature"), and allows for a user to defer interrupts to be enabled and instead schedule the NAPI context from a watchdog timer. When a user enables the SO_PREFER_BUSY_POLL, again with the other knobs enabled, and the NAPI context is being processed by a softirq, the softirq NAPI processing will exit early to allow the busy-polling to be performed. If the application stops performing busy-polling via a system call, the watchdog timer defined by gro_flush_timeout will timeout, and regular softirq handling will resume. In summary; Heavy traffic applications that prefer busy-polling over softirq processing should use this option. Example usage: $ echo 2 | sudo tee /sys/class/net/ens785f1/napi_defer_hard_irqs $ echo 200000 | sudo tee /sys/class/net/ens785f1/gro_flush_timeout Note that the timeout should be larger than the userspace processing window, otherwise the watchdog will timeout and fall back to regular softirq processing. Enable the SO_BUSY_POLL/SO_PREFER_BUSY_POLL options on your socket. Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-11-23arch: move SA_* definitions to generic headersPeter Collingbourne1-14/+0
Most architectures with the exception of alpha, mips, parisc and sparc use the same values for these flags. Move their definitions into asm-generic/signal-defs.h and allow the architectures with non-standard values to override them. Also, document the non-standard flag values in order to make it easier to add new generic flags in the future. A consequence of this change is that on powerpc and x86, the constants' values aside from SA_RESETHAND change signedness from unsigned to signed. This is not expected to impact realistic use of these constants. In particular the typical use of the constants where they are or'ed together and assigned to sa_flags (or another int variable) would not be affected. Signed-off-by: Peter Collingbourne <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Reviewed-by: Dave Martin <[email protected]> Link: https://linux-review.googlesource.com/id/Ia3849f18b8009bf41faca374e701cdca36974528 Link: https://lkml.kernel.org/r/b6d0d1ec34f9ee93e1105f14f288fba5f89d1f24.1605235762.git.pcc@google.com Signed-off-by: Eric W. Biederman <[email protected]>
2020-11-06highmem: Get rid of kmap_types.hThomas Gleixner1-15/+0
The header is not longer used and on alpha, ia64, openrisc, parisc and um it was completely unused anyway as these architectures have no highmem support. Signed-off-by: Thomas Gleixner <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-10-26alpha: use asm-generic/mmu_context.h for no-op implementationsNicholas Piggin1-8/+4
Signed-off-by: Nicholas Piggin <[email protected]> Acked-by: Matt Turner <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: [email protected] Signed-off-by: Arnd Bergmann <[email protected]>
2020-08-20saner calling conventions for csum_and_copy_..._user()Al Viro1-1/+1
All callers of these primitives will * discard anything we might've copied in case of error * ignore the csum value in case of error * always pass 0xffffffff as the initial sum, so the resulting csum value (in case of success, that is) will never be 0. That suggest the following calling conventions: * don't pass err_ptr - just return 0 on error. * don't bother with zeroing destination, etc. in case of error * don't pass the initial sum - just use 0xffffffff. This commit does the minimal conversion in the instances of csum_and_copy_...(); the changes of actual asm code behind them are done later in the series. Note that this asm code is often shared with csum_partial_copy_nocheck(); the difference is that csum_partial_copy_nocheck() passes 0 for initial sum while csum_and_copy_..._user() pass 0xffffffff. Fortunately, we are free to pass 0xffffffff in all cases and subsequent patches will use that freedom without any special comments. A part that could be split off: parisc and uml/i386 claimed to have csum_and_copy_to_user() instances of their own, but those were identical to the generic one, so we simply drop them. Not sure if it's worth a separate commit... Signed-off-by: Al Viro <[email protected]>
2020-08-20csum_partial_copy_nocheck(): drop the last argumentAl Viro1-1/+1
It's always 0. Note that we theoretically could use ~0U as well - result will be the same modulo 0xffff, _if_ the damn thing did the right thing for any value of initial sum; later we'll make use of that when convenient. However, unlike csum_and_copy_..._user(), there are instances that did not work for arbitrary initial sums; c6x is one such. Signed-off-by: Al Viro <[email protected]>
2020-08-20unify generic instances of csum_partial_copy_nocheck()Al Viro1-0/+1
quite a few architectures have the same csum_partial_copy_nocheck() - simply memcpy() the data and then return the csum of the copy. hexagon, parisc, ia64, s390, um: explicitly spelled out that way. arc, arm64, csky, h8300, m68k/nommu, microblaze, mips/GENERIC_CSUM, nds32, nios2, openrisc, riscv, unicore32: end up picking the same thing spelled out in lib/checksum.h (with varying amounts of perversions along the way). everybody else (alpha, arm, c6x, m68k/mmu, mips/!GENERIC_CSUM, powerpc, sh, sparc, x86, xtensa) have non-generic variants. For all except c6x the declaration is in their asm/checksum.h. c6x uses the wrapper from asm-generic/checksum.h that would normally lead to the lib/checksum.h instance, but in case of c6x we end up using an asm function from arch/c6x instead. Screw that mess - have architectures with private instances define _HAVE_ARCH_CSUM_AND_COPY in their asm/checksum.h and have the default one right in net/checksum.h conditional on _HAVE_ARCH_CSUM_AND_COPY *not* defined. Signed-off-by: Al Viro <[email protected]>
2020-08-14iomap: constify ioreadX() iomem argument (as in generic implementation)Krzysztof Kozlowski10-33/+33
Patch series "iomap: Constify ioreadX() iomem argument", v3. The ioread8/16/32() and others have inconsistent interface among the architectures: some taking address as const, some not. It seems there is nothing really stopping all of them to take pointer to const. This patch (of 4): The ioreadX() and ioreadX_rep() helpers have inconsistent interface. On some architectures void *__iomem address argument is a pointer to const, on some not. Implementations of ioreadX() do not modify the memory under the address so they can be converted to a "const" version for const-safety and consistency among architectures. [[email protected]: sh: clk: fix assignment from incompatible pointer type for ioreadX()] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix drivers/mailbox/bcm-pdc-mailbox.c] Link: http://lkml.kernel.org/r/202007132209.Rxmv4QyS%[email protected] Suggested-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Arnd Bergmann <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Helge Deller <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: Kalle Valo <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Dave Jiang <[email protected]> Cc: Jon Mason <[email protected]> Cc: Allen Hubbe <[email protected]> Cc: "Michael S. Tsirkin" <[email protected]> Cc: Jason Wang <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12alpha: fix annotation of io{read,write}{16,32}be()Luc Van Oostenryck1-4/+4
These accessors must be used to read/write a big-endian bus. The value returned or written is native-endian. However, these accessors are defined using be{16,32}_to_cpu() or cpu_to_be{16,32}() to make the endian conversion but these expect a __be{16,32} when none is present. Keeping them would need a force cast that would solve nothing at all. So, do the conversion using swab{16,32}, like done in asm-generic for similar situations. Reported-by: kernel test robot <[email protected]> Signed-off-by: Luc Van Oostenryck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: Arnd Bergmann <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-12uaccess: remove segment_eqChristoph Hellwig1-1/+1
segment_eq is only used to implement uaccess_kernel. Just open code uaccess_kernel in the arch uaccess headers and remove one layer of indirection. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Linus Torvalds <[email protected]> Acked-by: Greentime Hu <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Cc: Nick Hu <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07asm-generic: pgalloc: provide generic pgd_free()Mike Rapoport1-6/+0
Most architectures define pgd_free() as a wrapper for free_page(). Provide a generic version in asm-generic/pgalloc.h and enable its use for most architectures. Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> [m68k] Cc: Abdul Haleem <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Max Filippov <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Satheesh Rajendran <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Matthew Wilcox <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07asm-generic: pgalloc: provide generic pmd_alloc_one() and pmd_free_one()Mike Rapoport1-14/+1
For most architectures that support >2 levels of page tables, pmd_alloc_one() is a wrapper for __get_free_pages(), sometimes with __GFP_ZERO and sometimes followed by memset(0) instead. More elaborate versions on arm64 and x86 account memory for the user page tables and call to pgtable_pmd_page_ctor() as the part of PMD page initialization. Move the arm64 version to include/asm-generic/pgalloc.h and use the generic version on several architectures. The pgtable_pmd_page_ctor() is a NOP when ARCH_ENABLE_SPLIT_PMD_PTLOCK is not enabled, so there is no functional change for most architectures except of the addition of __GFP_ACCOUNT for allocation of user page tables. The pmd_free() is a wrapper for free_page() in all the cases, so no functional change here. Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Abdul Haleem <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Max Filippov <[email protected]> Cc: Peter Zijlstra (Intel) <[email protected]> Cc: Satheesh Rajendran <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-07mm: remove unneeded includes of <asm/pgalloc.h>Mike Rapoport1-1/+0
Patch series "mm: cleanup usage of <asm/pgalloc.h>" Most architectures have very similar versions of pXd_alloc_one() and pXd_free_one() for intermediate levels of page table. These patches add generic versions of these functions in <asm-generic/pgalloc.h> and enable use of the generic functions where appropriate. In addition, functions declared and defined in <asm/pgalloc.h> headers are used mostly by core mm and early mm initialization in arch and there is no actual reason to have the <asm/pgalloc.h> included all over the place. The first patch in this series removes unneeded includes of <asm/pgalloc.h> In the end it didn't work out as neatly as I hoped and moving pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require unnecessary changes to arches that have custom page table allocations, so I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local to mm/. This patch (of 8): In most cases <asm/pgalloc.h> header is required only for allocations of page table memory. Most of the .c files that include that header do not use symbols declared in <asm/pgalloc.h> and do not require that header. As for the other header files that used to include <asm/pgalloc.h>, it is possible to move that include into the .c file that actually uses symbols from <asm/pgalloc.h> and drop the include from the header file. The process was somewhat automated using sed -i -E '/[<"]asm\/pgalloc\.h/d' \ $(grep -L -w -f /tmp/xx \ $(git grep -E -l '[<"]asm/pgalloc\.h')) where /tmp/xx contains all the symbols defined in arch/*/include/asm/pgalloc.h. [[email protected]: fix powerpc warning] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Pekka Enberg <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> [m68k] Cc: Abdul Haleem <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Max Filippov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Satheesh Rajendran <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Matthew Wilcox <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-08-03Merge tag 'locking-core-2020-08-03' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: - LKMM updates: mostly documentation changes, but also some new litmus tests for atomic ops. - KCSAN updates: the most important change is that GCC 11 now has all fixes in place to support KCSAN, so GCC support can be enabled again. Also more annotations. - futex updates: minor cleanups and simplifications - seqlock updates: merge preparatory changes/cleanups for the 'associated locks' facilities. - lockdep updates: - simplify IRQ trace event handling - add various new debug checks - simplify header dependencies, split out <linux/lockdep_types.h>, decouple lockdep from other low level headers some more - fix NMI handling - misc cleanups and smaller fixes * tag 'locking-core-2020-08-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits) kcsan: Improve IRQ state trace reporting lockdep: Refactor IRQ trace events fields into struct seqlock: lockdep assert non-preemptibility on seqcount_t write lockdep: Add preemption enabled/disabled assertion APIs seqlock: Implement raw_seqcount_begin() in terms of raw_read_seqcount() seqlock: Add kernel-doc for seqcount_t and seqlock_t APIs seqlock: Reorder seqcount_t and seqlock_t API definitions seqlock: seqcount_t latch: End read sections with read_seqcount_retry() seqlock: Properly format kernel-doc code samples Documentation: locking: Describe seqlock design and usage locking/qspinlock: Do not include atomic.h from qspinlock_types.h locking/atomic: Move ATOMIC_INIT into linux/types.h lockdep: Move list.h inclusion into lockdep.h locking/lockdep: Fix TRACE_IRQFLAGS vs. NMIs futex: Remove unused or redundant includes futex: Consistently use fshared as boolean futex: Remove needless goto's futex: Remove put_futex_key() rwsem: fix commas in initialisation docs: locking: Replace HTTP links with HTTPS ones ...
2020-07-29locking/atomic: Move ATOMIC_INIT into linux/types.hHerbert Xu1-1/+0
This patch moves ATOMIC_INIT from asm/atomic.h into linux/types.h. This allows users of atomic_t to use ATOMIC_INIT without having to include atomic.h as that way may lead to header loops. Signed-off-by: Herbert Xu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Waiman Long <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-07-21alpha: Replace smp_read_barrier_depends() usage with smp_[r]mb()Will Deacon2-13/+13
In preparation for removing smp_read_barrier_depends() altogether, move the Alpha code over to using smp_rmb() and smp_mb() directly. Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2020-07-21alpha: Override READ_ONCE() with barriered implementationWill Deacon2-54/+40
Rather then relying on the core code to use smp_read_barrier_depends() as part of the READ_ONCE() definition, instead override __READ_ONCE() in the Alpha code so that it generates the required mb() and then implement smp_load_acquire() using the new macro to avoid redundant back-to-back barriers from the generic implementation. Acked-by: Peter Zijlstra (Intel) <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2020-06-12alpha: fix memory barriers so that they conform to the specificationMikulas Patocka1-15/+59
The commits cd0e00c10672 and 92d7223a7423 broke boot on the Alpha Avanti platform. The patches move memory barriers after a write before the write. The result is that if there's iowrite followed by ioread, there is no barrier between them. The Alpha architecture allows reordering of the accesses to the I/O space, and the missing barrier between write and read causes hang with serial port and real time clock. This patch makes barriers confiorm to the specification. 1. We add mb() before readX_relaxed and writeX_relaxed - memory-barriers.txt claims that these functions must be ordered w.r.t. each other. Alpha doesn't order them, so we need an explicit barrier. 2. We add mb() before reads from the I/O space - so that if there's a write followed by a read, there should be a barrier between them. Signed-off-by: Mikulas Patocka <[email protected]> Fixes: cd0e00c10672 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering") Fixes: 92d7223a7423 ("alpha: io: reorder barriers to guarantee writeX() and iowriteX() ordering #2") Cc: [email protected] # v4.17+ Acked-by: Ivan Kokshaysky <[email protected]> Reviewed-by: Maciej W. Rozycki <[email protected]> Signed-off-by: Matt Turner <[email protected]>
2020-06-09mm: consolidate pte_index() and pte_offset_*() definitionsMike Rapoport1-12/+2
All architectures define pte_index() as (address >> PAGE_SHIFT) & (PTRS_PER_PTE - 1) and all architectures define pte_offset_kernel() as an entry in the array of PTEs indexed by the pte_index(). For the most architectures the pte_offset_kernel() implementation relies on the availability of pmd_page_vaddr() that converts a PMD entry value to the virtual address of the page containing PTEs array. Let's move x86 definitions of the PTE accessors to the generic place in <linux/pgtable.h> and then simply drop the respective definitions from the other architectures. The architectures that didn't provide pmd_page_vaddr() are updated to have that defined. The generic implementation of pte_offset_kernel() can be overridden by an architecture and alpha makes use of this because it has special ordering requirements for its version of pte_offset_kernel(). [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: update] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: update] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix x86 warning] [[email protected]: fix powerpc build] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nick Hu <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-09mm: introduce include/linux/pgtable.hMike Rapoport1-2/+0
The include/linux/pgtable.h is going to be the home of generic page table manipulation functions. Start with moving asm-generic/pgtable.h to include/linux/pgtable.h and make the latter include asm/pgtable.h. Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Nick Hu <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-09mm: don't include asm/pgtable.h if linux/mm.h is already includedMike Rapoport1-1/+0
Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Chris Zankel <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Max Filippov <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Simek <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Nick Hu <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-08mm: rename flush_icache_user_range to flush_icache_user_pageChristoph Hellwig1-5/+5
The function currently known as flush_icache_user_range only operates on a single page. Rename it to flush_icache_user_page as we'll need the name flush_icache_user_range for something else soon. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Albert Ou <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-08alpha: use asm-generic/cacheflush.hChristoph Hellwig1-22/+6
Alpha needs almost no cache flushing routines of its own. Rely on asm-generic/cacheflush.h for the defaults. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-06-02Merge tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-blockLinus Torvalds1-2/+2
Pull block driver updates from Jens Axboe: "On top of the core changes, here are the block driver changes for this merge window: - NVMe changes: - NVMe over Fibre Channel protocol updates, which also reach over to drivers/scsi/lpfc (James Smart) - namespace revalidation support on the target (Anthony Iliopoulos) - gcc zero length array fix (Arnd Bergmann) - nvmet cleanups (Chaitanya Kulkarni) - misc cleanups and fixes (me, Keith Busch, Sagi Grimberg) - use a SRQ per completion vector (Max Gurtovoy) - fix handling of runtime changes to the queue count (Weiping Zhang) - t10 protection information support for nvme-rdma and nvmet-rdma (Israel Rukshin and Max Gurtovoy) - target side AEN improvements (Chaitanya Kulkarni) - various fixes and minor improvements all over, icluding the nvme part of the lpfc driver" - Floppy code cleanup series (Willy, Denis) - Floppy contention fix (Jiri) - Loop CONFIGURE support (Martijn) - bcache fixes/improvements (Coly, Joe, Colin) - q->queuedata cleanups (Christoph) - Get rid of ioctl_by_bdev (Christoph, Stefan) - md/raid5 allocation fixes (Coly) - zero length array fixes (Gustavo) - swim3 task state fix (Xu)" * tag 'for-5.8/drivers-2020-06-01' of git://git.kernel.dk/linux-block: (166 commits) bcache: configure the asynchronous registertion to be experimental bcache: asynchronous devices registration bcache: fix refcount underflow in bcache_device_free() bcache: Convert pr_<level> uses to a more typical style bcache: remove redundant variables i and n lpfc: Fix return value in __lpfc_nvme_ls_abort lpfc: fix axchg pointer reference after free and double frees lpfc: Fix pointer checks and comments in LS receive refactoring nvme: set dma alignment to qword nvmet: cleanups the loop in nvmet_async_events_process nvmet: fix memory leak when removing namespaces and controllers concurrently nvmet-rdma: add metadata/T10-PI support nvmet: add metadata support for block devices nvmet: add metadata/T10-PI support nvme: add Metadata Capabilities enumerations nvmet: rename nvmet_check_data_len to nvmet_check_transfer_len nvmet: rename nvmet_rw_len to nvmet_rw_data_len nvmet: add metadata characteristics for a namespace nvme-rdma: add metadata/T10-PI support nvme-rdma: introduce nvme_rdma_sgl structure ...
2020-05-29alpha: turn csum_partial_copy_from_user() into csum_and_copy_from_user()Al Viro1-1/+2
It's already doing the right thing - it does access_ok() and the wrapper in net/checksum.h is pointless here. Just rename it and be done with that... Signed-off-by: Al Viro <[email protected]>
2020-05-12floppy: split the base port from the register in I/O accessesWilly Tarreau1-2/+2
Currently we have architecture-specific fd_inb() and fd_outb() functions or macros, taking just a port which is in fact made of a base address and a register. The base address is FDC-specific and derived from the local or global "fdc" variable through the FD_IOPORT macro used in the base address calculation. This change splits this by explicitly passing the FDC's base address and the register separately to fd_outb() and fd_inb(). It affects the following archs: - x86, alpha, mips, powerpc, parisc, arm, m68k: simple remap of port -> base+reg - sparc32: use of reg only, since the base address was already masked out and the FDC controller is known from a static struct. - sparc64: like x86 for PCI, like sparc32 for 82077 Some archs use inline functions and others macros. This was not unified in order to minimize the number of changes to review. For the same reason checkpatch still spews a few warnings about things that were already there before. The parisc still uses hard-coded register values and could be cleaned up by taking the register definitions. The sparc per-controller inb/outb functions could further be refined to explicitly take an FDC register instead of a port in argument but it was not needed yet and may be cleaned later. Link: https://lore.kernel.org/r/[email protected] Cc: Ivan Kokshaysky <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Matt Turner <[email protected]> Cc: Ian Molton <[email protected]> Cc: Russell King <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Helge Deller <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: [email protected] Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Denis Efremov <[email protected]>
2020-04-10mm/special: create generic fallbacks for pte_special() and pte_mkspecial()Anshuman Khandual1-2/+0
Currently there are many platforms that dont enable ARCH_HAS_PTE_SPECIAL but required to define quite similar fallback stubs for special page table entry helpers such as pte_special() and pte_mkspecial(), as they get build in generic MM without a config check. This creates two generic fallback stub definitions for these helpers, eliminating much code duplication. mips platform has a special case where pte_special() and pte_mkspecial() visibility is wider than what ARCH_HAS_PTE_SPECIAL enablement requires. This restricts those symbol visibility in order to avoid redefinitions which is now exposed through this new generic stubs and subsequent build failure. arm platform set_pte_at() definition needs to be moved into a C file just to prevent a build failure. [[email protected]: use defined(CONFIG_ARCH_HAS_PTE_SPECIAL) in mips per Thomas] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Guo Ren <[email protected]> [csky] Acked-by: Geert Uytterhoeven <[email protected]> [m68k] Acked-by: Stafford Horne <[email protected]> [openrisc] Acked-by: Helge Deller <[email protected]> [parisc] Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Russell King <[email protected]> Cc: Brian Cain <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Michal Simek <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paul Burton <[email protected]> Cc: Nick Hu <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Max Filippov <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/vma: define a default value for VM_DATA_DEFAULT_FLAGSAnshuman Khandual1-3/+0
There are many platforms with exact same value for VM_DATA_DEFAULT_FLAGS This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the existing VM_STACK_DEFAULT_FLAGS. While here, also define some more macros with standard VMA access flag combinations that are used frequently across many platforms. Apart from simplification, this reduces code duplication as well. Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Russell King <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Mark Salter <[email protected]> Cc: Guo Ren <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Brian Cain <[email protected]> Cc: Tony Luck <[email protected]> Cc: Michal Simek <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paul Burton <[email protected]> Cc: Nick Hu <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Rich Felker <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Chris Zankel <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-07mm: remove dummy struct bootmem_data/bootmem_data_tWaiman Long1-2/+0
Both bootmem_data and bootmem_data_t structures are no longer defined. Remove the dummy forward declarations. Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Baoquan He <[email protected]> Acked-by: Mike Rapoport <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-02asm-generic: make more kernel-space headers mandatoryMasahiro Yamada1-11/+0
Change a header to mandatory-y if both of the following are met: [1] At least one architecture (except um) specifies it as generic-y in arch/*/include/asm/Kbuild [2] Every architecture (except um) either has its own implementation (arch/*/include/asm/*.h) or specifies it as generic-y in arch/*/include/asm/Kbuild This commit was generated by the following shell script. ----------------------------------->8----------------------------------- arches=$(cd arch; ls -1 | sed -e '/Kconfig/d' -e '/um/d') tmpfile=$(mktemp) grep "^mandatory-y +=" include/asm-generic/Kbuild > $tmpfile find arch -path 'arch/*/include/asm/Kbuild' | xargs sed -n 's/^generic-y += \(.*\)/\1/p' | sort -u | while read header do mandatory=yes for arch in $arches do if ! grep -q "generic-y += $header" arch/$arch/include/asm/Kbuild && ! [ -f arch/$arch/include/asm/$header ]; then mandatory=no break fi done if [ "$mandatory" = yes ]; then echo "mandatory-y += $header" >> $tmpfile for arch in $arches do sed -i "/generic-y += $header/d" arch/$arch/include/asm/Kbuild done fi done sed -i '/^mandatory-y +=/d' include/asm-generic/Kbuild LANG=C sort $tmpfile >> include/asm-generic/Kbuild ----------------------------------->8----------------------------------- One obvious benefit is the diff stat: 25 files changed, 52 insertions(+), 557 deletions(-) It is tedious to list generic-y for each arch that needs it. So, mandatory-y works like a fallback default (by just wrapping asm-generic one) when arch does not have a specific header implementation. See the following commits: def3f7cefe4e81c296090e1722a76551142c227c a1b39bae16a62ce4aae02d958224f19316d98b24 It is tedious to convert headers one by one, so I processed by a shell script. Signed-off-by: Masahiro Yamada <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Michal Simek <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Arnd Bergmann <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-03-27futex: arch_futex_atomic_op_inuser() calling conventions changeAl Viro1-3/+2
Move access_ok() in and pagefault_enable()/pagefault_disable() out. Mechanical conversion only - some instances don't really need a separate access_ok() at all (e.g. the ones only using get_user()/put_user(), or architectures where access_ok() is always true); we'll deal with that in followups. Signed-off-by: Al Viro <[email protected]>
2020-01-28Merge branch 'efi-core-for-linus' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Ingo Molnar: "The main changes in this cycle were: - Cleanup of the GOP [graphics output] handling code in the EFI stub - Complete refactoring of the mixed mode handling in the x86 EFI stub - Overhaul of the x86 EFI boot/runtime code - Increase robustness for mixed mode code - Add the ability to disable DMA at the root port level in the EFI stub - Get rid of RWX mappings in the EFI memory map and page tables, where possible - Move the support code for the old EFI memory mapping style into its only user, the SGI UV1+ support code. - plus misc fixes, updates, smaller cleanups. ... and due to interactions with the RWX changes, another round of PAT cleanups make a guest appearance via the EFI tree - with no side effects intended" * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits) efi/x86: Disable instrumentation in the EFI runtime handling code efi/libstub/x86: Fix EFI server boot failure efi/x86: Disallow efi=old_map in mixed mode x86/boot/compressed: Relax sed symbol type regex for LLVM ld.lld efi/x86: avoid KASAN false positives when accessing the 1: 1 mapping efi: Fix handling of multiple efi_fake_mem= entries efi: Fix efi_memmap_alloc() leaks efi: Add tracking for dynamically allocated memmaps efi: Add a flags parameter to efi_memory_map efi: Fix comment for efi_mem_type() wrt absent physical addresses efi/arm: Defer probe of PCIe backed efifb on DT systems efi/x86: Limit EFI old memory map to SGI UV machines efi/x86: Avoid RWX mappings for all of DRAM efi/x86: Don't map the entire kernel text RW for mixed mode x86/mm: Fix NX bit clearing issue in kernel_map_pages_in_pgd efi/libstub/x86: Fix unused-variable warning efi/libstub/x86: Use mandatory 16-byte stack alignment in mixed mode efi/libstub/x86: Use const attribute for efi_is_64bit() efi: Allow disabling PCI busmastering on bridges during boot efi/x86: Allow translating 64-bit arguments for mixed mode calls ...
2020-01-06remove ioremap_nocache and devm_ioremap_nocacheChristoph Hellwig1-8/+2
ioremap has provided non-cached semantics by default since the Linux 2.6 days, so remove the additional ioremap_nocache interface. Signed-off-by: Christoph Hellwig <[email protected]> Acked-by: Arnd Bergmann <[email protected]>
2019-12-10mm/vmalloc: Add empty <asm/vmalloc.h> headers and use them from ↵Ingo Molnar1-0/+4
<linux/vmalloc.h> In the x86 MM code we'd like to untangle various types of historic header dependency spaghetti, but for this we'd need to pass to the generic vmalloc code various vmalloc related defines that customarily come via the <asm/page.h> low level arch header. Signed-off-by: Ingo Molnar <[email protected]>
2019-12-04alpha: use pgtable-nopud instead of 4level-fixupMike Rapoport3-15/+14
Patch series "mm: remove __ARCH_HAS_4LEVEL_HACK", v13. These patches convert several architectures to use page table folding and remove __ARCH_HAS_4LEVEL_HACK along with include/asm-generic/4level-fixup.h. For the nommu configurations the folding is already implemented by the generic code so the only change was to use the appropriate header file. As for the rest, the changes are mostly about mechanical replacement of pgd accessors with pud/pmd ones and the addition of higher levels to page table traversals. With Vineet's patches from "elide extraneous generated code for folded p4d/pud/pmd" series [1] there is a small shrink of the kernel size of about -0.01% for the defconfig builds. This patch (of 13): It is not likely alpha will have 5-level page tables. Replace usage of include/asm-generic/4level-fixup.h and implied __ARCH_HAS_4LEVEL_HACK with include/asm-generic/pgtable-nopud.h and adjust page table manipulation macros and functions accordingly. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Greg Ungerer <[email protected]> Cc: Helge Deller <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Cc: Mark Salter <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michal Simek <[email protected]> Cc: Peter Rosin <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Rolf Eike Beer <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Anatoly Pugachev <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-11-11alpha: remove the unused __ioremap wrapperChristoph Hellwig1-6/+0
No need for the additional namespace pollution. Signed-off-by: Christoph Hellwig <[email protected]>
2019-09-25mm: introduce MADV_PAGEOUTMinchan Kim1-0/+1
When a process expects no accesses to a certain memory range for a long time, it could hint kernel that the pages can be reclaimed instantly but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time so that kernel reclaims *any LRU* pages instantly. The hint can help kernel in deciding which pages to evict proactively. A note: It doesn't apply SWAP_CLUSTER_MAX LRU page isolation limit intentionally because it's automatically bounded by PMD size. If PMD size(e.g., 256) makes some trouble, we could fix it later by limit it to SWAP_CLUSTER_MAX[1]. - man-page material MADV_PAGEOUT (since Linux x.x) Do not expect access in the near future so pages in the specified regions could be reclaimed instantly regardless of memory pressure. Thus, access in the range after successful operation could cause major page fault but never lose the up-to-date contents unlike MADV_DONTNEED. Pages belonging to a shared mapping are only processed if a write access is allowed for the calling process. MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [1] https://lore.kernel.org/lkml/[email protected]/ [[email protected]: clear PG_active on MADV_PAGEOUT] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Reported-by: kbuild test robot <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sonny Rao <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tim Murray <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25mm: introduce MADV_COLDMinchan Kim1-0/+2
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7. - Background The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster so we are trying to make hot start more likely than cold start. To increase hot start, Android userspace manages the order that apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd(low memory killer daemon). They are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements but those performance improvements will vary based on the memory requirements of individual workloads. - Problem Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidate for swap. Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed(we measured performance swapping out vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x times faster even though we use zram which is much faster than real storage) so kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap. - Approach The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could provide many chances for platform to use much information to optimize memory efficiency. To achieve the goal, the patchset introduce two new options for madvise. One is MADV_COLD which will deactivate activated pages and the other is MADV_PAGEOUT which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in a way that it hints the kernel that memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in a way that it hints the kernel that memory region is not currently needed and should be reclaimed when memory pressure rises. This patch (of 5): When a process expects no accesses to a certain memory range, it could give a hint to kernel that the pages can be reclaimed when memory pressure happens but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_COLD hint to madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help kernel in deciding which pages to evict early during memory pressure. It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves active file page -> inactive file LRU active anon page -> inacdtive anon LRU Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file LRU's head because MADV_COLD is a little bit different symantic. MADV_FREE means it's okay to discard when the memory pressure because the content of the page is *garbage* so freeing such pages is almost zero overhead since we don't need to swap out and access afterward causes just minor fault. Thus, it would make sense to put those freeable pages in inactive file LRU to compete other used-once pages. It makes sense for implmentaion point of view, too because it's not swapbacked memory any longer until it would be re-dirtied. Even, it could give a bonus to make them be reclaimed on swapless system. However, MADV_COLD doesn't mean garbage so reclaiming them requires swap-out/in in the end so it's bigger cost. Since we have designed VM LRU aging based on cost-model, anonymous cold pages would be better to position inactive anon's LRU list, not file LRU. Furthermore, it would help to avoid unnecessary scanning if system doesn't have a swap device. Let's start simpler way without adding complexity at this moment. However, keep in mind, too that it's a caveat that workloads with a lot of pages cache are likely to ignore MADV_COLD on anonymous memory because we rarely age anonymous LRU lists. * man-page material MADV_COLD (since Linux x.x) Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages. MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [[email protected]: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Reported-by: kbuild test robot <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sonny Rao <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tim Murray <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-24mm: consolidate pgtable_cache_init() and pgd_cache_init()Mike Rapoport1-5/+0
Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem cache for page table allocations on several architectures that do not use PAGE_SIZE tables for one or more levels of the page table hierarchy. Most architectures do not implement these functions and use __weak default NOP implementation of pgd_cache_init(). Since there is no such default for pgtable_cache_init(), its empty stub is duplicated among most architectures. Rename the definitions of pgd_cache_init() to pgtable_cache_init() and drop empty stubs of pgtable_cache_init(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Acked-by: Will Deacon <[email protected]> [arm64] Acked-by: Thomas Gleixner <[email protected]> [x86] Cc: Catalin Marinas <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-24mm: remove quicklist page table cachesNicholas Piggin1-2/+0
Patch series "mm: remove quicklist page table caches". A while ago Nicholas proposed to remove quicklist page table caches [1]. I've rebased his patch on the curren upstream and switched ia64 and sh to use generic versions of PTE allocation. [1] https://lore.kernel.org/linux-mm/[email protected] This patch (of 3): Remove page table allocator "quicklists". These have been around for a long time, but have not got much traction in the last decade and are only used on ia64 and sh architectures. The numbers in the initial commit look interesting but probably don't apply anymore. If anybody wants to resurrect this it's in the git history, but it's unhelpful to have this code and divergent allocator behaviour for minor archs. Also it might be better to instead make more general improvements to page allocator if this is still so slow. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nicholas Piggin <[email protected]> Signed-off-by: Mike Rapoport <[email protected]> Cc: Tony Luck <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-16arch/*: remove unused isa_page_to_bus()Stephen Kitt1-5/+0
isa_page_to_bus() is deprecated and is no longer used anywhere. Remove it entirely. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Stephen Kitt <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12alpha: switch to generic version of pte allocationMike Rapoport1-37/+3
alpha allocates PTE pages with __get_free_page() and uses GFP_KERNEL | __GFP_ZERO for the allocations. Switch it to the generic version that does exactly the same thing for the kernel page tables and adds __GFP_ACCOUNT for the user PTEs. The alpha pte_free() and pte_free_kernel() versions are identical to the generic ones and can be simply dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Cc: Albert Ou <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Guo Ren <[email protected]> Cc: Guo Ren <[email protected]> Cc: Helge Deller <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Matt Turner <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Palmer Dabbelt <[email protected]> Cc: Paul Burton <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Kuo <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Russell King <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Vincent Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-11Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-nextLinus Torvalds1-0/+2
Pull networking updates from David Miller: "Some highlights from this development cycle: 1) Big refactoring of ipv6 route and neigh handling to support nexthop objects configurable as units from userspace. From David Ahern. 2) Convert explored_states in BPF verifier into a hash table, significantly decreased state held for programs with bpf2bpf calls, from Alexei Starovoitov. 3) Implement bpf_send_signal() helper, from Yonghong Song. 4) Various classifier enhancements to mvpp2 driver, from Maxime Chevallier. 5) Add aRFS support to hns3 driver, from Jian Shen. 6) Fix use after free in inet frags by allocating fqdirs dynamically and reworking how rhashtable dismantle occurs, from Eric Dumazet. 7) Add act_ctinfo packet classifier action, from Kevin Darbyshire-Bryant. 8) Add TFO key backup infrastructure, from Jason Baron. 9) Remove several old and unused ISDN drivers, from Arnd Bergmann. 10) Add devlink notifications for flash update status to mlxsw driver, from Jiri Pirko. 11) Lots of kTLS offload infrastructure fixes, from Jakub Kicinski. 12) Add support for mv88e6250 DSA chips, from Rasmus Villemoes. 13) Various enhancements to ipv6 flow label handling, from Eric Dumazet and Willem de Bruijn. 14) Support TLS offload in nfp driver, from Jakub Kicinski, Dirk van der Merwe, and others. 15) Various improvements to axienet driver including converting it to phylink, from Robert Hancock. 16) Add PTP support to sja1105 DSA driver, from Vladimir Oltean. 17) Add mqprio qdisc offload support to dpaa2-eth, from Ioana Radulescu. 18) Add devlink health reporting to mlx5, from Moshe Shemesh. 19) Convert stmmac over to phylink, from Jose Abreu. 20) Add PTP PHC (Physical Hardware Clock) support to mlxsw, from Shalom Toledo. 21) Add nftables SYNPROXY support, from Fernando Fernandez Mancera. 22) Convert tcp_fastopen over to use SipHash, from Ard Biesheuvel. 23) Track spill/fill of constants in BPF verifier, from Alexei Starovoitov. 24) Support bounded loops in BPF, from Alexei Starovoitov. 25) Various page_pool API fixes and improvements, from Jesper Dangaard Brouer. 26) Just like ipv4, support ref-countless ipv6 route handling. From Wei Wang. 27) Support VLAN offloading in aquantia driver, from Igor Russkikh. 28) Add AF_XDP zero-copy support to mlx5, from Maxim Mikityanskiy. 29) Add flower GRE encap/decap support to nfp driver, from Pieter Jansen van Vuuren. 30) Protect against stack overflow when using act_mirred, from John Hurley. 31) Allow devmap map lookups from eBPF, from Toke Høiland-Jørgensen. 32) Use page_pool API in netsec driver, Ilias Apalodimas. 33) Add Google gve network driver, from Catherine Sullivan. 34) More indirect call avoidance, from Paolo Abeni. 35) Add kTLS TX HW offload support to mlx5, from Tariq Toukan. 36) Add XDP_REDIRECT support to bnxt_en, from Andy Gospodarek. 37) Add MPLS manipulation actions to TC, from John Hurley. 38) Add sending a packet to connection tracking from TC actions, and then allow flower classifier matching on conntrack state. From Paul Blakey. 39) Netfilter hw offload support, from Pablo Neira Ayuso" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2080 commits) net/mlx5e: Return in default case statement in tx_post_resync_params mlx5: Return -EINVAL when WARN_ON_ONCE triggers in mlx5e_tls_resync(). net: dsa: add support for BRIDGE_MROUTER attribute pkt_sched: Include const.h net: netsec: remove static declaration for netsec_set_tx_de() net: netsec: remove superfluous if statement netfilter: nf_tables: add hardware offload support net: flow_offload: rename tc_cls_flower_offload to flow_cls_offload net: flow_offload: add flow_block_cb_is_busy() and use it net: sched: remove tcf block API drivers: net: use flow block API net: sched: use flow block API net: flow_offload: add flow_block_cb_{priv, incref, decref}() net: flow_offload: add list handling functions net: flow_offload: add flow_block_cb_alloc() and flow_block_cb_free() net: flow_offload: rename TCF_BLOCK_BINDER_TYPE_* to FLOW_BLOCK_BINDER_TYPE_* net: flow_offload: rename TC_BLOCK_{UN}BIND to FLOW_BLOCK_{UN}BIND net: flow_offload: add flow_block_cb_setup_simple() net: hisilicon: Add an tx_desc to adapt HI13X1_GMAC net: hisilicon: Add an rx_desc to adapt HI13X1_GMAC ...
2019-06-20Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller1-0/+2
Alexei Starovoitov says: ==================== pull-request: bpf-next 2019-06-19 The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) new SO_REUSEPORT_DETACH_BPF setsocktopt, from Martin. 2) BTF based map definition, from Andrii. 3) support bpf_map_lookup_elem for xskmap, from Jonathan. 4) bounded loops and scalar precision logic in the verifier, from Alexei. ==================== Signed-off-by: David S. Miller <[email protected]>
2019-06-17Merge tag 'v5.2-rc5' into locking/core, to pick up fixesIngo Molnar1-2/+1
Signed-off-by: Ingo Molnar <[email protected]>
2019-06-15bpf: net: Add SO_DETACH_REUSEPORT_BPFMartin KaFai Lau1-0/+2
There is SO_ATTACH_REUSEPORT_[CE]BPF but there is no DETACH. This patch adds SO_DETACH_REUSEPORT_BPF sockopt. The same sockopt can be used to undo both SO_ATTACH_REUSEPORT_[CE]BPF. reseport_detach_prog() is added and it is mostly a mirror of the existing reuseport_attach_prog(). The differences are, it does not call reuseport_alloc() and returns -ENOENT when there is no old prog. Cc: Craig Gallek <[email protected]> Signed-off-by: Martin KaFai Lau <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-06-05treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 428Thomas Gleixner1-2/+1
Based on 1 normalized pattern(s): this file is released under the gplv2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 68 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Armijn Hemel <[email protected]> Reviewed-by: Allison Randal <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-06-03locking/atomic, alpha: Use s64 for atomic64Mark Rutland1-10/+10
As a step towards making the atomic64 API use consistent types treewide, let's have the alpha atomic64 implementation use s64 as the underlying type for atomic64_t, rather than long, matching the generated headers. As atomic64_read() depends on the generic defintion of atomic64_t, this still returns long. This will be converted in a subsequent patch. Otherwise, there should be no functional change as a result of this patch. Signed-off-by: Mark Rutland <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matt Turner <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-05-30treewide: Add SPDX license identifier - KbuildGreg Kroah-Hartman1-0/+1
Add SPDX license identifiers to all Make/Kconfig files which: - Have no license information of any form These files fall under the project license, GPL v2 only. The resulting SPDX license identifier is: GPL-2.0 Reported-by: Masahiro Yamada <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Kate Stewart <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2019-05-24treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 47Thomas Gleixner1-9/+1
Based on 1 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license as published by the free software foundation either version 2 or at your option any later version you should have received a copy of the gnu general public license for example usr src linux copying if not write to the free software foundation inc 675 mass ave cambridge ma 02139 usa extracted by the scancode license scanner the SPDX license identifier GPL-2.0-or-later has been chosen to replace the boilerplate/reference in 20 file(s). Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Allison Randal <[email protected]> Reviewed-by: Kate Stewart <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>