The start position for an ins instruction is always encoded as an
immediate, so allowing registers to be used by the inline asm makes no
sense. It should never happen anyway since a bit index should always be
small enough to be treated as an immediate, but remove the nonsensical
"r" for sanity.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Rather than #ifdef'ing on CONFIG_CPU_* to determine whether the ins
instruction is supported, we can simply check MIPS_ISA_REV to discover
whether we're targeting MIPSr2 or higher. Do so in order to clean up the
code.
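As a sketch, assuming only asm/isa-rev.h's MIPS_ISA_REV (which evaluates
to the MIPS release the kernel targets, or 0 for pre-r1):
----
#include <asm/isa-rev.h>

if (MIPS_ISA_REV >= 2) {
	/* MIPSr2 or higher: the ins instruction is available */
} else {
	/* pre-r2 fallback */
}
----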
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
set_bit() can set bits 0-15 using an ori instruction, rather than
loading the value -1 into a register & then using an ins instruction.
That is, rather than the following:
	li	t0, -1
	ll	t1, 0(t2)
	ins	t1, t0, 4, 1
	sc	t1, 0(t2)
We can have the simpler:
	ll	t1, 0(t2)
	ori	t1, t1, 0x10
	sc	t1, 0(t2)
The or path already allows immediates to be used, so simply restricting
the ins path to bits that don't fit in immediates is sufficient to take
advantage of this.
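A sketch of the resulting dispatch in C (condition shape only, not the
literal kernel code):
----
/* use ins only for constant bits that don't fit an ori immediate */
if (MIPS_ISA_REV >= 2 && __builtin_constant_p(bit) && bit >= 16) {
	/* LL/SC loop: li -1 + ins */
} else {
	/* LL/SC loop: or with an "ir" operand, so constant bits 0-15
	 * collapse to a single ori */
}
----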
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Reorder conditions in our various bitops functions that check
kernel_uses_llsc such that they handle the !kernel_uses_llsc case first.
This allows us to avoid the need to duplicate the kernel_uses_llsc check
in all the other cases. For functions that don't involve barriers common
to the various implementations, we switch to returning from within each
if block, making each case easier to read in isolation.
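The resulting shape, sketched for set_bit(), assuming the existing
__mips_set_bit() fallback:
----
static inline void set_bit(unsigned long nr, volatile unsigned long *addr)
{
	if (!kernel_uses_llsc) {
		__mips_set_bit(nr, addr);	/* non-LL/SC fallback */
		return;
	}

	/* LL/SC implementations follow, no longer nested under else */
}
----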
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Remove the remaining duplication between 32b & 64b in asm/atomic.h by
making use of an ATOMIC_OPS() macro to generate:
- atomic_read()/atomic64_read()
- atomic_set()/atomic64_set()
- atomic_cmpxchg()/atomic64_cmpxchg()
- atomic_xchg()/atomic64_xchg()
This is consistent with the way all other functions in asm/atomic.h are
generated, and ensures consistency between the 32b & 64b functions.
Of note is that this results in the above now being static inline
functions rather than macros.
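A sketch of the macro's shape, abridged to read() & set() (the real
macro also generates cmpxchg() & xchg()):
----
#define ATOMIC_OPS(pfx, type)						\
static __always_inline type pfx##_read(const pfx##_t *v)		\
{									\
	return READ_ONCE(v->counter);					\
}									\
									\
static __always_inline void pfx##_set(pfx##_t *v, type i)		\
{									\
	WRITE_ONCE(v->counter, i);					\
}

ATOMIC_OPS(atomic, int)
#ifdef CONFIG_64BIT
ATOMIC_OPS(atomic64, s64)
#endif
----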
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Unify the definitions of atomic_sub_if_positive() &
atomic64_sub_if_positive() using a macro like we do for most other
atomic functions. This allows us to share the implementation ensuring
consistency between the two. Notably this provides the appropriate
loongson3_war barriers in the atomic64_sub_if_positive() case which were
previously missing.
The code is rearranged a little to handle the !kernel_uses_llsc case
first in order to de-indent the LL/SC case & allow us not to go over 80
characters per line.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Use smp_mb__before_atomic() & smp_mb__after_atomic() in
atomic_sub_if_positive() rather than the equivalent
smp_mb__before_llsc() & smp_llsc_mb(). The former are more standard &
this preps us for avoiding duplicate barriers on Loongson3 in a later
patch.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Generate the sync instructions required to workaround Loongson3 LL/SC
errata within inline asm blocks, which feels a little safer than doing
it from C where strictly speaking the compiler would be well within its
rights to insert a memory access between the separate asm statements we
previously had, containing sync & ll instructions respectively.
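A sketch, assuming __SYNC() expands to an assembly string, with
placeholder operands (temp, v, i):
----
__asm__ __volatile__(
	"	" __SYNC(full, loongson3_war) "			\n"
	"1:	ll	%0, %1	# sync & ll share one asm block	\n"
	"	addu	%0, %0, %2				\n"
	"	sc	%0, %1					\n"
	"	beqz	%0, 1b					\n"
	: "=&r" (temp), "+m" (v->counter)
	: "r" (i));
----
Since the compiler never sees a boundary between the sync & the ll, it
cannot schedule a memory access between them.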
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Cut down on duplication by generalizing the ATOMIC_OP(),
ATOMIC_OP_RETURN() & ATOMIC_FETCH_OP() macros to work for both 32b &
64b atomics, and removing the ATOMIC64_ variants. This ensures
consistency between our atomic_* & atomic64_* functions.
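Sketched, the prefix & type simply become macro parameters:
----
#define ATOMIC_OP(pfx, op, type, c_op, asm_op)				\
static __inline__ void pfx##_##op(type i, pfx##_t *v)			\
{									\
	/* body shared between the 32b & 64b variants */		\
}

ATOMIC_OP(atomic, add, int, +=, addu)
ATOMIC_OP(atomic64, add, s64, +=, daddu)
----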
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Handle the !kernel_uses_llsc path first in our ATOMIC_OP(),
ATOMIC_OP_RETURN() & ATOMIC_FETCH_OP() macros & return from within the
block. This allows us to de-indent the kernel_uses_llsc path by one
level which will be useful when making further changes.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
We define macros in asm/atomic.h which end each line with space
characters before a backslash to continue on the next line. Remove the
space characters leaving tabs as the whitespace used for conformity with
coding convention.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Use the new __SYNC() infrastructure to implement sync_ginv(), for
consistency with much of the rest of asm/barrier.h.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Implement __sync() using the new __SYNC() infrastructure, which will
take care of not emitting an instruction for old R3k CPUs that don't
support it. The only behavioral difference is that __sync() will now
provide a compiler barrier on these old CPUs, but that seems like
reasonable behavior anyway.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
The definition of fast_mb() is the same in both the Octeon & non-Octeon
cases, so remove the duplication & define it only once.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
We #ifdef on Cavium Octeon CPUs, but emit the same sync instruction in
both cases. Remove the #ifdef & simply expand to the __sync() macro.
Whilst here indent the strong ordering case definitions to match the
indentation of the weak ordering ones, helping readability.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Simplify our definitions of rmb() & wmb() using the new __SYNC()
infrastructure.
The fast_rmb() & fast_wmb() macros are removed, since they only provided
a level of indirection that made the code less readable & weren't
directly used anywhere in the kernel tree.
The Octeon #ifdef'ery is removed, since the "syncw" instruction
previously used is merely an alias for "sync 4" which __SYNC() will emit
for the wmb sync type when the kernel is configured for an Octeon CPU.
Similarly __SYNC() will emit nothing for the rmb sync type in Octeon
configurations.
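The definitions then reduce to something like (a sketch, assuming
__SYNC() expands to an assembly string):
----
#define rmb()	__asm__ __volatile__(__SYNC(rmb, always) ::: "memory")
#define wmb()	__asm__ __volatile__(__SYNC(wmb, always) ::: "memory")
----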
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Introduce an asm/sync.h header which provides infrastructure that can be
used to generate sync instructions of various types, and for various
reasons. For example if we need a sync instruction that provides a full
completion barrier but only on systems which have weak memory ordering,
we can generate the appropriate assembly code using:
__SYNC(full, weak_ordering)
When the kernel is configured to run on systems with weak memory
ordering (ie. CONFIG_WEAK_ORDERING is selected) we'll emit a sync
instruction. When the kernel is configured to run on systems with strong
memory ordering (ie. CONFIG_WEAK_ORDERING is not selected) we'll emit
nothing. The caller doesn't need to know which happened - it simply says
what it needs & when, with no concern for checking the kernel
configuration.
There are some scenarios in which we may want to emit code only when we
*didn't* emit a sync instruction. For example, some Loongson3 CPUs
suffer from a bug that requires us to emit a sync instruction prior to
each ll instruction (enabled by CONFIG_CPU_LOONGSON3_WORKAROUNDS). In
cases where this bug workaround is enabled, it's wasteful to then have
more generic code emit another sync instruction to provide barriers we
need in general. A __SYNC_ELSE() macro allows for this, providing an
extra argument that contains code to be assembled only in cases where
the sync instruction was not emitted. For example if we have a scenario
in which we generally want to emit a release barrier but for affected
Loongson3 configurations upgrade that to a full completion barrier, we
can do that like so:
__SYNC_ELSE(full, loongson3_war, __SYNC(rl, always))
The assembly generated by these macros can be used either as inline
assembly or in assembly source files.
Differing types of sync as provided by MIPSr6 are defined, but currently
they all generate a full completion barrier except in kernels configured
for Cavium Octeon systems. There the wmb sync-type is used, and rmb
syncs are omitted, as has been the case since commit 6b07d38aaa52
("MIPS: Octeon: Use optimized memory barrier primitives."). Using
__SYNC() with the wmb or rmb types will abstract away the Octeon
specific behavior and allow us to later clean up asm/barrier.h code that
currently includes a plethora of #ifdef's.
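As a usage sketch in inline assembly (function name hypothetical):
----
static inline void arch_release_barrier(void)
{
	/* emits the rl-type barrier; currently a full sync on
	 * everything except the Octeon special cases above */
	__asm__ __volatile__(__SYNC(rl, always) ::: "memory");
}
----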
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
When targeting MIPSr6 or higher make use of a compact branch in LL/SC
loops, preventing the insertion of a delay slot nop that only serves to
waste space.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
We currently duplicate the definition of __scbeqz in asm/atomic.h &
asm/cmpxchg.h. Move it to asm/llsc.h & rename it to __SC_BEQZ to fit
better with the existing __SC macro provided there.
We include a tab in the string in order to avoid the need for users to
indent code any further to include whitespace of their own after the
instruction mnemonic.
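The definition then looks something like the following sketch (the real
header may handle further cases, e.g. the R10000 LL/SC workaround):
----
#if MIPS_ISA_REV >= 6
# define __SC_BEQZ	"beqzc	"	/* compact branch, no delay slot */
#else
# define __SC_BEQZ	"beqz	"	/* classic branch + delay slot */
#endif
----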
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: Huacai Chen <[email protected]>
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
This patch adds support for the GARDENA smart Gateway, which is based on
the MediaTek MT7688 SoC. It is equipped with 128 MiB of DDR memory, 8 MiB
of flash (SPI NOR) and an additional 128 MiB of SPI NAND storage.
Signed-off-by: Stefan Roese <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: Harvey Hunt <[email protected]>
Cc: John Crispin <[email protected]>
Cc: [email protected]
|
|
This patch adds the I2C controller description to the MT7628A dtsi file.
Signed-off-by: Stefan Roese <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: Harvey Hunt <[email protected]>
Cc: John Crispin <[email protected]>
Cc: [email protected]
|
|
The r4k-bugs64 code will no longer be built for MIPSr6 kernel
configurations, so there's no need to perform checks for MIPSr6 within
the code. Drop those redundant checks.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
|
|
Only build the checks for R4k errata workarounds if we expect that the
kernel might actually run on a system with an R4k CPU - ie.
CONFIG_SYS_HAS_CPU_R4X00=y & we're targeting a pre-MIPSr1 ISA revision.
Rename cpu-bugs64.c to r4k-bugs64.c to indicate the fact that the code
is specific to R4k CPUs.
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
|
|
Node ids don't need to be contiguous in Linux, so the concept of using
compact node ids to make them contiguous isn't needed at all. This
patchset therefore removes it.
Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected]
|
|
Most of the SN/SN0 header files are inherited from IRIX header files,
but not all of that stuff is useful for Linux. Remove the unused parts.
Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected]
|
|
Commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows the compiler to uninline functions marked as 'inline'.
In the case of cmpxchg this would result in a reference to the function
__cmpxchg_called_with_bad_pointer(), which is an error case for catching
bugs and will not be referenced by correct code if __cmpxchg() is
inlined.
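A sketch of the pattern, with hypothetical per-size helpers:
----
/* deliberately never defined: a surviving reference fails the link */
extern unsigned long __cmpxchg_called_with_bad_pointer(void);

static __always_inline unsigned long __cmpxchg(volatile void *ptr,
		unsigned long old, unsigned long new, unsigned int size)
{
	switch (size) {
	case 4:
		return __cmpxchg_u32(ptr, old, new);	/* hypothetical */
	case 8:
		return __cmpxchg_u64(ptr, old, new);	/* hypothetical */
	default:
		/* dead code when inlined with a constant size; if the
		 * compiler outlines this function the call survives
		 * and the build breaks at link time */
		return __cmpxchg_called_with_bad_pointer();
	}
}
----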
Signed-off-by: Thomas Bogendoerfer <[email protected]>
[[email protected]: s/__cmpxchd/__cmpxchg in subject]
Signed-off-by: Paul Burton <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected]
|
|
The addr variable in prom_free_prom_memory() has been unused since
commit 0df1007677d5 ("MIPS: fw: Record prom memory"), leading to a
compiler warning:
arch/mips/fw/arc/memory.c:163:16:
warning: unused variable 'addr' [-Wunused-variable]
Fix this by removing the unused variable.
Signed-off-by: Paul Burton <[email protected]>
Fixes: 0df1007677d5 ("MIPS: fw: Record prom memory")
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
The Rio500 kernel driver has not been used by Rio500 owners since 2001,
not long after the rio500 project added support for a user-space USB
stack through the very first versions of usbdevfs and then libusb.
Support for the kernel driver was removed from the upstream utilities
in 2008:
https://gitlab.freedesktop.org/hadess/rio500/commit/943f624ab721eb8281c287650fcc9e2026f6f5db
Cc: Cesar Miquel <[email protected]>
Signed-off-by: Bastien Nocera <[email protected]>
Cc: stable <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
|
|
The addr variable in prom_free_prom_memory() has been unused since
commit b3c948e2c00f ("MIPS: msp: Record prom memory"), causing a warning
& build failure due to -Werror. Remove the unused variable.
Signed-off-by: Paul Burton <[email protected]>
Fixes: b3c948e2c00f ("MIPS: msp: Record prom memory")
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
Commit b3c948e2c00f ("MIPS: msp: Record prom memory") introduced use of
a MAX_PROM_MEM value but didn't define it. A bounds check in
prom_meminit() suggests its value was supposed to be 5, so define it as
such & adjust the bounds check to use the macro rather than a magic
number.
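i.e. something along these lines (variable names hypothetical):
----
#define MAX_PROM_MEM 5

/* bounds check now spelled with the macro, not a magic 5 */
if (nr_prom_mem >= MAX_PROM_MEM) {
	pr_err("Too many PROM memory entries!\n");
	break;
}
----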
Signed-off-by: Paul Burton <[email protected]>
Fixes: b3c948e2c00f ("MIPS: msp: Record prom memory")
Cc: Jiaxun Yang <[email protected]>
Cc: [email protected]
|
|
'exit' functions should be marked as __exit, not __init.
Fixes: 85cc028817ef ("mips: make loongsoon serial driver explicitly modular")
Signed-off-by: Christophe JAILLET <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
|
|
On some SGI machines (IP28 and IP30) a small region of memory is mirrored
to physical address 0 for exception vectors, while the rest of the memory
is reachable at a higher physical address. The ARC PROM marks this
region as reserved, but with commit a94e4f24ec83 ("MIPS: init: Drop
boot_mem_map") this chunk is used when searching for the start of RAM,
which breaks at least IP28 and IP30 machines. To fix this,
add_memory_region() now checks for a start address below PHYS_OFFSET and
ignores such chunks.
Fixes: a94e4f24ec83 ("MIPS: init: Drop boot_mem_map")
Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected]
|
|
Fix the calculation of the size used when reserving memory between
PHYS_OFFSET and the real start of memory.
Fixes: a94e4f24ec83 ("MIPS: init: Drop boot_mem_map")
Signed-off-by: Thomas Bogendoerfer <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected]
|
|
Versions of binutils prior to 2.25 are unable to link our VDSO due to an
unsupported R_MIPS_PC32 relocation generated by the ".word _start - ."
line of the inline asm in get_vdso_base(). As such, the intent is that
when building with binutils older than 2.25 we don't build code for
gettimeofday() & friends in the VDSO that rely upon get_vdso_base().
Commit 24640f233b46 ("mips: Add support for generic vDSO") converted us
to using generic VDSO infrastructure, and as part of that the
gettimeofday() functionality moved to a new vgettimeofday.c file. The
check for binutils < 2.25 wasn't updated to handle this new filename,
and so it continues trying to remove the old unused filename from the
build. The end result is that we try to include the gettimeofday() code
in builds that will fail to link.
Fix this by updating the binutils < 2.25 case to remove vgettimeofday.c
from obj-vdso-y, rather than gettimeofday.c.
Signed-off-by: Paul Burton <[email protected]>
Fixes: 24640f233b46 ("mips: Add support for generic vDSO")
Cc: Vincenzo Frascino <[email protected]>
Cc: [email protected]
|
|
arch/mips/vdso/gettimeofday.c has been unused since commit 24640f233b46
("mips: Add support for generic vDSO"). Remove the dead code.
Signed-off-by: Paul Burton <[email protected]>
Fixes: 24640f233b46 ("mips: Add support for generic vDSO")
Cc: Vincenzo Frascino <[email protected]>
Cc: [email protected]
|
|
Wire up the new clone3 syscall for MIPS, using save_static_function() to
generate a wrapper that saves registers $s0-$s7 prior to invoking the
generic sys_clone3 function just like we do for plain old clone.
Tested atop 64r6el_defconfig using o32, n32 & n64 builds of the simple
test program from:
https://lore.kernel.org/lkml/[email protected]/
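The wrapper generation itself is a one-liner in arch code, mirroring
clone (a sketch):
----
save_static_function(sys_clone3);	/* emits __sys_clone3, saving $s0-$s7 */
----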
Signed-off-by: Paul Burton <[email protected]>
Cc: Christian Brauner <[email protected]>
Acked-by: Christian Brauner <[email protected]>
Cc: [email protected]
|
|
Commit 171a9bae68c7 ("staging/octeon: Allow test build on !MIPS") moved
the inclusion of a bunch of headers by various files in the Octeon
ethernet driver into a common header, but in doing so it changed the
order in which those headers are included.
Prior to the referenced commit drivers/staging/octeon/ethernet.c
included asm/octeon/cvmx-pip.h before asm/octeon/cvmx-ipd.h, which makes
use of the CVMX_PIP_SFT_RST definition pulled in by the former. After
commit 171a9bae68c7 ("staging/octeon: Allow test build on !MIPS") we
pull in asm/octeon/cvmx-ipd.h first & builds fail with:
In file included from drivers/staging/octeon/octeon-ethernet.h:27,
from drivers/staging/octeon/ethernet.c:22:
arch/mips/include/asm/octeon/cvmx-ipd.h: In function 'cvmx_ipd_free_ptr':
arch/mips/include/asm/octeon/cvmx-ipd.h:330:27: error: storage size of
'pip_sft_rst' isn't known
union cvmx_pip_sft_rst pip_sft_rst;
^~~~~~~~~~~
arch/mips/include/asm/octeon/cvmx-ipd.h:331:36: error: 'CVMX_PIP_SFT_RST'
undeclared (first use in this function); did you mean 'CVMX_CIU_SOFT_RST'?
pip_sft_rst.u64 = cvmx_read_csr(CVMX_PIP_SFT_RST);
^~~~~~~~~~~~~~~~
CVMX_CIU_SOFT_RST
arch/mips/include/asm/octeon/cvmx-ipd.h:331:36: note: each undeclared
identifier is reported only once for each function it appears in
arch/mips/include/asm/octeon/cvmx-ipd.h:330:27: warning: unused variable
'pip_sft_rst' [-Wunused-variable]
union cvmx_pip_sft_rst pip_sft_rst;
^~~~~~~~~~~
make[4]: *** [scripts/Makefile.build:266: drivers/staging/octeon/ethernet.o]
Error 1
make[3]: *** [scripts/Makefile.build:509: drivers/staging/octeon] Error 2
Fix this by having asm/octeon/cvmx-ipd.h include the
asm/octeon/cvmx-pip-defs.h header that it is reliant upon, rather than
requiring its users to pull in that header before it.
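i.e. the header now includes what it uses:
----
/* in asm/octeon/cvmx-ipd.h */
#include <asm/octeon/cvmx-pip-defs.h>
----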
Signed-off-by: Paul Burton <[email protected]>
Fixes: 171a9bae68c7 ("staging/octeon: Allow test build on !MIPS")
Cc: David S. Miller <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Matthew Wilcox (Oracle) <[email protected]>
Cc: [email protected]
|
|
Commit ac7c3e4ff401 ("compiler: enable CONFIG_OPTIMIZE_INLINING
forcibly") allows the compiler to uninline functions marked as 'inline',
leading to a section mismatch in this case.
Since we're using const variables to pass assembly flags, the 'inline's
can't be dropped, so we simply mark these functions as __always_inline.
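Sketched with a hypothetical check function:
----
static __always_inline __init void check_some_erratum(void)
{
	/* const flags used as "i" asm immediates stay compile-time
	 * constants only if this function is really inlined */
}
----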
Signed-off-by: Jiaxun Yang <[email protected]>
Cc: [email protected]
[[email protected]:
- Annotate these functions with __init, even if it only serves to
inform human readers when the code can be used.
- Drop the __always_inline from check_daddi() & check_daddiu() which
don't use arguments as immediates in inline asm.
- Rewrap the commit message.]
Signed-off-by: Paul Burton <[email protected]>
|
|
It is two registers, each 4 bytes in size.
Signed-off-by: Oleksij Rempel <[email protected]>
Signed-off-by: Paul Burton <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Pengutronix Kernel Team <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
|
|
From commit a94e4f24ec83 ("MIPS: init: Drop boot_mem_map") onwards,
add_memory_region() is handled directly by memblock_add()/
memblock_reserve(), and all bootmem API usage should be converted to the
memblock API. Otherwise it will lead to boot failures, especially in the
NUMA case, because add_memory_region() loses the node_id information.
Fixes: a94e4f24ec836c8984f83959 ("MIPS: init: Drop boot_mem_map")
Signed-off-by: Huacai Chen <[email protected]>
Signed-off-by: Jiaxun Yang <[email protected]>
[[email protected]:
- Invert node_id check to de-indent the switch statement & avoid lines
over 80 characters.
- Fixup commit reference in commit message.]
Signed-off-by: Paul Burton <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: James Hogan <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: Fuxin Zhang <[email protected]>
Cc: Zhangjin Wu <[email protected]>
Cc: Huacai Chen <[email protected]>
|
|
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few
people, and until recently arm64 used these erroneously/pointlessly for
other levels of page table.
To make it incredibly clear that these only apply to the PTE level, and to
align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them
to pgtable_pte_page_{ctor,dtor}().
These changes were generated with the following shell script:
----
git grep -lw 'pgtable_page_.tor' | while read FILE; do
sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE;
sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE;
done
----
... with the documentation re-flowed to remain under 80 columns, and
whitespace fixed up in macros to keep backslashes aligned.
There should be no functional change as a result of this patch.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mark Rutland <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Acked-by: Geert Uytterhoeven <[email protected]> [m68k]
Cc: Anshuman Khandual <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
When a process expects no accesses to a certain memory range for a long
time, it can hint to the kernel that the pages can be reclaimed instantly
while their data is preserved for future use. This can reduce workingset
eviction and so ends up increasing performance.
This patch introduces the new MADV_PAGEOUT hint to the madvise(2)
syscall. MADV_PAGEOUT can be used by a process to mark a memory range as
not expected to be used for a long time, so that the kernel reclaims
*any LRU* pages in it instantly. The hint can help the kernel decide
which pages to evict proactively.
A note: it intentionally doesn't apply the SWAP_CLUSTER_MAX LRU page
isolation limit because the work is automatically bounded by PMD size.
If the PMD size (e.g., 256) causes trouble, we could fix it later by
limiting it to SWAP_CLUSTER_MAX [1].
- man-page material
MADV_PAGEOUT (since Linux x.x)
Do not expect access in the near future, so pages in the specified
regions can be reclaimed instantly regardless of memory pressure. Thus,
an access in the range after a successful operation may cause a major
page fault but never loses the up-to-date contents, unlike
MADV_DONTNEED. Pages belonging to a shared mapping are only processed
if a write access is allowed for the calling process.
MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or
VM_PFNMAP pages.
[1] https://lore.kernel.org/lkml/[email protected]/
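Userspace usage is a plain madvise() call; a minimal sketch:
----
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>

static void pageout_range(void *buf, size_t len)
{
	/* reclaim now; contents are preserved, unlike MADV_DONTNEED */
	if (madvise(buf, len, MADV_PAGEOUT) != 0)
		perror("madvise(MADV_PAGEOUT)");
}
----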
[[email protected]: clear PG_active on MADV_PAGEOUT]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Minchan Kim <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Daniel Colascione <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Oleksandr Natalenko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Sonny Rao <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tim Murray <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7.
- Background
The Android terminology used for forking a new process and starting an app
from scratch is a cold start, while resuming an existing app is a hot
start. While we continually try to improve the performance of cold
starts, hot starts will always be significantly less power hungry as well
as faster, so we are trying to make hot starts more likely than cold
starts. To increase hot starts, Android userspace manages the order in
which apps should be killed via a process called ActivityManagerService.
ActivityManagerService tracks every Android app or service that the user
could be interacting with at any time and translates that into a ranked
list for lmkd (the low memory killer daemon). Apps on this list are
likely to be killed by
lmkd if the system has to reclaim memory. In that sense they are similar
to entries in any other cache. Those apps are kept alive for
opportunistic performance improvements but those performance improvements
will vary based on the memory requirements of individual workloads.
- Problem
Naturally, cached apps were dominant consumers of memory on the system.
However, they were not significant consumers of swap even though they
are good candidates for swap. Under investigation, swapping out only
begins once the low zone watermark is hit and kswapd wakes up, but the
overall allocation rate in the system might trip lmkd thresholds and
cause a cached process to be killed (we measured the performance of
swapping out vs. zapping the memory by killing a process; unsurprisingly,
zapping is 10x faster even though we use zram, which is much faster than
real storage), so a kill from lmkd will often satisfy the high zone
watermark, resulting in very few pages actually being moved to swap.
- Approach
The approach we chose was to use a new interface to allow userspace to
proactively reclaim entire processes by leveraging platform information.
This allowed us to bypass the inaccuracy of the kernel's LRUs for pages
that are known to be cold from userspace, and to avoid races with lmkd
by reclaiming apps as soon as they entered the cached state.
Additionally, it gives the platform many chances to use the information
it has to optimize memory efficiency.
To achieve the goal, the patchset introduces two new options for madvise.
One is MADV_COLD, which deactivates active pages, and the other is
MADV_PAGEOUT, which reclaims private pages instantly. These new options
complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to
gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in
that it hints to the kernel that the memory region is not currently
needed and should be reclaimed immediately; MADV_COLD is similar to
MADV_FREE in that it hints to the kernel that the memory region is not
currently needed and should be reclaimed when memory pressure rises.
This patch (of 5):
When a process expects no accesses to a certain memory range, it can
give a hint to the kernel that the pages can be reclaimed when memory
pressure happens, but that their data should be preserved for future
use. This can reduce workingset eviction and so ends up increasing
performance.
This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
MADV_COLD can be used by a process to mark a memory range as not
expected to be used in the near future. The hint can help the kernel
decide which pages to evict early during memory pressure.
It works for every LRU page, like MADV_[DONTNEED|FREE]. IOW, it moves
active file page -> inactive file LRU
active anon page -> inactive anon LRU
Unlike MADV_FREE, it doesn't move active anonymous pages to the inactive
file LRU's head, because MADV_COLD has slightly different semantics.
MADV_FREE means it's okay to discard the page under memory pressure
because its content is *garbage*, so freeing such pages has almost zero
overhead since we don't need to swap them out, and a later access causes
just a minor fault. Thus, it makes sense to put those freeable pages on
the inactive file LRU to compete with other used-once pages. It makes
sense from an implementation point of view, too, because the memory is
no longer swap-backed until it is re-dirtied. It could even give a bonus
of making them reclaimable on a swapless system. However, MADV_COLD
doesn't mean garbage, so reclaiming those pages eventually requires
swap-out/in, which is a bigger cost. Since VM LRU aging is designed
around a cost model, anonymous cold pages are better positioned on the
inactive anon LRU list, not the file LRU. Furthermore, it helps avoid
unnecessary scanning if the system doesn't have a swap device. Let's
start the simpler way without adding complexity at this moment. Keep in
mind the caveat, though, that workloads with a lot of page cache are
likely to effectively ignore MADV_COLD on anonymous memory, because we
rarely age anonymous LRU lists.
* man-page material
MADV_COLD (since Linux x.x)
Pages in the specified regions will be treated as less-recently-accessed
compared to pages in the system with similar access frequencies. In
contrast to MADV_FREE, the contents of the region are preserved regardless
of subsequent writes to pages.
MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
pages.
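A matching usage sketch for the new hint:
----
#include <stddef.h>
#include <sys/mman.h>

static void mark_cold(void *buf, size_t len)
{
	/* deactivate: reclaim early under pressure, but not instantly
	 * (contrast with MADV_PAGEOUT) */
	(void)madvise(buf, len, MADV_COLD);
}
----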
[[email protected]: resolve conflicts with hmm.git]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Minchan Kim <[email protected]>
Reported-by: kbuild test robot <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Acked-by: Johannes Weiner <[email protected]>
Cc: James E.J. Bottomley <[email protected]>
Cc: Richard Henderson <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Chris Zankel <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Daniel Colascione <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Hillf Danton <[email protected]>
Cc: Joel Fernandes (Google) <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Oleksandr Natalenko <[email protected]>
Cc: Shakeel Butt <[email protected]>
Cc: Sonny Rao <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Tim Murray <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Merge updates from Andrew Morton:
- a few hot fixes
- ocfs2 updates
- almost all of -mm (slab-generic, slab, slub, kmemleak, kasan,
cleanups, debug, pagecache, memcg, gup, pagemap, memory-hotplug,
sparsemem, vmalloc, initialization, z3fold, compaction, mempolicy,
oom-kill, hugetlb, migration, thp, mmap, madvise, shmem, zswap,
zsmalloc)
* emailed patches from Andrew Morton <[email protected]>: (132 commits)
mm/zsmalloc.c: fix a -Wunused-function warning
zswap: do not map same object twice
zswap: use movable memory if zpool support allocate movable memory
zpool: add malloc_support_movable to zpool_driver
shmem: fix obsolete comment in shmem_getpage_gfp()
mm/madvise: reduce code duplication in error handling paths
mm: mmap: increase sockets maximum memory size pgoff for 32bits
mm/mmap.c: refine find_vma_prev() with rb_last()
riscv: make mmap allocation top-down by default
mips: use generic mmap top-down layout and brk randomization
mips: replace arch specific way to determine 32bit task with generic version
mips: adjust brk randomization offset to fit generic version
mips: use STACK_TOP when computing mmap base address
mips: properly account for stack randomization and stack guard gap
arm: use generic mmap top-down layout and brk randomization
arm: use STACK_TOP when computing mmap base address
arm: properly account for stack randomization and stack guard gap
arm64, mm: make randomization selected by generic topdown mmap layout
arm64, mm: move generic mmap layout functions to mm
arm64: consider stack randomization for mmap base only when necessary
...
|
|
mips uses a top-down layout by default that exactly fits the generic
functions, so get rid of arch specific code and use the generic version by
selecting ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT.
As ARCH_WANT_DEFAULT_TOPDOWN_MMAP_LAYOUT selects ARCH_HAS_ELF_RANDOMIZE,
use the generic version of arch_randomize_brk since it also fits. Note
that this commit also removes the possibility for mips to have elf
randomization and no MMU: without MMU, the security added by randomization
is worth nothing.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Paul Burton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
MIPS uses TASK_IS_32BIT_ADDR to determine if a task is 32bit, but this
define is mips specific and other arches do not have it: use the
!IS_ENABLED(CONFIG_64BIT) || is_compat_task() condition instead.
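i.e. the generic form of the check, sketched (helper name hypothetical):
----
/* true for a 32bit kernel or a compat task on a 64bit kernel */
static inline bool task_is_32bit(void)
{
	return !IS_ENABLED(CONFIG_64BIT) || is_compat_task();
}
----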
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Paul Burton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This commit simply bumps the random offset of brk up to 32MB and 1GB
(from 8MB and 256MB) for 32bit and 64bit respectively.
Link: http://lkml.kernel.org/r/[email protected]
Suggested-by: Kees Cook <[email protected]>
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Paul Burton <[email protected]>
Reviewed-by: Kees Cook <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The mmap base address must be computed with respect to the stack top
address; using TASK_SIZE is wrong since STACK_TOP and TASK_SIZE are not
equivalent.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Paul Burton <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
This commit takes care of stack randomization and stack guard gap when
computing mmap base address and checks if the task asked for
randomization. This fixes the problem uncovered and not fixed for arm
here: https://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Alexandre Ghiti <[email protected]>
Acked-by: Kees Cook <[email protected]>
Acked-by: Paul Burton <[email protected]>
Reviewed-by: Luis Chamberlain <[email protected]>
Cc: Albert Ou <[email protected]>
Cc: Alexander Viro <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: James Hogan <[email protected]>
Cc: Palmer Dabbelt <[email protected]>
Cc: Ralf Baechle <[email protected]>
Cc: Russell King <[email protected]>
Cc: Will Deacon <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem
cache for page table allocations on several architectures that do not use
PAGE_SIZE tables for one or more levels of the page table hierarchy.
Most architectures do not implement these functions and use __weak default
NOP implementation of pgd_cache_init(). Since there is no such default
for pgtable_cache_init(), its empty stub is duplicated among most
architectures.
Rename the definitions of pgd_cache_init() to pgtable_cache_init() and
drop empty stubs of pgtable_cache_init().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Mike Rapoport <[email protected]>
Acked-by: Will Deacon <[email protected]> [arm64]
Acked-by: Thomas Gleixner <[email protected]> [x86]
Cc: Catalin Marinas <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|