Age | Commit message (Collapse) | Author | Files | Lines |
|
There are two generations of tlb operation instruction for C-SKY.
First generation is use mcr register and it need software do more
things, second generation is use specific instructions, eg:
tlbi.va, tlbi.vas, tlbi.alls
We implemented the following functions:
- flush_tlb_range (a range of entries)
- flush_tlb_page (one entry)
Above functions use asid from vma->mm to invalid tlb entries and
we could use tlbi.vas instruction for newest generation csky cpu.
- flush_tlb_kernel_range
- flush_tlb_one
Above functions don't care asid and it invalid the tlb entries only
with vpn and we could use tlbi.vaas instruction for newest generat-
ion csky cpu.
Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
|
|
Use linux generic asid/vmid algorithm to implement csky
switch_mm function. The algorithm is from arm and it could
work with SMP system. It'll help reduce tlb flush for
switch_mm in task/vm switch.
Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
|
|
This patch only contains asid help code from arm for next patch to
use.
The asid allocator use five level check to reduce the cost of
switch_mm.
1. Check if the asid version is the same (it's general)
2. Check reserved_asid which is set in rollover flush_context()
and key point is to keep the same bit position with the current
asid version instead of input version.
3. Check if the position of bitmap is free then it could be set &
used directly.
4. find_next_zero_bit() (a little performance cost)
5. flush_context (this is the worst cost with increase current asid
version)
Check is level by level and cost is also higher with the next level.
The reserved_asid and bitmap mechanism prevent unnecessary
find_next_zero_bit().
The atomic 64 bit asid is also suitable for 32-bit system and it
won't cost a lot in 1th 2th 3th level check.
The operation of set/clear mm_cpumask was removed in arm64 compared to
arm32. It seems no side effect on current arm64 system, but from
software meaning it's wrong. Although csky also needn't it, we add it
back for csky.
The asid_per_ctxt is no use for csky and it reserves the lowest bits for
other use, maybe: trust zone ? Ok, just keep it in csky copy.
Seems it also could be used by other archs and it's worth to move asid
code to generic in future.
Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
Cc: Julien Grall <[email protected]>
|
|
Current C-SKY ASID mechanism is from mips and it doesn't work well
with multi-cores. ASID per core mechanism is not suitable for C-SKY
SMP tlb maintain operations, eg: tlbi.vas need share the same asid
in all processors and it'll invalid the tlb entry in all cores with
the same asid.
This patch is prepare for new ASID mechanism.
Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
|
|
This patch adds the documentation to describe that how to add pmu node in
dts.
Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
Cc: Rob Herring <[email protected]>
|
|
Add trigger type setting for csky,mpintc. The driver also could
support #interrupt-cells <1> and it wouldn't invalidate existing
DTs. Here we only show the complete format.
Signed-off-by: Guo Ren <[email protected]>
Reviewed-by: Rob Herring <[email protected]>
Cc: Marc Zyngier <[email protected]>
|
|
CK810 pmu only support event with index 0-8 and 0xd; CK860 only
support event 1~4, 0xa~0x1b. So do not register unsupport event
to hardware cache event, which may leader to unknown behavior.
Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
|
|
csky_pmu_event_init is called several times during the perf record
initialzation. After configure the event counter in either kernel
space or user space, csky_pmu_event_init is called twice with no
attr specified. Configuration will be overwritten with sampling in
both kernel space and user space. --all-kernel/--all-user is
useless without this patch applied.
Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
|
|
This patch add interrupt request and handler for csky pmu.
perf can record on hardware event with this patch applied.
Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
|
|
The csky pmu counter may have different io width. When the counter is
smaller then 64 bits and counter value is smaller than the old value, it
will result to a extremely large delta value. So the sampled value should
be extend to 64 bits to avoid this, the extension bits base on the
count-width property from dts.
Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
|
|
This patch change the csky pmu initialization from arch init to
device init. The pmu can be configued with information from
device tree(pmu device name, irq number and etc.).
Signed-off-by: Mao Han <[email protected]>
Signed-off-by: Guo Ren <[email protected]>
|
|
These traps couldn't be hanppen in kernel and we must panic there not
send a signal to userspace.
Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
|
|
Let arch help to select interrupt controller's and timer's drivers
instead of people using menuconfig to select. This help the mini system
boot up.
Signed-off-by: Guo Ren <[email protected]>
Cc: Arnd Bergmann <[email protected]>
|
|
Neal reported incorrect use of ns_capable() from bpf hook.
bpf_setsockopt(...TCP_CONGESTION...)
-> tcp_set_congestion_control()
-> ns_capable(sock_net(sk)->user_ns, CAP_NET_ADMIN)
-> ns_capable_common()
-> current_cred()
-> rcu_dereference_protected(current->cred, 1)
Accessing 'current' in bpf context makes no sense, since packets
are processed from softirq context.
As Neal stated : The capability check in tcp_set_congestion_control()
was written assuming a system call context, and then was reused from
a BPF call site.
The fix is to add a new parameter to tcp_set_congestion_control(),
so that the ns_capable() call is only performed under the right
context.
Fixes: 91b5b21c7c16 ("bpf: Add support for changing congestion control")
Signed-off-by: Eric Dumazet <[email protected]>
Cc: Lawrence Brakmo <[email protected]>
Reported-by: Neal Cardwell <[email protected]>
Acked-by: Neal Cardwell <[email protected]>
Acked-by: Lawrence Brakmo <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In case of error, the function of_get_mac_address() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check should
be replaced with IS_ERR().
Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
Signed-off-by: Wei Yongjun <[email protected]>
Reviewed-by: Oleksij Rempel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
Fix to return error code -ENOMEM from the dmam_alloc_coherent() error
handling case instead of 0, as done elsewhere in this function.
Fixes: d51b6ce441d3 ("net: ethernet: add ag71xx driver")
Signed-off-by: Wei Yongjun <[email protected]>
Reviewed-by: Oleksij Rempel <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
|
|
In the sysctl code the proc_dointvec_minmax() function is often used to
validate the user supplied value between an allowed range. This
function uses the extra1 and extra2 members from struct ctl_table as
minimum and maximum allowed value.
On sysctl handler declaration, in every source file there are some
readonly variables containing just an integer which address is assigned
to the extra1 and extra2 members, so the sysctl range is enforced.
The special values 0, 1 and INT_MAX are very often used as range
boundary, leading duplication of variables like zero=0, one=1,
int_max=INT_MAX in different source files:
$ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
248
Add a const int array containing the most commonly used values, some
macros to refer more easily to the correct array member, and use them
instead of creating a local one for every object file.
This is the bloat-o-meter output comparing the old and new binary
compiled with the default Fedora config:
# scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
Data old new delta
sysctl_vals - 12 +12
__kstrtab_sysctl_vals - 12 +12
max 14 10 -4
int_max 16 - -16
one 68 - -68
zero 128 28 -100
Total: Before=20583249, After=20583085, chg -0.00%
[[email protected]: tipc: remove two unused variables]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix net/ipv6/sysctl_net_ipv6.c]
[[email protected]: proc/sysctl: make firmware loader table conditional]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: fix fs/eventpoll.c]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Matteo Croce <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
Acked-by: Kees Cook <[email protected]>
Reviewed-by: Aaron Tomlin <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
migrate_page_move_mapping() doesn't use the mode argument. Remove it
and update callers accordingly.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Keith Busch <[email protected]>
Reviewed-by: Zi Yan <[email protected]>
Cc: Mel Gorman <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
David points out that there is a mixture of 'int' and 'unsigned long'
usage for section number data types. Update the memory hotplug path to
use 'unsigned long' consistently for section numbers.
[[email protected]: fix printk format]
Link: http://lkml.kernel.org/r/156107543656.1329419.11505835211949439815.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: David Hildenbrand <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Now that the mm core supports section-unaligned hotplug of ZONE_DEVICE
memory, we no longer need to add padding at pfn/dax device creation
time. The kernel will still honor padding established by older kernels.
Link: http://lkml.kernel.org/r/156092356588.979959.6793371748950931916.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Jeff Moyer <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
At namespace creation time there is the potential for the "expected to
be zero" fields of a 'pfn' info-block to be filled with indeterminate
data. While the kernel buffer is zeroed on allocation it is immediately
overwritten by nd_pfn_validate() filling it with the current contents of
the on-media info-block location. For fields like, 'flags' and the
'padding' it potentially means that future implementations can not rely on
those fields being zero.
In preparation to stop using the 'start_pad' and 'end_trunc' fields for
section alignment, arrange for fields that are not explicitly
initialized to be guaranteed zero. Bump the minor version to indicate
it is safe to assume the 'padding' and 'flags' are zero. Otherwise,
this corruption is expected to benign since all other critical fields
are explicitly initialized.
Note The cc: stable is about spreading this new policy to as many
kernels as possible not fixing an issue in those kernels. It is not
until the change titled "libnvdimm/pfn: Stop padding pmem namespaces to
section alignment" where this improper initialization becomes a problem.
So if someone decides to backport "libnvdimm/pfn: Stop padding pmem
namespaces to section alignment" (which is not tagged for stable), make
sure this pre-requisite is flagged.
Link: http://lkml.kernel.org/r/156092356065.979959.6681003754765958296.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 32ab0a3f5170 ("libnvdimm, pmem: 'struct page' for pmem")
Signed-off-by: Dan Williams <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Teach devm_memremap_pages() about the new sub-section capabilities of
arch_{add,remove}_memory(). Effectively, just replace all usage of
align_start, align_end, and align_size with res->start, res->end, and
resource_size(res). The existing sanity check will still make sure that
the two separate remap attempts do not collide within a sub-section (2MB
on x86).
Link: http://lkml.kernel.org/r/156092355542.979959.10060071713397030576.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Explain the general mechanisms of 'ZONE_DEVICE' pages and list the users
of 'devm_memremap_pages()'.
[[email protected]: update ZONE_DEVICE memory model documentation]
Link: http://lkml.kernel.org/r/156109575458.1409767.1885676287099277666.stgit@dwillia2-desk3.amr.corp.intel.com
Link: http://lkml.kernel.org/r/156092354985.979959.15763234410543451710.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Mike Rapoport <[email protected]>
Reviewed-by: Mike Rapoport <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Jonathan Corbet <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The libnvdimm sub-system has suffered a series of hacks and broken
workarounds for the memory-hotplug implementation's awkward
section-aligned (128MB) granularity.
For example the following backtrace is emitted when attempting
arch_add_memory() with physical address ranges that intersect 'System
RAM' (RAM) with 'Persistent Memory' (PMEM) within a given section:
# cat /proc/iomem | grep -A1 -B1 Persistent\ Memory
100000000-1ffffffff : System RAM
200000000-303ffffff : Persistent Memory (legacy)
304000000-43fffffff : System RAM
440000000-23ffffffff : Persistent Memory
2400000000-43bfffffff : Persistent Memory
2400000000-43bfffffff : namespace2.0
WARNING: CPU: 38 PID: 928 at arch/x86/mm/init_64.c:850 add_pages+0x5c/0x60
[..]
RIP: 0010:add_pages+0x5c/0x60
[..]
Call Trace:
devm_memremap_pages+0x460/0x6e0
pmem_attach_disk+0x29e/0x680 [nd_pmem]
? nd_dax_probe+0xfc/0x120 [libnvdimm]
nvdimm_bus_probe+0x66/0x160 [libnvdimm]
It was discovered that the problem goes beyond RAM vs PMEM collisions as
some platform produce PMEM vs PMEM collisions within a given section.
The libnvdimm workaround for that case revealed that the libnvdimm
section-alignment-padding implementation has been broken for a long
while.
A fix for that long-standing breakage introduces as many problems as it
solves as it would require a backward-incompatible change to the
namespace metadata interpretation. Instead of that dubious route [1],
address the root problem in the memory-hotplug implementation.
Note that EEXIST is no longer treated as success as that is how
sparse_add_section() reports subsection collisions, it was also obviated
by recent changes to perform the request_region() for 'System RAM'
before arch_add_memory() in the add_memory() sequence.
[1] https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
[[email protected]: fix deactivate_section for early sections]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/156092354368.979959.6232443923440952359.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Prepare the memory hot-{add,remove} paths for handling sub-section
ranges by plumbing the starting page frame and number of pages being
handled through arch_{add,remove}_memory() to
sparse_{add,remove}_one_section().
This is simply plumbing, small cleanups, and some identifier renames.
No intended functional changes.
Link: http://lkml.kernel.org/r/156092353780.979959.9713046515562743194.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Given there are no more usages of is_dev_zone() outside of 'ifdef
CONFIG_ZONE_DEVICE' protection, kill off the compilation helper.
Link: http://lkml.kernel.org/r/156092353211.979959.1489004866360828964.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Acked-by: David Hildenbrand <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The zone type check was a leftover from the cleanup that plumbed altmap
through the memory hotplug path, i.e. commit da024512a1fa "mm: pass the
vmem_altmap to arch_remove_memory and __remove_pages".
Link: http://lkml.kernel.org/r/156092352642.979959.6664333788149363039.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Allow sub-section sized ranges to be added to the memmap.
populate_section_memmap() takes an explict pfn range rather than
assuming a full section, and those parameters are plumbed all the way
through to vmmemap_populate(). There should be no sub-section usage in
current deployments. New warnings are added to clarify which memmap
allocation paths are sub-section capable.
Link: http://lkml.kernel.org/r/156092352058.979959.6551283472062305149.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Sub-section hotplug support reduces the unit of operation of hotplug
from section-sized-units (PAGES_PER_SECTION) to sub-section-sized units
(PAGES_PER_SUBSECTION). Teach shrink_{zone,pgdat}_span() to consider
PAGES_PER_SUBSECTION boundaries as the points where pfn_valid(), not
valid_section(), can toggle.
[[email protected]: fix shrink_{zone,node}_span]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/156092351496.979959.12703722803097017492.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Signed-off-by: Oscar Salvador <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Prepare for hot{plug,remove} of sub-ranges of a section by tracking a
sub-section active bitmask, each bit representing a PMD_SIZE span of the
architecture's memory hotplug section size.
The implications of a partially populated section is that pfn_valid()
needs to go beyond a valid_section() check and either determine that the
section is an "early section", or read the sub-section active ranges
from the bitmask. The expectation is that the bitmask (subsection_map)
fits in the same cacheline as the valid_section() / early_section()
data, so the incremental performance overhead to pfn_valid() should be
negligible.
The rationale for using early_section() to short-ciruit the
subsection_map check is that there are legacy code paths that use
pfn_valid() at section granularity before validating the pfn against
pgdat data. So, the early_section() check allows those traditional
assumptions to persist while also permitting subsection_map to tell the
truth for purposes of populating the unused portions of early sections
with PMEM and other ZONE_DEVICE mappings.
Link: http://lkml.kernel.org/r/156092350874.979959.18185938451405518285.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Qian Cai <[email protected]>
Tested-by: Jane Chu <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In preparation for sub-section hotplug, track whether a given section
was created during early memory initialization, or later via memory
hotplug. This distinction is needed to maintain the coarse expectation
that pfn_valid() returns true for any pfn within a given section even if
that section has pages that are reserved from the page allocator.
For example one of the of goals of subsection hotplug is to support
cases where the system physical memory layout collides System RAM and
PMEM within a section. Several pfn_valid() users expect to just check
if a section is valid, but they are not careful to check if the given
pfn is within a "System RAM" boundary and instead expect pgdat
information to further validate the pfn.
Rather than unwind those paths to make their pfn_valid() queries more
precise a follow on patch uses the SECTION_IS_EARLY flag to maintain the
traditional expectation that pfn_valid() returns true for all early
sections.
Link: https://lore.kernel.org/lkml/[email protected]/
Link: http://lkml.kernel.org/r/156092350358.979959.5817209875548072819.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reported-by: Qian Cai <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Reviewed-by: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: Sub-section memory hotplug support", v10.
The memory hotplug section is an arbitrary / convenient unit for memory
hotplug. 'Section-size' units have bled into the user interface
('memblock' sysfs) and can not be changed without breaking existing
userspace. The section-size constraint, while mostly benign for typical
memory hotplug, has and continues to wreak havoc with 'device-memory'
use cases, persistent memory (pmem) in particular. Recall that pmem
uses devm_memremap_pages(), and subsequently arch_add_memory(), to
allocate a 'struct page' memmap for pmem. However, it does not use the
'bottom half' of memory hotplug, i.e. never marks pmem pages online and
never exposes the userspace memblock interface for pmem. This leaves an
opening to redress the section-size constraint.
To date, the libnvdimm subsystem has attempted to inject padding to
satisfy the internal constraints of arch_add_memory(). Beyond
complicating the code, leading to bugs [2], wasting memory, and limiting
configuration flexibility, the padding hack is broken when the platform
changes this physical memory alignment of pmem from one boot to the
next. Device failure (intermittent or permanent) and physical
reconfiguration are events that can cause the platform firmware to
change the physical placement of pmem on a subsequent boot, and device
failure is an everyday event in a data-center.
It turns out that sections are only a hard requirement of the
user-facing interface for memory hotplug and with a bit more
infrastructure sub-section arch_add_memory() support can be added for
kernel internal usages like devm_memremap_pages(). Here is an analysis
of the current design assumptions in the current code and how they are
addressed in the new implementation:
Current design assumptions:
- Sections that describe boot memory (early sections) are never
unplugged / removed.
- pfn_valid(), in the CONFIG_SPARSEMEM_VMEMMAP=y, case devolves to a
valid_section() check
- __add_pages() and helper routines assume all operations occur in
PAGES_PER_SECTION units.
- The memblock sysfs interface only comprehends full sections
New design assumptions:
- Sections are instrumented with a sub-section bitmask to track (on
x86) individual 2MB sub-divisions of a 128MB section.
- Partially populated early sections can be extended with additional
sub-sections, and those sub-sections can be removed with
arch_remove_memory(). With this in place we no longer lose usable
memory capacity to padding.
- pfn_valid() is updated to look deeper than valid_section() to also
check the active-sub-section mask. This indication is in the same
cacheline as the valid_section() so the performance impact is
expected to be negligible. So far the lkp robot has not reported any
regressions.
- Outside of the core vmemmap population routines which are replaced,
other helper routines like shrink_{zone,pgdat}_span() are updated to
handle the smaller granularity. Core memory hotplug routines that
deal with online memory are not touched.
- The existing memblock sysfs user api guarantees / assumptions are not
touched since this capability is limited to !online
!memblock-sysfs-accessible sections.
Meanwhile the issue reports continue to roll in from users that do not
understand when and how the 128MB constraint will bite them. The current
implementation relied on being able to support at least one misaligned
namespace, but that immediately falls over on any moderately complex
namespace creation attempt. Beyond the initial problem of 'System RAM'
colliding with pmem, and the unsolvable problem of physical alignment
changes, Linux is now being exposed to platforms that collide pmem ranges
with other pmem ranges by default [3]. In short, devm_memremap_pages()
has pushed the venerable section-size constraint past the breaking point,
and the simplicity of section-aligned arch_add_memory() is no longer
tenable.
These patches are exposed to the kbuild robot on a subsection-v10 branch
[4], and a preview of the unit test for this functionality is available
on the 'subsection-pending' branch of ndctl [5].
[2]: https://lore.kernel.org/r/155000671719.348031.2347363160141119237.stgit@dwillia2-desk3.amr.corp.intel.com
[3]: https://github.com/pmem/ndctl/issues/76
[4]: https://git.kernel.org/pub/scm/linux/kernel/git/djbw/nvdimm.git/log/?h=subsection-v10
[5]: https://github.com/pmem/ndctl/commit/7c59b4867e1c
This patch (of 13):
Towards enabling memory hotplug to track partial population of a section,
introduce 'struct mem_section_usage'.
A pointer to a 'struct mem_section_usage' instance replaces the existing
pointer to a 'pageblock_flags' bitmap. Effectively it adds one more
'unsigned long' beyond the 'pageblock_flags' (usemap) allocation to house
a new 'subsection_map' bitmap. The new bitmap enables the memory
hot{plug,remove} implementation to act on incremental sub-divisions of a
section.
SUBSECTION_SHIFT is defined as global constant instead of per-architecture
value like SECTION_SIZE_BITS in order to allow cross-arch compatibility of
subsection users. Specifically a common subsection size allows for the
possibility that persistent memory namespace configurations be made
compatible across architectures.
The primary motivation for this functionality is to support platforms that
mix "System RAM" and "Persistent Memory" within a single section, or
multiple PMEM ranges with different mapping lifetimes within a single
section. The section restriction for hotplug has caused an ongoing saga
of hacks and bugs for devm_memremap_pages() users.
Beyond the fixups to teach existing paths how to retrieve the 'usemap'
from a section, and updates to usemap allocation path, there are no
expected behavior changes.
Link: http://lkml.kernel.org/r/156092349845.979959.73333291612799019.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Tested-by: Aneesh Kumar K.V <[email protected]> [ppc64]
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jérôme Glisse <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Jane Chu <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Jeff Moyer <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
No longer needed, let's remove it. Also, drop the "hint" parameter
completely from "find_memory_block_by_id", as nobody needs it anymore.
[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
[[email protected]: handle zero-length walks]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Tested-by: Qian Cai <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Mike Travis <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's move walk_memory_blocks() to the place where memory block logic
resides and simplify it. While at it, add a type for the callback
function.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Mike Travis <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Qian Cai <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
pfns
walk_memory_range() was once used to iterate over sections. Now, it
iterates over memory blocks. Rename the function, fixup the
documentation.
Also, pass start+size instead of PFNs, which is what most callers
already have at hand. (we'll rework link_mem_sections() most probably
soon)
Follow-up patches will rework, simplify, and move walk_memory_blocks()
to drivers/base/memory.c.
Note: walk_memory_blocks() only works correctly right now if the
start_pfn is aligned to a section start. This is the case right now,
but we'll generalize the function in a follow up patch so the semantics
match the documentation.
[[email protected]: remove unused variable]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Len Brown <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Rashmica Gupta <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Michael Neuling <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Juergen Gross <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Nick Desaulniers <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
It is only used internally.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: Oscar Salvador <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Block ids are just shifted section numbers, so let's also use "unsigned
long" for them, too.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Patch series "mm: Further memory block device cleanups", v1.
Some further cleanups around memory block devices. Especially, clean up
and simplify walk_memory_range(). Including some other minor cleanups.
This patch (of 6):
We are using a mixture of "int" and "unsigned long". Let's make this
consistent by using "unsigned long" everywhere. We'll do the same with
memory block ids next.
While at it, turn the "unsigned long i" in removable_show() into an int
- sections_per_block is an int.
[[email protected]: s/unsigned long i/unsigned long nr/]
[[email protected]: v3]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Mel Gorman <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Johannes Weiner <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Stephen Rothwell <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Baoquan He <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
find_next_iomem_res() shows up to be a source for overhead in dax
benchmarks.
Improve performance by not considering children of the tree if the top
level does not match. Since the range of the parents should include the
range of the children such check is redundant.
Running sysbench on dax (pmem emulation, with write_cache disabled):
sysbench fileio --file-total-size=3G --file-test-mode=rndwr \
--file-io-mode=mmap --threads=4 --file-fsync-mode=fdatasync run
Provides the following results:
events (avg/stddev)
-------------------
5.2-rc3: 1247669.0000/16075.39
w/patch: 1286320.5000/16402.72 (+3%)
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Nadav Amit <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Ingo Molnar <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Since resources can be removed, locking should ensure that the resource
is not removed while accessing it. However, find_next_iomem_res() does
not hold the lock while copying the data of the resource.
Keep holding the lock while the data is copied. While at it, change the
return value to a more informative value. It is disregarded by the
callers.
[[email protected]: fix find_next_iomem_res() documentation]
Link: http://lkml.kernel.org/r/[email protected]
Fixes: ff3cc952d3f00 ("resource: Add remove_resource interface")
Signed-off-by: Nadav Amit <[email protected]>
Reviewed-by: Andrew Morton <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Toshi Kani <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Bjorn Helgaas <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Commit 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each
vma") introduced THPeligible bit for processes' smaps. But, when
checking the eligibility for shmem vma, __transparent_hugepage_enabled()
is called to override the result from shmem_huge_enabled(). It may
result in the anonymous vma's THP flag override shmem's. For example,
running a simple test which create THP for shmem, but with anonymous THP
disabled, when reading the process's smaps, it may show:
7fc92ec00000-7fc92f000000 rw-s 00000000 00:14 27764 /dev/shm/test
Size: 4096 kB
...
[snip]
...
ShmemPmdMapped: 4096 kB
...
[snip]
...
THPeligible: 0
And, /proc/meminfo does show THP allocated and PMD mapped too:
ShmemHugePages: 4096 kB
ShmemPmdMapped: 4096 kB
This doesn't make too much sense. The shmem objects should be treated
separately from anonymous THP. Calling shmem_huge_enabled() with
checking MMF_DISABLE_THP sounds good enough. And, we could skip stack
and dax vma check since we already checked if the vma is shmem already.
Also check if vma is suitable for THP by calling
transhuge_vma_suitable().
And minor fix to smaps output format and documentation.
Link: http://lkml.kernel.org/r/[email protected]
Fixes: 7635d9cbe832 ("mm, thp, proc: report THP eligibility for each vma")
Signed-off-by: Yang Shi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
transhuge_vma_suitable() was only available for shmem THP, but anonymous
THP has the same check except pgoff check. And, it will be used for THP
eligible check in the later patch, so make it available for all kind of
THPs. This also helps reduce code duplication slightly.
Since anonymous THP doesn't have to check pgoff, so make pgoff check
shmem vma only.
And regroup some functions in include/linux/mm.h to solve compile issue
since transhuge_vma_suitable() needs call vma_is_anonymous() which was
defined after huge_mm.h is included.
[[email protected]: fix typo]
[[email protected]: v4]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Yang Shi <[email protected]>
Acked-by: Hugh Dickins <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: David Rientjes <[email protected]>
Cc: Andrea Arcangeli <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
In case of NODE_NOT_IN_PAGE_FLAGS is set, we store section's node id in
section_to_node_table[]. While for hot-add memory, this is missed.
Without this information, page_to_nid() may not give the right node id.
BTW, current online_pages works because it leverages nid in
memory_block. But the granularity of node id should be mem_section
wide.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: Wei Yang <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Reviewed-by: David Hildenbrand <[email protected]>
Reviewed-by: Anshuman Khandual <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
The parameter is unused, so let's drop it. Memory removal paths should
never care about zones. This is the job of memory offlining and will
require more refactorings.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We really don't want anything during memory hotunplug to fail. We
always pass a valid memory block device, that check can go. Avoid
allocating memory and eventually failing. As we are always called under
lock, we can use a static piece of memory. This avoids having to put
the structure onto the stack, having to guess about the stack size of
callers.
Patch inspired by a patch from Oscar Salvador.
In the future, there might be no need to iterate over nodes at all.
mem->nid should tell us exactly what to remove. Memory block devices
with mixed nodes (added during boot) should properly fenced off and
never removed.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Reviewed-by: Oscar Salvador <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Let's factor out removing of memory block devices, which is only
necessary for memory added via add_memory() and friends that created
memory block devices. Remove the devices before calling
arch_remove_memory().
This finishes factoring out memory block device handling from
arch_add_memory() and arch_remove_memory().
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Dan Williams <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
No longer needed, the callers of arch_add_memory() can handle this
manually.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Wei Yang <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
Only memory to be added to the buddy and to be onlined/offlined by user
space using /sys/devices/system/memory/... needs (and should have!)
memory block devices.
Factor out creation of memory block devices. Create all devices after
arch_add_memory() succeeded. We can later drop the want_memblock
parameter, because it is now effectively stale.
Only after memory block devices have been added, memory can be onlined
by user space. This implies, that memory is not visible to user space
at all before arch_add_memory() succeeded.
While at it
- use WARN_ON_ONCE instead of BUG_ON in moved unregister_memory()
- introduce find_memory_block_by_id() to search via block id
- Use find_memory_block_by_id() in init_memory_block() to catch
duplicates
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We want to improve error handling while adding memory by allowing to use
arch_remove_memory() and __remove_pages() even if
CONFIG_MEMORY_HOTREMOVE is not set to e.g., implement something like:
arch_add_memory()
rc = do_something();
if (rc) {
arch_remove_memory();
}
We won't get rid of CONFIG_MEMORY_HOTREMOVE for now, as it will require
quite some dependencies for memory offlining.
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Reviewed-by: Pavel Tatashin <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Qian Cai <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|
|
We'll rework hotplug_memory_register() shortly, so it no longer consumes
pass a section.
[[email protected]: fix a compilation warning]
Link: http://lkml.kernel.org/r/[email protected]
Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: David Hildenbrand <[email protected]>
Signed-off-by: Qian Cai <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Cc: Alex Deucher <[email protected]>
Cc: Andrew Banman <[email protected]>
Cc: Andy Lutomirski <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: Ard Biesheuvel <[email protected]>
Cc: Arun KS <[email protected]>
Cc: Baoquan He <[email protected]>
Cc: Benjamin Herrenschmidt <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Catalin Marinas <[email protected]>
Cc: Chintan Pandya <[email protected]>
Cc: Christophe Leroy <[email protected]>
Cc: Chris Wilson <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Fenghua Yu <[email protected]>
Cc: Heiko Carstens <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Jonathan Cameron <[email protected]>
Cc: Joonsoo Kim <[email protected]>
Cc: Jun Yao <[email protected]>
Cc: "Kirill A. Shutemov" <[email protected]>
Cc: Logan Gunthorpe <[email protected]>
Cc: Mark Brown <[email protected]>
Cc: Mark Rutland <[email protected]>
Cc: Masahiro Yamada <[email protected]>
Cc: Mathieu Malaterre <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: "[email protected]" <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Oscar Salvador <[email protected]>
Cc: Paul Mackerras <[email protected]>
Cc: Pavel Tatashin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Rich Felker <[email protected]>
Cc: Rob Herring <[email protected]>
Cc: Robin Murphy <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Tony Luck <[email protected]>
Cc: Vasily Gorbik <[email protected]>
Cc: Wei Yang <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Yoshinori Sato <[email protected]>
Cc: Yu Zhao <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
|