aboutsummaryrefslogtreecommitdiff
path: root/drivers/base/node.c
AgeCommit message (Collapse)AuthorFilesLines
2011-12-21cpu: convert 'cpu' and 'machinecheck' sysdev_class to a regular subsystemKay Sievers1-4/+4
This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem and converts the devices to regular devices. The sysdev drivers are implemented as subsystem interfaces now. After all sysdev classes are ported to regular driver core entities, the sysdev implementation will be entirely removed from the kernel. Userspace relies on events and generic sysfs subsystem infrastructure from sysdev devices, which are made available with this conversion. Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Paul Mundt <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Tigran Aivazian <[email protected]> Cc: Len Brown <[email protected]> Cc: Zhang Rui <[email protected]> Cc: Dave Jones <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Russell King <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Arjan van de Ven <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Cc: "Srivatsa S. Bhat" <[email protected]> Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2011-11-18drivers/base/node.c: fix compilation error with older versions of gccClaudio Scordino1-6/+8
Patch to fix the error message "directives may not be used inside a macro argument" which appears when the kernel is compiled for the cris architecture. Signed-off-by: Claudio Scordino <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2011-05-25mm: per-node vmstat: show proper vmstatsKOSAKI Motohiro1-5/+9
commit 2ac390370a ("writeback: add /sys/devices/system/node/<node>/vmstat") added vmstat entry. But strangely it only show nr_written and nr_dirtied. # cat /sys/devices/system/node/node20/vmstat nr_written 0 nr_dirtied 0 Of course, It's not adequate. With this patch, the vmstat show all vm stastics as /proc/vmstat. # cat /sys/devices/system/node/node0/vmstat nr_free_pages 899224 nr_inactive_anon 201 nr_active_anon 17380 nr_inactive_file 31572 nr_active_file 28277 nr_unevictable 0 nr_mlock 0 nr_anon_pages 17321 nr_mapped 8640 nr_file_pages 60107 nr_dirty 33 nr_writeback 0 nr_slab_reclaimable 6850 nr_slab_unreclaimable 7604 nr_page_table_pages 3105 nr_kernel_stack 175 nr_unstable 0 nr_bounce 0 nr_vmscan_write 0 nr_writeback_temp 0 nr_isolated_anon 0 nr_isolated_file 0 nr_shmem 260 nr_dirtied 1050 nr_written 938 numa_hit 962872 numa_miss 0 numa_foreign 0 numa_interleave 8617 numa_local 962872 numa_other 0 nr_anon_transparent_hugepages 0 [[email protected]: no externs in .c files] Signed-off-by: KOSAKI Motohiro <[email protected]> Cc: Michael Rubin <[email protected]> Cc: Wu Fengguang <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2011-02-03memory hotplug: Update phys_index to [start|end]_section_nrNathan Fontenot1-4/+8
Update the 'phys_index' property of a the memory_block struct to be called start_section_nr, and add a end_section_nr property. The data tracked here is the same but the updated naming is more in line with what is stored here, namely the first and last section number that the memory block spans. The names presented to userspace remain the same, phys_index for start_section_nr and end_phys_index for end_section_nr, to avoid breaking anything in userspace. This also updates the node sysfs code to be aware of the new capability for a memory block to contain multiple memory sections and be aware of the memory block structure name changes (start_section_nr). This requires an additional parameter to unregister_mem_sect_under_nodes so that we know which memory section of the memory block to unregister. Signed-off-by: Nathan Fontenot <[email protected]> Reviewed-by: Robin Holt <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2011-01-13thp: transparent hugepage sysfs meminfoDavid Rientjes1-3/+18
Add hugepage statistics to per-node sysfs meminfo Reviewed-by: Rik van Riel <[email protected]> Signed-off-by: David Rientjes <[email protected]> Signed-off-by: Andrea Arcangeli <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-26writeback: add /sys/devices/system/node/<node>/vmstatMichael Rubin1-0/+14
For NUMA node systems it is important to have visibility in memory characteristics. Two of the /proc/vmstat values "nr_written" and "nr_dirtied" are added here. # cat /sys/devices/system/node/node20/vmstat nr_written 0 nr_dirtied 0 Signed-off-by: Michael Rubin <[email protected]> Reviewed-by: Wu Fengguang <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Jens Axboe <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-10-22driver core: Convert link_mem_sections to use find_memory_block_hinted.Robin Holt1-3/+5
Modify link_mem_sections() to pass in the previous mem_block as a hint to locating the next mem_block. Since they are typically added in order this results in a massive saving in time during boot of a very large system. For example, on a 16TB x86_64 machine, it reduced the total time spent linking all node's memory sections from 1 hour, 27 minutes to 46 seconds. Signed-off-by: Robin Holt <[email protected]> To: Gary Hade <[email protected]> To: Badari Pulavarty <[email protected]> To: Ingo Molnar <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2010-08-09drivers/base/node.c: reduce stack usage of node_read_meminfo()KOSAKI Motohiro1-23/+23
drivers/base/node.c: In function 'node_read_meminfo': drivers/base/node.c:139: warning: the frame size of 848 bytes is larger than 512 bytes Fix it by splitting the sprintf() into three parts. It has no functional change. Signed-off-by: KOSAKI Motohiro <[email protected]> Cc: Greg KH <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-05-25mm: compaction: add /sys trigger for per-node memory compactionMel Gorman1-0/+3
Add a per-node sysfs file called compact. When the file is written to, each zone in that node is compacted. The intention that this would be used by something like a job scheduler in a batch system before a job starts so that the job can allocate the maximum number of hugepages without significant start-up cost. Signed-off-by: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Reviewed-by: KOSAKI Motohiro <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2010-04-07nodemask: include slab.h from drivers/base/node.cTejun Heo1-1/+1
NODEMASK_ALLOC/FREE are mapped to kmalloc/free if NODES_SHIFT > 8. Among its several users, drivers/base/node.c wasn't including slab.h leading to build failure if NODES_SHIFT > 8. Include slab.h from drivers/base/node.c. This isn't an ideal solution but including slab.h directly from nodemask.h is not an option because nodemask.h gets included everywhere. For now, make it work by including slab.h from its users. Signed-off-by: Tejun Heo <[email protected]> Reported-by: Ingo Molnar <[email protected]>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo1-0/+1
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <[email protected]> Guess-its-ok-by: Christoph Lameter <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Lee Schermerhorn <[email protected]>
2010-03-19driver core: numa: fix BUILD_BUG_ON for node_read_distanceDavid Rientjes1-2/+5
node_read_distance() has a BUILD_BUG_ON() to prevent buffer overruns when the number of nodes printed will exceed the buffer length. Each node only needs four chars: three for distance (maximum distance is 255) and one for a seperating space or a trailing newline. Signed-off-by: David Rientjes <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2010-03-07sysdev: Use sysdev_class attribute arrays in node driverAndi Kleen1-15/+16
Convert the node driver to sysdev_class attribute arrays. This greatly cleans up the code and remove a lot of code. Saves ~150 bytes of code on x86-64. Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2010-03-07sysdev: Convert node driver class attributes to be data drivenAndi Kleen1-47/+18
Using the new attribute argument convert the node driver class attributes to carry the node state. Then use a shared function to do what a lot of individual functions did before. Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2010-03-07sysdev: Pass attribute in sysdev_class attributes show/storeAndi Kleen1-5/+12
Passing the attribute to the low level IO functions allows all kinds of cleanups, by sharing low level IO code without requiring an own function for every piece of data. Also drivers can extend the attributes with own data fields and use that in the low level function. Similar to sysdev_attributes and normal attributes. This is a tree-wide sweep, converting everything in one go. No functional changes in this patch other than passing the new argument everywhere. Tested on x86, the non x86 parts are uncompiled. Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2009-12-15mm: slab-allocate memory section nodemask for large systemsDavid Rientjes1-4/+9
Nodemasks should not be allocated on the stack for large systems (when it is larger than 256 bytes) since there is a threat of overflow. This patch causes the unregister_mem_sect_under_nodes() nodemask to be allocated on the stack for smaller systems and be allocated by slab for larger systems. GFP_KERNEL is used since remove_memory_block() can block. Cc: Gary Hade <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Alex Chiang <[email protected]> Signed-off-by: David Rientjes <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-12-15mm: add numa node symlink for cpu devices in sysfsAlex Chiang1-1/+10
You can discover which CPUs belong to a NUMA node by examining /sys/devices/system/node/node#/ However, it's not convenient to go in the other direction, when looking at /sys/devices/system/cpu/cpu#/ Yes, you can muck about in sysfs, but adding these symlinks makes life a lot more convenient. Signed-off-by: Alex Chiang <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Gary Hade <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg KH <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-12-15mm: refactor unregister_cpu_under_node()Alex Chiang1-6/+12
By returning early if the node is not online, we can unindent the interesting code by two levels. No functional change. Signed-off-by: Alex Chiang <[email protected]> Cc: Gary Hade <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg KH <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-12-15mm: refactor register_cpu_under_node()Alex Chiang1-9/+11
By returning early if the node is not online, we can unindent the interesting code by one level. No functional change. Signed-off-by: Alex Chiang <[email protected]> Cc: Gary Hade <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg KH <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-12-15mm: add numa node symlink for memory section in sysfsAlex Chiang1-1/+10
Commit c04fc586c (mm: show node to memory section relationship with symlinks in sysfs) created symlinks from nodes to memory sections, e.g. /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 If you're examining the memory section though and are wondering what node it might belong to, you can find it by grovelling around in sysfs, but it's a little cumbersome. Add a reverse symlink for each memory section that points back to the node to which it belongs. Signed-off-by: Alex Chiang <[email protected]> Cc: Gary Hade <[email protected]> Cc: Badari Pulavarty <[email protected]> Cc: Ingo Molnar <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Greg KH <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-12-15hugetlb: offload per node attribute registrationsLee Schermerhorn1-10/+47
Offload the registration and unregistration of per node hstate sysfs attributes to a worker thread rather than attempt the allocation/attachment or detachment/freeing of the attributes in the context of the memory hotplug handler. I don't know that this is absolutely required, but the registration can sleep in allocations and other mem hot plug handlers do it this way. If it turns out this is NOT required, we can drop this patch. N.B., Only tested build, boot, libhugetlbfs regression. i.e., no memory hotplug testing. Signed-off-by: Lee Schermerhorn <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Nishanth Aravamudan <[email protected]> Cc: David Rientjes <[email protected]> Cc: Adam Litke <[email protected]> Cc: Andy Whitcroft <[email protected]> Cc: Eric Whitney <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-12-15hugetlb: handle memory hot-plug eventsLee Schermerhorn1-5/+48
Register per node hstate attributes only for nodes with memory. As suggested by David Rientjes. With Memory Hotplug, memory can be added to a memoryless node and a node with memory can become memoryless. Therefore, add a memory on/off-line notifier callback to [un]register a node's attributes on transition to/from memoryless state. N.B., Only tested build, boot, libhugetlbfs regression. i.e., no memory hotplug testing. Signed-off-by: Lee Schermerhorn <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Lee Schermerhorn <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Nishanth Aravamudan <[email protected]> Cc: Adam Litke <[email protected]> Cc: Andy Whitcroft <[email protected]> Cc: Eric Whitney <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-12-15hugetlb: add per node hstate attributesLee Schermerhorn1-0/+39
Add the per huge page size control/query attributes to the per node sysdevs: /sys/devices/system/node/node<ID>/hugepages/hugepages-<size>/ nr_hugepages - r/w free_huge_pages - r/o surplus_huge_pages - r/o The patch attempts to re-use/share as much of the existing global hstate attribute initialization and handling, and the "nodes_allowed" constraint processing as possible. Calling set_max_huge_pages() with no node indicates a change to global hstate parameters. In this case, any non-default task mempolicy will be used to generate the nodes_allowed mask. A valid node id indicates an update to that node's hstate parameters, and the count argument specifies the target count for the specified node. From this info, we compute the target global count for the hstate and construct a nodes_allowed node mask contain only the specified node. Setting the node specific nr_hugepages via the per node attribute effectively ignores any task mempolicy or cpuset constraints. With this patch: (me):ls /sys/devices/system/node/node0/hugepages/hugepages-2048kB ./ ../ free_hugepages nr_hugepages surplus_hugepages Starting from: Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 0 Node 2 HugePages_Free: 0 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 vm.nr_hugepages = 0 Allocate 16 persistent huge pages on node 2: (me):echo 16 >/sys/devices/system/node/node2/hugepages/hugepages-2048kB/nr_hugepages [Note that this is equivalent to: numactl -m 2 hugeadmin --pool-pages-min 2M:+16 ] Yields: Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 16 Node 2 HugePages_Free: 16 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 vm.nr_hugepages = 16 Global controls work as expected--reduce pool to 8 persistent huge pages: (me):echo 8 >/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages Node 0 HugePages_Total: 0 Node 0 HugePages_Free: 0 Node 0 HugePages_Surp: 0 Node 1 HugePages_Total: 0 Node 1 HugePages_Free: 0 Node 1 HugePages_Surp: 0 Node 2 HugePages_Total: 8 Node 2 HugePages_Free: 8 Node 2 HugePages_Surp: 0 Node 3 HugePages_Total: 0 Node 3 HugePages_Free: 0 Node 3 HugePages_Surp: 0 Signed-off-by: Lee Schermerhorn <[email protected]> Acked-by: Mel Gorman <[email protected]> Reviewed-by: Andi Kleen <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Nishanth Aravamudan <[email protected]> Cc: David Rientjes <[email protected]> Cc: Adam Litke <[email protected]> Cc: Andy Whitcroft <[email protected]> Cc: Eric Whitney <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-09-22mm: oom analysis: add shmem vmstatKOSAKI Motohiro1-0/+2
Recently we encountered OOM problems due to memory use of the GEM cache. Generally a large amuont of Shmem/Tmpfs pages tend to create a memory shortage problem. We often use the following calculation to determine the amount of shmem pages: shmem = NR_ACTIVE_ANON + NR_INACTIVE_ANON - NR_ANON_PAGES however the expression does not consider isolated and mlocked pages. This patch adds explicit accounting for pages used by shmem and tmpfs. Signed-off-by: KOSAKI Motohiro <[email protected]> Acked-by: Rik van Riel <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Acked-by: Wu Fengguang <[email protected]> Cc: David Rientjes <[email protected]> Cc: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-09-22mm: oom analysis: Show kernel stack usage in /proc/meminfo and OOM log outputKOSAKI Motohiro1-0/+3
The amount of memory allocated to kernel stacks can become significant and cause OOM conditions. However, we do not display the amount of memory consumed by stacks. Add code to display the amount of memory used for stacks in /proc/meminfo. Signed-off-by: KOSAKI Motohiro <[email protected]> Reviewed-by: Christoph Lameter <[email protected]> Reviewed-by: Minchan Kim <[email protected]> Reviewed-by: Rik van Riel <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-06-16mm: remove CONFIG_UNEVICTABLE_LRU config optionKOSAKI Motohiro1-4/+0
Currently, nobody wants to turn UNEVICTABLE_LRU off. Thus this configurability is unnecessary. Signed-off-by: KOSAKI Motohiro <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Andi Kleen <[email protected]> Acked-by: Minchan Kim <[email protected]> Cc: David Woodhouse <[email protected]> Cc: Matt Mackall <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Lee Schermerhorn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-03-13cpumask: replace node_to_cpumask with cpumask_of_node.Rusty Russell1-1/+1
Impact: cleanup node_to_cpumask (and the blecherous node_to_cpumask_ptr which contained a declaration) are replaced now everyone implements cpumask_of_node. Signed-off-by: Rusty Russell <[email protected]>
2009-03-10mm: get_nid_for_pfn() returns intRoel Kluin1-1/+1
get_nid_for_pfn() returns int Presumably the (nid < 0) case has never happened. We do know that it is happening on one system while creating a symlink for a memory section so it should also happen on the same system if unregister_mem_sect_under_nodes() were called to remove the same symlink. The test was actually added in response to a problem with an earlier version reported by Yasunori Goto where one or more of the leading pages of a memory section on the 2nd node of one of his systems was uninitialized because I believe they coincided with a memory hole. That earlier version did not ignore uninitialized pages and determined the nid by considering only the 1st page of each memory section. This caused the symlink to the 1st memory section on the 2nd node to be incorrectly created in /sys/devices/system/node/node0 instead of /sys/devices/system/node/node1. The problem was fixed by adding the test to skip over uninitialized pages. I suspect we have not seen any reports of the non-removal of a symlink due to the incorrect declaration of the nid variable in unregister_mem_sect_under_nodes() because - systems where a memory section could have an uninitialized range of leading pages are probably rare. - memory remove is probably not done very frequently on the systems that are capable of demonstrating the problem. - lingering symlink(s) that should have been removed may have simply gone unnoticed. [[email protected]: wrote changelog] Signed-off-by: Roel Kluin <[email protected]> Cc: Gary Hade <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2009-01-06mm: show node to memory section relationship with symlinks in sysfsGary Hade1-0/+103
Show node to memory section relationship with symlinks in sysfs Add /sys/devices/system/node/nodeX/memoryY symlinks for all the memory sections located on nodeX. For example: /sys/devices/system/node/node1/memory135 -> ../../memory/memory135 indicates that memory section 135 resides on node1. Also revises documentation to cover this change as well as updating Documentation/ABI/testing/sysfs-devices-memory to include descriptions of memory hotremove files 'phys_device', 'phys_index', and 'state' that were previously not described there. In addition to it always being a good policy to provide users with the maximum possible amount of physical location information for resources that can be hot-added and/or hot-removed, the following are some (but likely not all) of the user benefits provided by this change. Immediate: - Provides information needed to determine the specific node on which a defective DIMM is located. This will reduce system downtime when the node or defective DIMM is swapped out. - Prevents unintended onlining of a memory section that was previously offlined due to a defective DIMM. This could happen during node hot-add when the user or node hot-add assist script onlines _all_ offlined sections due to user or script inability to identify the specific memory sections located on the hot-added node. The consequences of reintroducing the defective memory could be ugly. - Provides information needed to vary the amount and distribution of memory on specific nodes for testing or debugging purposes. Future: - Will provide information needed to identify the memory sections that need to be offlined prior to physical removal of a specific node. Symlink creation during boot was tested on 2-node x86_64, 2-node ppc64, and 2-node ia64 systems. Symlink creation during physical memory hot-add tested on a 2-node x86_64 system. Signed-off-by: Gary Hade <[email protected]> Signed-off-by: Badari Pulavarty <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-12-13cpumask: change cpumask_scnprintf, cpumask_parse_user, cpulist_parse, and ↵Rusty Russell1-2/+2
cpulist_scnprintf to take pointers. Impact: change calling convention of existing cpumask APIs Most cpumask functions started with cpus_: these have been replaced by cpumask_ ones which take struct cpumask pointers as expected. These four functions don't have good replacement names; fortunately they're rarely used, so we just change them over. Signed-off-by: Rusty Russell <[email protected]> Signed-off-by: Mike Travis <[email protected]> Acked-by: Ingo Molnar <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Cc: [email protected]
2008-10-20vmscan: unevictable LRU scan sysctlLee Schermerhorn1-0/+5
This patch adds a function to scan individual or all zones' unevictable lists and move any pages that have become evictable onto the respective zone's inactive list, where shrink_inactive_list() will deal with them. Adds sysctl to scan all nodes, and per node attributes to individual nodes' zones. Kosaki: If evictable page found in unevictable lru when write /proc/sys/vm/scan_unevictable_pages, print filename and file offset of these pages. [[email protected]: fix one CONFIG_MMU=n build error] [[email protected]: adapt vmscan-unevictable-lru-scan-sysctl.patch to new sysfs API] Signed-off-by: Lee Schermerhorn <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-10-20vmstat: mlocked pages statisticsNick Piggin1-1/+3
Add NR_MLOCK zone page state, which provides a (conservative) count of mlocked pages (actually, the number of mlocked pages moved off the LRU). Reworked by lts to fit in with the modified mlock page support in the Reclaim Scalability series. [[email protected]: fix incorrect Mlocked field of /proc/meminfo] [[email protected]: mlocked-pages: add event counting with statistics] Signed-off-by: Nick Piggin <[email protected]> Signed-off-by: Lee Schermerhorn <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-10-20Unevictable LRU Page StatisticsLee Schermerhorn1-0/+6
Report unevictable pages per zone and system wide. Kosaki Motohiro added support for memory controller unevictable statistics. [[email protected]: fix printk in show_free_areas()] [[email protected]: fix units in /proc/vmstats] Signed-off-by: Lee Schermerhorn <[email protected]> Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> Debugged-by: Hiroshi Shimamoto <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-10-20vmscan: split LRU lists into anon & file setsRik van Riel1-23/+33
Split the LRU lists in two, one set for pages that are backed by real file systems ("file") and one for pages that are backed by memory and swap ("anon"). The latter includes tmpfs. The advantage of doing this is that the VM will not have to scan over lots of anonymous pages (which we generally do not want to swap out), just to find the page cache pages that it should evict. This patch has the infrastructure and a basic policy to balance how much we scan the anon lists and how much we scan the file lists. The big policy changes are in separate patches. [[email protected]: collect lru meminfo statistics from correct offset] [[email protected]: prevent incorrect oom under split_lru] [[email protected]: fix pagevec_move_tail() doesn't treat unevictable page] [[email protected]: memcg swapbacked pages active] [[email protected]: splitlru: BDI_CAP_SWAP_BACKED] [[email protected]: fix /proc/vmstat units] [[email protected]: memcg: fix handling of shmem migration] [[email protected]: adjust Quicklists field of /proc/meminfo] [[email protected]: fix style issue of get_scan_ratio()] Signed-off-by: Rik van Riel <[email protected]> Signed-off-by: Lee Schermerhorn <[email protected]> Signed-off-by: KOSAKI Motohiro <[email protected]> Signed-off-by: Hugh Dickins <[email protected]> Signed-off-by: Daisuke Nishimura <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-07-21sysdev: Pass the attribute to the low level sysdev show/store functionAndi Kleen1-5/+10
This allow to dynamically generate attributes and share show/store functions between attributes. Right now most attributes are generated by special macros and lots of duplicated code. With the attribute passed it's instead possible to attach some data to the attribute and then use that in shared low level functions to do different things. I need this for the dynamically generated bank attributes in the x86 machine check code, but it'll allow some further cleanups. I converted all users in tree to the new show/store prototype. It's a single huge patch to avoid unbisectable sections. Runtime tested: x86-32, x86-64 Compiled only: ia64, powerpc Not compile tested/only grep converted: sh, arm, avr32 Signed-off-by: Andi Kleen <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2008-07-04mm: switch node meminfo Active & Inactive pages to KbytesJohn Blackwood1-2/+2
There is a bug in the output of /sys/devices/system/node/node[n]/meminfo where the Active and Inactive values are in pages instead of Kbytes. Looks like this occurred back in 2.6.20 when the code was changed over to use node_page_state(). Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-30mm: Add NR_WRITEBACK_TEMP counterMiklos Szeredi1-0/+2
Fuse will use temporary buffers to write back dirty data from memory mappings (normal writes are done synchronously). This is needed, because there cannot be any guarantee about the time in which a write will complete. By using temporary buffers, from the MM's point if view the page is written back immediately. If the writeout was due to memory pressure, this effectively migrates data from a full zone to a less full zone. This patch adds a new counter (NR_WRITEBACK_TEMP) for the number of pages used as temporary buffers. [[email protected]: add vmstat_text for NR_WRITEBACK_TEMP] Signed-off-by: Miklos Szeredi <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Lee Schermerhorn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2008-04-19cpumask: use new cpus_scnprintf functionMike Travis1-5/+19
* Cleaned up references to cpumask_scnprintf() and added new cpulist_scnprintf() interfaces where appropriate. * Fix some small bugs (or code efficiency improvments) for various uses of cpumask_scnprintf. * Clean up some checkpatch errors. Signed-off-by: Mike Travis <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2008-04-19nodemask: use new node_to_cpumask_ptr functionMike Travis1-3/+4
* Use new node_to_cpumask_ptr. This creates a pointer to the cpumask for a given node. This definition is in mm patch: asm-generic-add-node_to_cpumask_ptr-macro.patch * Use new set_cpus_allowed_ptr function. Depends on: [mm-patch]: asm-generic-add-node_to_cpumask_ptr-macro.patch [sched-devel]: sched: add new set_cpus_allowed_ptr function [x86/latest]: x86: add cpus_scnprintf function Cc: Greg Kroah-Hartman <[email protected]> Cc: Greg Banks <[email protected]> Cc: H. Peter Anvin <[email protected]> Signed-off-by: Mike Travis <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2008-01-24Driver core: change sysdev classes to use dynamic kobject namesKay Sievers1-1/+1
All kobjects require a dynamically allocated name now. We no longer need to keep track if the name is statically assigned, we can just unconditionally free() all kobject names on cleanup. Signed-off-by: Kay Sievers <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2007-10-16mm: add node states sysfs class attributeSLee Schermerhorn1-1/+90
Add a per node state sysfs class attribute file to /sys/devices/system/node to display node state masks. E.g., on a 4-cell HP ia64 NUMA platform, we have 5 nodes: 4 representing the actual hardware cells and one memory-only pseudo-node representing a small amount [512MB] of "hardware interleaved" memory. With this patch, in /sys/devices/system/node we see: #ls -1F /sys/devices/system/node has_cpu has_normal_memory node0/ node1/ node2/ node3/ node4/ online possible #cat /sys/devices/system/node/possible 0-255 #cat /sys/devices/system/node/online 0-4 #cat /sys/devices/system/node/has_normal_memory 0-4 #cat /sys/devices/system/node/has_cpu 0-3 Signed-off-by: Lee Schermerhorn <[email protected]> Acked-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2007-02-17Replace remaining references to "driverfs" with "sysfs".Robert P. J. Day1-1/+1
Globally, s/driverfs/sysfs/g. Signed-off-by: Robert P. J. Day <[email protected]> Signed-off-by: Adrian Bunk <[email protected]>
2007-02-11[PATCH] Drop __get_zone_counts()Christoph Lameter1-7/+2
Values are readily available via ZVC per node and global sums. Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-09-26[PATCH] ZVC: Support NR_SLAB_RECLAIMABLE / NR_SLAB_UNRECLAIMABLEChristoph Lameter1-2/+7
Remove the atomic counter for slab_reclaim_pages and replace the counter and NR_SLAB with two ZVC counter that account for unreclaimable and reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE. Change the check in vmscan.c to refer to to NR_SLAB_RECLAIMABLE. The intend seems to be to check for slab pages that could be freed. Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-09-26[PATCH] reduce MAX_NR_ZONES: make display of highmem counters conditional on ↵Christoph Lameter1-0/+4
CONFIG_HIGHMEM Do not display HIGHMEM memory sizes if CONFIG_HIGHMEM is not set. Make HIGHMEM dependent texts and make display of highmem counters optional Some texts are depending on CONFIG_HIGHMEM. Remove those strings and remove the display of highmem counter values if CONFIG_HIGHMEM is not set. [[email protected]: remove some ifdefs] Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-08-27[PATCH] /proc/meminfo: don't put spaces in namesAndrew Morton1-1/+1
None of the other /proc/meminfo lines have a space in the identifier. This post-2.6.17 addition has the potential to break existing parsers, so use an underscore instead (like Committed_AS). Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-06-30[PATCH] Use Zoned VM Counters for NUMA statisticsChristoph Lameter1-28/+6
The numa statistics are really event counters. But they are per node and so we have had special treatment for these counters through additional fields on the pcp structure. We can now use the per zone nature of the zoned VM counters to realize these. This will shrink the size of the pcp structure on NUMA systems. We will have some room to add additional per zone counters that will all still fit in the same cacheline. Bits Prior pcp size Size after patch We can add ------------------------------------------------------------------ 64 128 bytes (16 words) 80 bytes (10 words) 48 32 76 bytes (19 words) 56 bytes (14 words) 8 (64 byte cacheline) 72 (128 byte) Remove the special statistics for numa and replace them with zoned vm counters. This has the side effect that global sums of these events now show up in /proc/vmstat. Also take the opportunity to move the zone_statistics() function from page_alloc.c into vmstat.c. Discussions: V2 http://marc.theaimsgroup.com/?t=115048227000002&r=1&w=2 Signed-off-by: Christoph Lameter <[email protected]> Acked-by: Andi Kleen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-06-30[PATCH] zoned vm counters: conversion of nr_bounce to per zone counterChristoph Lameter1-0/+2
Conversion of nr_bounce to a per zone counter nr_bounce is only used for proc output. So it could be left as an event counter. However, the event counters may not be accurate and nr_bounce is categorizing types of pages in a zone. So we really need this to also be a per zone counter. [[email protected]: bugfix] Signed-off-by: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-06-30[PATCH] zoned vm counters: conversion of nr_unstable to per zone counterChristoph Lameter1-2/+2
Conversion of nr_unstable to a per zone counter We need to do some special modifications to the nfs code since there are multiple cases of disposition and we need to have a page ref for proper accounting. This converts the last critical page state of the VM and therefore we need to remove several functions that were depending on GET_PAGE_STATE_LAST in order to make the kernel compile again. We are only left with event type counters in page state. [[email protected]: bugfixes] Signed-off-by: Christoph Lameter <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2006-06-30[PATCH] zoned vm counters: conversion of nr_writeback to per zone counterChristoph Lameter1-4/+1
Conversion of nr_writeback to per zone counter. This removes the last page_state counter from arch/i386/mm/pgtable.c so we drop the page_state from there. [[email protected]: bugfix] Signed-off-by: Christoph Lameter <[email protected]> Cc: Trond Myklebust <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>