Age  Commit message  Author  Files  Lines
2015-06-30kernel/relay.c: use kvfree() in relay_free_page_array()Pekka Enberg1-4/+1
Use kvfree() instead of open-coding it. Signed-off-by: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30printk: improve the description of /dev/kmsg line formatAntonio Ospite1-4/+4
The comment about /dev/kmsg does not mention the additional values which may actually be exported, fix that. Also move up the part of the comment instructing the users to ignore these additional values, this way the reading is more fluent and logically compact. Signed-off-by: Antonio Ospite <[email protected]> Cc: Joe Perches <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30arch/unicore32/kernel/fpu-ucf64.c: remove unnecessary KERN_ERRMasanari Iida1-2/+2
Signed-off-by: Masanari Iida <[email protected]> Cc: Guan Xuetao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30drivers/scsi/scsi_debug.c: resolve sg buffer const-ness issueDave Gordon3-11/+10
do_device_access() takes a separate parameter to indicate the direction of data transfer, which it used to use to select the appropriate function out of sg_pcopy_{to,from}_buffer(). However, these two functions now have different signatures, since only one of them takes a pointer-to-const buffer, which causes a compiler warning. So this patch makes it bypass these wrappers and call the underlying function sg_copy_buffer() directly; this has the same calling style as do_device_access() i.e. a separate direction-of-transfer parameter and no pointers-to-const, so skipping the wrappers not only eliminates the warning, it also makes the code simpler :) [[email protected]: fix very broken build] Signed-off-by: Dave Gordon <[email protected]> Acked-by: Arnd Bergmann <[email protected]> Cc: James Bottomley <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30lib/scatterlist: mark input buffer parameters as 'const'Dave Gordon2-6/+6
The 'buf' parameter of sg(p)copy_from_buffer() can and should be const-qualified, although because of the shared implementation of _to_buffer() and _from_buffer(), we have to cast this away internally. This means that callers who have a 'const' buffer containing the data to be copied to the sg-list no longer have to cast away the const-ness themselves. It also enables improved coverage by code analysis tools. Signed-off-by: Dave Gordon <[email protected]> Cc: Akinobu Mita <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30lib/scatterlist.c: fix kerneldoc for sg_pcopy_{to,from}_buffer()Dave Gordon1-2/+2
The kerneldoc for the functions doesn't match the code; the last two parameters (buflen, skip) have been transposed, which is confusing, especially as they're both integral types and the compiler won't warn about swapping them. These functions and the kerneldoc were introduced in commit: df642cea lib/scatterlist: introduce sg_pcopy_from_buffer() ... Author: Akinobu Mita <[email protected]> Date: Mon Jul 8 16:01:54 2013 -0700 The only difference between sg_pcopy_{from,to}_buffer() and sg_copy_{from,to}_buffer() is an additional argument that specifies the number of bytes to skip the SG list before copying. The functions have the extra argument at the end, but the kerneldoc lists it in penultimate position. Signed-off-by: Dave Gordon <[email protected]> Reviewed-by: Akinobu Mita <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
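To illustrate why the argument order matters, here is a minimal user-space sketch of a "copy with skip" helper. `struct seg` and `seg_pcopy_from_buffer()` are hypothetical stand-ins, not the kernel's scatterlist API; only the ordering of the last two parameters (`buflen`, then `skip`) is taken from the commit message above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one scatterlist entry: a plain memory segment. */
struct seg {
    unsigned char *data;
    size_t len;
};

/*
 * Copy up to buflen bytes from buf into the segment list, after skipping
 * the first `skip` bytes of the list. The argument order follows the
 * corrected kerneldoc: buflen comes before skip.
 * Returns the number of bytes copied.
 */
static size_t seg_pcopy_from_buffer(struct seg *segs, unsigned int nents,
                                    const void *buf, size_t buflen, size_t skip)
{
    const unsigned char *src = buf;
    size_t copied = 0;

    for (unsigned int i = 0; i < nents && copied < buflen; i++) {
        size_t off, n;

        if (skip >= segs[i].len) {      /* the whole segment is skipped */
            skip -= segs[i].len;
            continue;
        }
        off = skip;                     /* partial skip inside this segment */
        skip = 0;

        n = segs[i].len - off;          /* room left in this segment */
        if (n > buflen - copied)
            n = buflen - copied;
        memcpy(segs[i].data + off, src + copied, n);
        copied += n;
    }
    return copied;
}
```

Because both trailing parameters are integral, a call such as `seg_pcopy_from_buffer(segs, 2, buf, skip, buflen)` would compile without complaint and silently copy the wrong range, which is the confusion the kerneldoc fix is meant to prevent.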
2015-06-30ipc,sysv: return -EINVAL upon incorrect id/seqnumDavidlohr Bueso1-1/+1
In ipc_obtain_object_check we return -EIDRM when a bogus sequence number is detected via ipc_checkid, while the ipc manpages state the following return codes for such errors: EIDRM <ID> points to a removed identifier. EINVAL Invalid <ID> value, or unaligned, etc. EIDRM should only be returned upon a RMID call (->deleted check), and thus return EINVAL for wrong seq. This difference in semantics has also caused real bugs, ie: https://bugzilla.redhat.com/show_bug.cgi?id=246509 Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30ipc,sysv: make return -EIDRM when racing with RMID consistentDavidlohr Bueso1-5/+8
The ipc_lock helper is used by all forms of sysv ipc to acquire the ipc object's spinlock. Upon error (bogus identifier), we always return -EINVAL, whether the problem be in the idr path or because we raced with a task performing RMID. For the latter, however, all ipc-related manpages state: EIDRM <ID> points to a removed identifier. And for the former: EINVAL Invalid <ID> value, or unaligned, etc. EIDRM should only be returned once the ipc resource is deleted. For all types of ipc this is done immediately upon a RMID command. However, shared memory behaves slightly differently as it can merely mark a segment for deletion, and delay the actual freeing until there are no more active consumers. Per shmctl(IPC_RMID) manpage: "" Mark the segment to be destroyed. The segment will only actually be destroyed after the last process detaches it (i.e., when the shm_nattch member of the associated structure shmid_ds is zero). "" Unlike ipc_lock, paths that behave "correctly", at least per the manpage, involve controlling the ipc resource via *ctl(), doing the exact same validity check as ipc_lock right after acquiring the spinlock: if (!ipc_valid_object()) { err = -EIDRM; goto out_unlock; } Thus make ipc_lock consistent with the rest of ipc code and return -EIDRM in ipc_lock when !ipc_valid_object(). Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
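The error semantics discussed in the two ipc,sysv commits above can be summarised in a small user-space model. This is only a sketch: `struct ipc_obj`, `ipc_lookup_model()` and the flat lookup table are hypothetical, and the value of `SEQ_MULTIPLIER` is an assumption made for the example. It is not the kernel's implementation, which also involves the idr and the object's spinlock.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define SEQ_MULTIPLIER 32768    /* assumed number of slots per namespace */

/* Hypothetical, reduced model of an ipc object. */
struct ipc_obj {
    bool deleted;   /* set once IPC_RMID has removed the object */
    int seq;        /* sequence number encoded in the user-visible id */
};

/* Returns 0 on success, or a negative errno following the manpage semantics. */
static int ipc_lookup_model(struct ipc_obj *table[], size_t nslots, int id)
{
    struct ipc_obj *obj;
    size_t idx;

    if (id < 0)
        return -EINVAL;                 /* invalid id value */

    idx = (size_t)(id % SEQ_MULTIPLIER);
    if (idx >= nslots || table[idx] == NULL)
        return -EINVAL;                 /* nothing found in the lookup path */

    obj = table[idx];
    if (obj->deleted)
        return -EIDRM;                  /* raced with RMID: identifier removed */

    if (id / SEQ_MULTIPLIER != obj->seq)
        return -EINVAL;                 /* bogus sequence number */

    return 0;
}
```

The point of the model is the mapping of causes to error codes: only the "object was removed" case yields -EIDRM, while a lookup miss or a sequence mismatch yields -EINVAL.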
2015-06-30ipc: rename ipc_obtain_objectDavidlohr Bueso5-9/+9
... to ipc_obtain_object_idr, which is more meaningful and makes the code slightly easier to follow. Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30ipc,msg: provide barrier pairings for lockless receiveDavidlohr Bueso1-10/+38
We currently use a full barrier on the sender side to avoid receiver tasks disappearing on us while still performing on the sender side wakeup. We lack however, the proper CPU-CPU interactions pairing on the receiver side which busy-waits for the message. Similarly, we do not need a full smp_mb, and can relax the semantics for the writer and reader sides of the message. This is safe as we are only ordering loads and stores to r_msg. And in both smp_wmb and smp_rmb, there are no stores after the calls _anyway_. This obviously applies for pipelined_send and expunge_all, for EIDRM when destroying a queue. Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30ipc,shm: move BUG_ON check into shm_lockDavidlohr Bueso1-5/+5
Upon every shm_lock call, we BUG_ON if an error was returned, indicating racing either in idr or in shm_destroy. Move this logic into the locking. [[email protected]: simplify code] Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30ipc/util.c: use kvfree() in ipc_rcu_free()Pekka Enberg1-4/+1
Use kvfree() instead of open-coding it. Signed-off-by: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30arc: use for_each_sg()Akinobu Mita1-5/+7
This replaces the plain loop over the sglist array with for_each_sg() macro which consists of sg_next() function calls. Since arc doesn't select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in order to loop over each sg element. But this can help find problems with drivers that do not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled. Signed-off-by: Akinobu Mita <[email protected]> Acked-by: Vineet Gupta <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30devpts: if initialization failed, don't crash when opening /dev/ptmxJosh Triplett1-7/+24
If devpts failed to initialize, it would store an ERR_PTR in the global devpts_mnt. A subsequent open of /dev/ptmx would call devpts_new_index, which would dereference devpts_mnt and crash. Avoid storing invalid values in devpts_mnt; leave it NULL instead. Make both devpts_new_index and devpts_pty_new fail gracefully with ENODEV in that case, which then becomes the return value to the userspace open call on /dev/ptmx. [[email protected]: remove unneeded static] Signed-off-by: Josh Triplett <[email protected]> Reported-by: Fengguang Wu <[email protected]> Reviewed-by: Peter Hurley <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
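A minimal user-space sketch of the pattern described above: the global is published only on success, and the open path checks for NULL and returns -ENODEV. All names here (`mount_model`, `devpts_mnt_model`, the `_model` functions) are hypothetical and only mirror the roles of devpts_mnt and devpts_new_index from the commit message.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the devpts mount. */
struct mount_model {
    int next_index;
};

/* Stays NULL if initialisation failed; never holds an error value. */
static struct mount_model *devpts_mnt_model;

/* Initialisation: publish the mount only when there was no error. */
static int devpts_init_model(struct mount_model *m, int err)
{
    if (err)
        return err;             /* leave the global NULL on failure */
    devpts_mnt_model = m;
    return 0;
}

/* Open path: fail gracefully instead of dereferencing an invalid pointer. */
static int devpts_new_index_model(void)
{
    if (!devpts_mnt_model)
        return -ENODEV;
    return devpts_mnt_model->next_index++;
}
```

Keeping the global either NULL or valid means every consumer needs a single, cheap check rather than having to distinguish encoded error pointers from real ones.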
2015-06-30scripts/gdb: remove useless global instructionThiébaud Weksteen1-2/+0
Signed-off-by: Thiébaud Weksteen <[email protected]> Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30scripts/gdb: add ps commandThiébaud Weksteen1-0/+16
Signed-off-by: Thiébaud Weksteen <[email protected]> Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30scripts/gdb: fix PEP8 complianceThiébaud Weksteen4-7/+7
Signed-off-by: Thiébaud Weksteen <[email protected]> Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30scripts/gdb: fix typo in exception nameThiébaud Weksteen1-1/+1
Signed-off-by: Thiébaud Weksteen <[email protected]> Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30scripts/gdb: enable completion for lx-list-check parameterJan Kiszka1-1/+2
Signed-off-by: Jan Kiszka <[email protected]> Cc: Thiébaud Weksteen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30scripts/gdb: also allow list_head pointer as lx-list-check parameterJan Kiszka1-2/+4
This makes the usage more flexible. Signed-off-by: Jan Kiszka <[email protected]> Cc: Thiébaud Weksteen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30scripts/gdb: add command to check list consistencyThiébaud Weksteen2-0/+90
Add a gdb script to verify the consistency of lists. Signed-off-by: Thiébaud Weksteen <[email protected]> Signed-off-by: Jan Kiszka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30memstick: remove deprecated use of pci apiQuentin Lambert2-11/+11
Replace occurrences of the pci api by appropriate call to the dma api.

A simplified version of the semantic patch that finds this problem is as follows: (http://coccinelle.lip6.fr)

    @deprecated@
    idexpression id;
    position p;
    @@

    (
      pci_dma_supported@p ( id, ...)
    |
      pci_alloc_consistent@p ( id, ...)
    )

    @bad1@
    idexpression id;
    position deprecated.p;
    @@
    ...when != &id->dev
       when != pci_get_drvdata ( id )
       when != pci_enable_device ( id )
    (
      pci_dma_supported@p ( id, ...)
    |
      pci_alloc_consistent@p ( id, ...)
    )

    @depends on !bad1@
    idexpression id;
    expression direction;
    position deprecated.p;
    @@

    (
    - pci_dma_supported@p ( id,
    + dma_supported ( &id->dev,
    ...
    + , GFP_ATOMIC
      )
    |
    - pci_alloc_consistent@p ( id,
    + dma_alloc_coherent ( &id->dev,
    ...
    + , GFP_ATOMIC
      )
    )

Signed-off-by: Quentin Lambert <[email protected]> Cc: Maxim Levitsky <[email protected]> Cc: Greg KH <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30fs/affs/symlink.c: remove unneeded err variableFabian Frederick1-3/+1
err is only assigned to -EIO. Return that value at the end of fail context. Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30fs/affs/amigaffs.c: remove unneeded initializationFabian Frederick1-1/+1
bh is initialized unconditionally in affs_remove_link() Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30fs/affs/inode.c: remove unneeded initializationFabian Frederick1-1/+1
bh is initialized unconditionally in affs_add_entry() Signed-off-by: Fabian Frederick <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30fs/adfs: remove unneeded castFiro Yang1-1/+1
kmem_cache_alloc() returns void*. Signed-off-by: Firo Yang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30gcov: add support for GCC 5.1Lorenzo Stoakes2-1/+9
Fix kernel gcov support for GCC 5.1. Similar to commit a992bf836f9 ("gcov: add support for GCC 4.9"), this patch takes into account the existence of a new gcov counter (see gcc's gcc/gcov-counter.def.) Firstly, it increments GCOV_COUNTERS (to 10), which makes the data structure struct gcov_info compatible with GCC 5.1. Secondly, a corresponding counter function __gcov_merge_icall_topn (Top N value tracking for indirect calls) is included in base.c with the other gcov counters unused for kernel profiling. Signed-off-by: Lorenzo Stoakes <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Yuan Pengfei <[email protected]> Tested-by: Peter Oberparleiter <[email protected]> Reviewed-by: Peter Oberparleiter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30kernel/panic/kexec: fix "crash_kexec_post_notifiers" option issue in oops pathHATAYAMA Daisuke3-1/+15
Commit f06e5153f4ae2e ("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers") introduced "crash_kexec_post_notifiers" kernel boot option, which toggles whether panic() calls crash_kexec() before panic_notifiers and dump kmsg or after. The problem is that the commit overlooks panic_on_oops kernel boot option. If it is enabled, crash_kexec() is called directly without going through panic() in oops path. To fix this issue, this patch adds a check to "crash_kexec_post_notifiers" in the condition of kexec_should_crash(). Also, put a comment in kexec_should_crash() to explain the non-obvious parts of this patch. Signed-off-by: HATAYAMA Daisuke <[email protected]> Acked-by: Baoquan He <[email protected]> Tested-by: Hidehiro Kawai <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Hidehiro Kawai <[email protected]> Cc: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
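The decision described above can be sketched as a pure function. This is a simplified user-space model, not the kernel's kexec_should_crash(): `struct oops_ctx` and `should_crash_kexec_now()` are hypothetical names, and the set of other conditions (interrupt context, idle/init task) is an assumption made for illustration. Only the panic_on_oops and crash_kexec_post_notifiers interaction comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the state consulted in the oops path. */
struct oops_ctx {
    bool in_interrupt;
    bool is_idle_or_init;               /* assumed: pid 0 or the global init task */
    bool panic_on_oops;
    bool crash_kexec_post_notifiers;
};

/* Should the oops path call crash_kexec() directly, before panic()? */
static bool should_crash_kexec_now(const struct oops_ctx *c)
{
    /*
     * With the option set, crash_kexec() must not run here: it has to
     * wait until panic() has run the panic notifiers.
     */
    if (c->crash_kexec_post_notifiers)
        return false;

    return c->in_interrupt || c->is_idle_or_init || c->panic_on_oops;
}
```

Checking the option first means every condition that would otherwise trigger an early crash_kexec(), including panic_on_oops, is deferred to panic() in one place.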
2015-06-30kernel/panic: call the 2nd crash_kexec() only if crash_kexec_post_notifiers is enabledHATAYAMA Daisuke1-1/+2
For compatibility with the behaviour before the commit f06e5153f4ae2e ("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers"), the 2nd crash_kexec() should be called only if crash_kexec_post_notifiers is enabled. Note that crash_kexec() returns immediately if kdump crash kernel is not loaded, so in this case, this patch makes no functionality change, but the point is to make it explicit, from the caller panic() side, that the 2nd crash_kexec() does nothing. Signed-off-by: HATAYAMA Daisuke <[email protected]> Suggested-by: Ingo Molnar <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Hidehiro Kawai <[email protected]> Cc: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30x86/kexec: prepend elfcorehdr instead of appending it to the crash-kernel command-line.KarimAllah Ahmed1-5/+6
Any parameter passed after '--' in the kernel command-line will not be parsed by the kernel at all, instead it will be passed directly to init process. Currently the kernel appends elfcorehdr=<paddr> to the cmdline passed from kexec load, and if this command-line is used to pass parameters to init process this means that 'elfcorehdr' will not be parsed as a kernel parameter at all which will be a problem for vmcore subsystem since it will know nothing about the location of the ELF structure! Prepending 'elfcorehdr' instead of appending it fixes this problem since it ensures that it always comes before '--' and so it's always parsed as a kernel command-line parameter. Even with this patch things can still go wrong if 'CONFIG_CMDLINE' was also used to embed a command-line to the crash dump kernel and this command-line contains '--' since the current behavior of the kernel is to actually append the boot loader command-line to the embedded command-line. Signed-off-by: KarimAllah Ahmed <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]> Acked-by: Vivek Goyal <[email protected]> Cc: Haren Myneni <[email protected]> Cc: Eric Biederman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
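The effect of prepending can be shown with a short user-space sketch. `prepend_elfcorehdr()` is a hypothetical helper written for this example, not the function the patch touches; it only demonstrates that a prepended parameter always lands before any '--' in the caller-supplied command line.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build "elfcorehdr=0x<paddr> <cmdline>" into out.
 * Returns 0 on success, -1 if the result does not fit in outlen bytes.
 */
static int prepend_elfcorehdr(char *out, size_t outlen,
                              unsigned long long paddr, const char *cmdline)
{
    int n = snprintf(out, outlen, "elfcorehdr=0x%llx %s", paddr, cmdline);

    /* snprintf reports the length it wanted; treat truncation as an error. */
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

For an input such as `console=ttyS0 -- single`, the result starts with `elfcorehdr=0x...`, so the kernel parses it as its own parameter, whereas an appended copy would end up after '--' and be handed to init.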
2015-06-30fs: document seq_open()'s usage of file->private_dataYann Droneaud1-0/+2
seq_open() stores its struct seq_file in file->private_data, thus it must not be modified by user of seq_file. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yann Droneaud <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30fs: allocate structure unconditionally in seq_open()Yann Droneaud1-8/+9
Since patch described below, from v2.6.15-rc1, seq_open() could use a struct seq_file already allocated by the caller if the pointer to the structure is stored in file->private_data before calling the function. Commit 1abe77b0fc4b485927f1f798ae81a752677e1d05 Author: Al Viro <[email protected]> Date: Mon Nov 7 17:15:34 2005 -0500 [PATCH] allow callers of seq_open do allocation themselves Allow caller of seq_open() to kmalloc() seq_file + whatever else they want and set ->private_data to it. seq_open() will then abstain from doing allocation itself. As there's no more use for such feature, as it could be easily replaced by calls to seq_open_private() (see commit 39699037a5c9 ("[FS] seq_file: Introduce the seq_open_private()")) and seq_release_private() (see v2.6.0-test3), support for this uncommon feature can be removed from seq_open(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yann Droneaud <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30fs: use seq_open_private() for proc_mountsYann Droneaud3-24/+19
A patchset to remove support for passing pre-allocated struct seq_file to seq_open(). Such feature is undocumented and prone to error. In particular, if seq_release() is used in release handler, it will kfree() a pointer which was not allocated by seq_open(). So this patchset drops support for pre-allocated struct seq_file: it's only of use in proc_namespace.c and can be easily replaced by using seq_open_private()/seq_release_private(). Additionally, it documents the use of file->private_data to hold pointer to struct seq_file by seq_open(). This patch (of 3): Since patch described below, from v2.6.15-rc1, seq_open() could use a struct seq_file already allocated by the caller if the pointer to the structure is stored in file->private_data before calling the function. Commit 1abe77b0fc4b485927f1f798ae81a752677e1d05 Author: Al Viro <[email protected]> Date: Mon Nov 7 17:15:34 2005 -0500 [PATCH] allow callers of seq_open do allocation themselves Allow caller of seq_open() to kmalloc() seq_file + whatever else they want and set ->private_data to it. seq_open() will then abstain from doing allocation itself. Such behavior is only used by mounts_open_common(). In order to drop support for such uncommon feature, proc_mounts is converted to use seq_open_private(), which takes care of allocating the proc_mounts structure, making it available through ->private in struct seq_file. Conversely, proc_mounts is converted to use seq_release_private(), in order to release the private structure allocated by seq_open_private(). Then, ->private is used directly instead of proc_mounts() macro to access the proc_mounts structure. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yann Droneaud <[email protected]> Cc: Al Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: finish initialisation of struct pages before basic setupMel Gorman5-37/+49
Waiman Long reported that 24TB machines hit OOM during basic setup when struct page initialisation was deferred. One approach is to initialise memory on demand but it interferes with page allocator paths. This patch creates dedicated threads to initialise memory before basic setup. It then blocks on a rw_semaphore until completion as a wait_queue and counter is overkill. This may be slower to boot but it's simpler overall and also gets rid of a section mangling which existed so kswapd could do the initialisation. [[email protected]: include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast] Signed-off-by: Mel Gorman <[email protected]> Cc: Waiman Long <[email protected]> Cc: Nathan Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Scott Norton <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: remove mminit_verify_page_linksMel Gorman3-17/+0
mminit_verify_page_links() is an extremely paranoid check that was introduced when memory initialisation was being heavily reworked. Profiles indicated that up to 10% of parallel memory initialisation was spent on checking this for every page. The cost could be reduced but in practice this check only found problems very early during the initialisation rewrite and has found nothing since. This patch removes an expensive unnecessary check. Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Nate Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: reduce number of times pageblocks are set during struct page initMel Gorman1-22/+24
During parallel struct page initialisation, ranges are checked for every PFN unnecessarily which increases boot times. This patch alters when the ranges are checked. Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Nate Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: free pages in large chunks where possibleMel Gorman1-6/+49
Parallel struct page initialisation frees pages one at a time. Try to free pages as single large pages where possible. Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Nate Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30x86: mm: enable deferred struct page initialisation on x86-64Mel Gorman1-0/+1
Subject says it all. Other architectures may enable on a case-by-case basis after auditing early_pfn_to_nid and testing. Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Nate Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: minimise number of pfn->page lookups during initialisationMel Gorman1-5/+24
Deferred struct page initialisation is using pfn_to_page() on every PFN unnecessarily. This patch minimises the number of lookups and scheduler checks. Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Nate Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: initialise remaining struct pages in parallel with kswapdMel Gorman4-6/+130
Only a subset of struct pages are initialised at the moment. When this patch is applied kswapd initialises the remaining struct pages in parallel. This should boot faster by spreading the work to multiple CPUs and initialising data that is local to the CPU. The user-visible effect on large machines is that free memory will appear to rapidly increase early in the lifetime of the system until kswapd reports that all memory is initialised in the kernel log. Once initialised there should be no other user-visible effects. Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Nate Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is setMel Gorman5-4/+124
This patch initialises all low memory struct pages and 2G of the highest zone on each node during memory initialisation if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. That config option cannot be set but will be available in a later patch. Parallel initialisation of struct page depends on some features from memory hotplug and it is necessary to alter section annotations. Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Nate Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: inline some helper functionsMel Gorman2-46/+52
early_pfn_in_nid() and meminit_pfn_in_nid() are small functions that are unnecessarily visible outside memory initialisation. As well as unnecessary visibility, it's unnecessary function call overhead when initialising pages. This patch moves the helpers inline. [[email protected]: fix build] [[email protected]: fix build] Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Nate Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: make __early_pfn_to_nid SMP-safe and introduce meminit_pfn_in_nidMel Gorman4-30/+51
__early_pfn_to_nid() uses static variables to cache recent lookups as memblock lookups are very expensive but it assumes that memory initialisation is single-threaded. Parallel initialisation of struct pages will break that assumption so this patch makes __early_pfn_to_nid() SMP-safe by requiring the caller to cache recent search information. early_pfn_to_nid() keeps the same interface but is only safe to use early in boot due to the use of a global static variable. meminit_pfn_in_nid() is an SMP-safe version for which callers must maintain their own state. Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Nate Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
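The "caller caches recent search information" idea can be sketched in user space as follows. `struct pfn_range`, `struct pfnnid_cache` and `pfn_to_nid_cached()` are hypothetical names for this example, and the linear table walk merely stands in for the expensive memblock lookup; none of this is the kernel's actual code.

```c
#include <stddef.h>

/* One node's PFN range, covering [start, end). */
struct pfn_range {
    unsigned long start, end;
    int nid;
};

/* Caller-owned cache of the last matching range: one per initialising thread. */
struct pfnnid_cache {
    unsigned long last_start, last_end;
    int last_nid;
};

/* Returns the node id for pfn, or -1 if no range covers it. */
static int pfn_to_nid_cached(const struct pfn_range *map, size_t n,
                             unsigned long pfn, struct pfnnid_cache *state)
{
    /* Fast path: pfn falls in the range found by this caller's last lookup. */
    if (state->last_start <= pfn && pfn < state->last_end)
        return state->last_nid;

    /* Slow path: walk the table and remember the hit in the caller's state. */
    for (size_t i = 0; i < n; i++) {
        if (map[i].start <= pfn && pfn < map[i].end) {
            state->last_start = map[i].start;
            state->last_end = map[i].end;
            state->last_nid = map[i].nid;
            return map[i].nid;
        }
    }
    return -1;
}
```

Because the cache lives in a structure passed in by the caller rather than in function-level static variables, two threads initialising different nodes cannot overwrite each other's cached range.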
2015-06-30mm: page_alloc: pass PFN to __free_pages_bootmemMel Gorman5-11/+14
__free_pages_bootmem prepares a page for release to the buddy allocator and assumes that the struct page is initialised. Parallel initialisation of struct pages defers initialisation and __free_pages_bootmem can be called for struct pages that cannot yet map struct page to PFN. This patch passes PFN to __free_pages_bootmem with no other functional change. Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Nate Zimmer <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30mm: meminit: only set page reserved in the memblock regionNathan Zimmer3-1/+21
Currently each page struct is set as reserved upon initialization. This patch leaves the reserved bit clear and only sets the reserved bit when it is known the memory was allocated by the bootmem allocator. This makes it easier to distinguish between uninitialised struct pages and reserved struct pages in later patches. Signed-off-by: Robin Holt <[email protected]> Signed-off-by: Nathan Zimmer <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
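As a rough illustration of the ordering this commit message describes, the sketch below first initialises every page descriptor with the reserved flag clear and then sets it only for pages inside known reserved ranges. All names (`struct sketch_page`, `struct sketch_region`, `sketch_init_pages()`) are hypothetical and the data structures are heavily simplified; the kernel's struct page and memblock interfaces are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified page descriptor: only the flag relevant to this sketch. */
struct sketch_page {
	bool reserved;
};

/* Hypothetical reserved range covering PFNs [start_pfn, end_pfn). */
struct sketch_region {
	size_t start_pfn;
	size_t end_pfn;
};

/*
 * Initialise all pages as not reserved, then mark only the pages that
 * fall inside a reserved region. Ranges extending past npages are clamped.
 */
void sketch_init_pages(struct sketch_page *pages, size_t npages,
		       const struct sketch_region *reserved, size_t nregions)
{
	for (size_t pfn = 0; pfn < npages; pfn++)
		pages[pfn].reserved = false;

	for (size_t i = 0; i < nregions; i++) {
		for (size_t pfn = reserved[i].start_pfn;
		     pfn < reserved[i].end_pfn && pfn < npages; pfn++)
			pages[pfn].reserved = true;
	}
}
```

With this ordering, a set reserved flag means the page lies in a reserved range, so the flag no longer doubles as a marker for pages that have merely been initialised.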
2015-06-30mm: meminit: move page initialization into a separate functionRobin Holt1-33/+46
Currently, memmap_init_zone() has all the smarts for initializing a single page. A subset of this is required for parallel page initialisation and so this patch breaks up the monolithic function in preparation. Signed-off-by: Robin Holt <[email protected]> Signed-off-by: Nathan Zimmer <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30memblock: introduce a for_each_reserved_mem_region iteratorRobin Holt2-0/+50
Struct page initialisation had been identified as one of the reasons why large machines take a long time to boot. Patches were posted a long time ago to defer initialisation until they were first used. This was rejected on the grounds it should not be necessary to hurt the fast paths. This series reuses much of the work from that time but defers the initialisation of memory to kswapd so that one thread per node initialises memory local to that node. After applying the series and setting the appropriate Kconfig variable I see this in the boot log on a 64G machine [ 7.383764] kswapd 0 initialised deferred memory in 188ms [ 7.404253] kswapd 1 initialised deferred memory in 208ms [ 7.411044] kswapd 3 initialised deferred memory in 216ms [ 7.411551] kswapd 2 initialised deferred memory in 216ms On a 1TB machine, I see [ 8.406511] kswapd 3 initialised deferred memory in 1116ms [ 8.428518] kswapd 1 initialised deferred memory in 1140ms [ 8.435977] kswapd 0 initialised deferred memory in 1148ms [ 8.437416] kswapd 2 initialised deferred memory in 1148ms Once booted the machine appears to work as normal. Boot times were measured from the time shutdown was called until ssh was available again. In the 64G case, the boot time savings are negligible. On the 1TB machine, the savings were 16 seconds. Nate Zimmer said: : On an older 8 TB box with lots and lots of cpus the boot time, as : measure from grub to login prompt, the boot time improved from 1484 : seconds to exactly 1000 seconds. Waiman Long said: : I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system. From : grub menu to ssh login, the bootup time was 453s before the patch and 265s : after the patch - a saving of 188s (42%). Daniel Blueman said: : On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're seeing : stock 4.0 boot in 7136s. This drops to 2159s, or a 70% reduction with : this patchset. Non-temporal PMD init (https://lkml.org/lkml/2015/4/23/350) : drops this to 1045s. 
This patch (of 13): As part of initializing struct page's in 2MiB chunks, we noticed that at the end of free_all_bootmem(), there was nothing which had forced the reserved/allocated 4KiB pages to be initialized. This helper function will be used for that expansion. Signed-off-by: Robin Holt <[email protected]> Signed-off-by: Nate Zimmer <[email protected]> Signed-off-by: Mel Gorman <[email protected]> Tested-by: Nate Zimmer <[email protected]> Tested-by: Waiman Long <[email protected]> Tested-by: Daniel J Blueman <[email protected]> Acked-by: Pekka Enberg <[email protected]> Cc: Robin Holt <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Waiman Long <[email protected]> Cc: Scott Norton <[email protected]> Cc: "Luck, Tony" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2015-06-30sock_diag: don't broadcast kernel socketsCraig Gallek1-1/+1
Kernel sockets do not hold a reference for the network namespace to which they point. Socket destruction broadcasting relies on the network namespace and will cause the splat below when a kernel socket is destroyed. This fix simply ignores kernel sockets when they are destroyed. Reported as: general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC CPU: 1 PID: 9130 Comm: kworker/1:1 Not tainted 4.1.0-gelk-debug+ #1 Workqueue: sock_diag_events sock_diag_broadcast_destroy_work Stack: ffff8800b9c586c0 ffff8800b9c586c0 ffff8800ac4692c0 ffff8800936d4a90 ffff8800352efd38 ffffffff8469a93e ffff8800352efd98 ffffffffc09b9b90 ffff8800352efd78 ffff8800ac4692c0 ffff8800b9c586c0 ffff8800831b6ab8 Call Trace: [<ffffffff8469a93e>] ? mutex_unlock+0xe/0x10 [<ffffffffc09b9b90>] ? inet_diag_handler_get_info+0x110/0x1fb [inet_diag] [<ffffffff845c868d>] netlink_broadcast+0x1d/0x20 [<ffffffff8469a93e>] ? mutex_unlock+0xe/0x10 [<ffffffff845b2bf5>] sock_diag_broadcast_destroy_work+0xd5/0x160 [<ffffffff8408ea97>] process_one_work+0x147/0x420 [<ffffffff8408f0f9>] worker_thread+0x69/0x470 [<ffffffff8409fda3>] ? preempt_count_sub+0xa3/0xf0 [<ffffffff8408f090>] ? rescuer_thread+0x320/0x320 [<ffffffff84093cd7>] kthread+0x107/0x120 [<ffffffff84093bd0>] ? kthread_create_on_node+0x1b0/0x1b0 [<ffffffff8469d31f>] ret_from_fork+0x3f/0x70 [<ffffffff84093bd0>] ? kthread_create_on_node+0x1b0/0x1b0 Tested: Using a debug kernel while 'ss -E' is running: ip netns add test-ns ip netns delete test-ns Fixes: eb4cb008529c sock_diag: define destruction multicast groups Fixes: 26abe14379f8 net: Modify sk_alloc to not reference count the netns of kernel sockets. Reported-by: Dave Jones <[email protected]> Suggested-by: Eric Dumazet <[email protected]> Signed-off-by: Craig Gallek <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2015-06-30Merge branch 'mvneta-jumbo-frames'David S. Miller7-7/+46
Simon Guinot says: ==================== Fix Ethernet jumbo frames support for Armada 370 and 38x This patch series fixes the Ethernet jumbo frames support for the SoCs Armada 370, 380 and 385. Unlike Armada XP, the Ethernet controller for these SoCs doesn't support TCP/IP checksumming with a frame size larger than 1600 bytes. These patches should be applied to the -stable kernels 3.8 and onwards. Changes since v1: - Use a new compatible string for the Ethernet IP found in Armada XP SoCs (instead of using an optional property). - Fix the issue for the Armada 380 and 385 SoCs as well. Changes since v2: - Add Acked-by from Gregory Clement. - Add "Fixes:" tag to each commit. Changes since v3: - Fix patch 3 name: replace prefix "ARM: mvebu:" with "net: mvneta:". ==================== Signed-off-by: David S. Miller <[email protected]>
2015-06-30net: mvneta: disable IP checksum with jumbo frames for Armada 370Simon Guinot1-1/+25
The Ethernet controller found in the Armada 370, 380 and 385 SoCs doesn't support TCP/IP checksumming with frame sizes larger than 1600 bytes. This patch fixes the issue by disabling the features NETIF_F_IP_CSUM and NETIF_F_TSO for the Armada 370 and compatible SoCs when the MTU is set to a value greater than 1600 bytes. Signed-off-by: Simon Guinot <[email protected]> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Cc: <[email protected]> # v3.8+ Acked-by: Thomas Petazzoni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
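The feature masking described above can be sketched in plain C. This is an illustration under stated assumptions, not the driver's code: the `SKETCH_F_*` bit values, the `csum_limited` parameter and `sketch_fix_features()` are hypothetical stand-ins, and the kernel's NETIF_F_* flags have their own definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bits; they do not match the kernel's NETIF_F_* values. */
#define SKETCH_F_IP_CSUM (1u << 0)
#define SKETCH_F_TSO     (1u << 1)
#define SKETCH_F_SG      (1u << 2)

/* Largest MTU for which the limited controllers can still checksum, per the commit message. */
#define SKETCH_CSUM_MTU_LIMIT 1600u

/*
 * Return the feature set that may actually be enabled. csum_limited is an
 * assumed per-device property (true for the Armada 370 and compatible SoCs
 * in this sketch, false for Armada XP).
 */
uint32_t sketch_fix_features(uint32_t features, unsigned int mtu,
			     bool csum_limited)
{
	if (csum_limited && mtu > SKETCH_CSUM_MTU_LIMIT)
		features &= ~(SKETCH_F_IP_CSUM | SKETCH_F_TSO);
	return features;
}
```

TSO is cleared together with IP checksumming, matching the two features the commit message names. Unrelated bits, such as the scatter-gather flag in this sketch, pass through unchanged.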