aboutsummaryrefslogtreecommitdiff
path: root/ipc
AgeCommit message (Collapse)AuthorFilesLines
2017-09-08ipc: optimize semget/shmget/msgget for lots of keysGuillaume Knispel6-50/+124
ipc_findkey() used to scan all objects to look for the wanted key. This is slow when using a high number of keys. This change adds an rhashtable of kern_ipc_perm objects in ipc_ids, so that one lookup cease to be O(n). This change gives a 865% improvement of benchmark reaim.jobs_per_min on a 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory [1] Other (more micro) benchmark results, by the author: On an i5 laptop, the following loop executed right after a reboot took, without and with this change: for (int i = 0, k=0x424242; i < KEYS; ++i) semget(k++, 1, IPC_CREAT | 0600); total total max single max single KEYS without with call without call with 1 3.5 4.9 µs 3.5 4.9 10 7.6 8.6 µs 3.7 4.7 32 16.2 15.9 µs 4.3 5.3 100 72.9 41.8 µs 3.7 4.7 1000 5,630.0 502.0 µs * * 10000 1,340,000.0 7,240.0 µs * * 31900 17,600,000.0 22,200.0 µs * * *: unreliable measure: high variance The duration for a lookup-only usage was obtained by the same loop once the keys are present: total total max single max single KEYS without with call without call with 1 2.1 2.5 µs 2.1 2.5 10 4.5 4.8 µs 2.2 2.3 32 13.0 10.8 µs 2.3 2.8 100 82.9 25.1 µs * 2.3 1000 5,780.0 217.0 µs * * 10000 1,470,000.0 2,520.0 µs * * 31900 17,400,000.0 7,810.0 µs * * Finally, executing each semget() in a new process gave, when still summing only the durations of these syscalls: creation: total total KEYS without with 1 3.7 5.0 µs 10 32.9 36.7 µs 32 125.0 109.0 µs 100 523.0 353.0 µs 1000 20,300.0 3,280.0 µs 10000 2,470,000.0 46,700.0 µs 31900 27,800,000.0 219,000.0 µs lookup-only: total total KEYS without with 1 2.5 2.7 µs 10 25.4 24.4 µs 32 106.0 72.6 µs 100 591.0 352.0 µs 1000 22,400.0 2,250.0 µs 10000 2,510,000.0 25,700.0 µs 31900 28,200,000.0 115,000.0 µs [1] http://lkml.kernel.org/r/20170814060507.GE23258@yexl-desktop Link: http://lkml.kernel.org/r/20170815194954.ck32ta2z35yuzpwp@debix Signed-off-by: Guillaume Knispel <[email protected]> Reviewed-by: Marc Pardo <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Kees Cook <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: "Peter Zijlstra (Intel)" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Andrey Vagin <[email protected]> Cc: Guillaume Knispel <[email protected]> Cc: Marc Pardo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-09-08ipc/sem: play nicer with large nsops allocationsDavidlohr Bueso1-2/+2
Replacing semop()'s kmalloc for kvmalloc was originally proposed by Manfred on the premise that it can be called for large (than order-1) sizes. For example, while Oracle recommends setting SEMOPM to a _minimum_ of 100, some distros[1] encourage the setting to be a factor of the amount of db tasks (PROCESSES), which can get fishy for large systems (easily going beyond 1000). [1] An Example of Semaphore Settings https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Setting_Semaphores-An_Example_of_Semaphore_Settings.html So let's just convert this to kvmalloc, just like the rest of the allocations we do in ipc. While the fallback vmalloc obviously involves more overhead, this by far the uncommon path, and it's better for the user than just erroring out with kmalloc. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-09-08ipc/sem: drop sem_checkid helperDavidlohr Bueso1-2/+0
... 'tis not used. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-09-08ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_tElena Reshetova1-3/+3
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David Windsor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-09-08ipc: convert sem_undo_list.refcnt from atomic_t to refcount_tElena Reshetova1-4/+4
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David Windsor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-09-08ipc: convert ipc_namespace.count from atomic_t to refcount_tElena Reshetova2-3/+3
refcount_t type and corresponding API should be used instead of atomic_t when the variable is used as a reference counter. This allows to avoid accidental refcounter overflows that might lead to use-after-free situations. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Hans Liljestrand <[email protected]> Signed-off-by: Kees Cook <[email protected]> Signed-off-by: David Windsor <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-21Merge branch 'for-mingo' of ↵Ingo Molnar1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Removal of spin_unlock_wait() - SRCU updates - Torture-test updates - Documentation updates - Miscellaneous fixes - CPU-hotplug fixes - Miscellaneous non-RCU fixes Signed-off-by: Ingo Molnar <[email protected]>
2017-08-17ipc: Replace spin_unlock_wait() with lock/unlock pairPaul E. McKenney1-1/+2
There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in exit_sem() with spin_lock() followed immediately by spin_unlock(). This should be safe from a performance perspective because exit_sem() is rarely invoked in production. Signed-off-by: Paul E. McKenney <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Will Deacon <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Alan Stern <[email protected]> Cc: Andrea Parri <[email protected]> Cc: Linus Torvalds <[email protected]> Acked-by: Manfred Spraul <[email protected]>
2017-08-02ipc: add missing container_of()s for randstructKees Cook3-3/+7
When building with the randstruct gcc plugin, the layout of the IPC structs will be randomized, which requires any sub-structure accesses to use container_of(). The proc display handlers were missing the needed container_of()s since the iterator is passing in the top-level struct kern_ipc_perm. This would lead to crashes when running the "lsipc" program after the system had IPC registered (e.g. after starting up Gnome): general protection fault: 0000 [#1] PREEMPT SMP ... RIP: 0010:shm_add_rss_swap.isra.1+0x13/0xa0 ... Call Trace: sysvipc_shm_proc_show+0x5e/0x150 sysvipc_proc_show+0x1a/0x30 seq_read+0x2e9/0x3f0 ... Link: http://lkml.kernel.org/r/20170730205950.GA55841@beast Fixes: 3859a271a003 ("randstruct: Mark various structs for randomization") Signed-off-by: Kees Cook <[email protected]> Reported-by: Dominik Brodowski <[email protected]> Acked-by: Davidlohr Bueso <[email protected]> Acked-by: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/util.h: update documentation for ipc_getref() and ipc_putref()Manfred Spraul1-0/+3
Now that ipc_rcu_alloc() and ipc_rcu_free() are removed, document when it is valid to use ipc_getref() and ipc_putref(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/sem: drop __sem_free()Kees Cook1-7/+2
The remaining users of __sem_free() can simply call kvfree() instead for better readability. [[email protected]: Rediff to keep rcu protection for security_sem_alloc()] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/msg: remove special msg_alloc/freeKees Cook1-20/+4
There is nothing special about the msg_alloc/free routines any more, so remove them to make code more readable. [[email protected]: Rediff to keep rcu protection for security_msg_queue_alloc()] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/shm: remove special shm_alloc/freeKees Cook1-20/+4
There is nothing special about the shm_alloc/free routines any more, so remove them to make code more readable. [[email protected]: Rediff, to continue to keep rcu for free calls after a successful security_shm_alloc()] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc: move atomic_set() to where it is neededKees Cook4-5/+1
Only after ipc_addid() has succeeded will refcounting be used, so move initialization into ipc_addid() and remove from open-coded *_alloc() routines. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/msg.c: avoid ipc_rcu_putref for failed ipc_addid()Manfred Spraul1-5/+5
Loosely based on a patch from Kees Cook <[email protected]>: - id and retval can be merged - if ipc_addid() fails, then use call_rcu() directly. The difference is that call_rcu is used for failed ipc_addid() calls, to continue to guaranteed an rcu delay for security_msg_queue_free(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Cc: Kees Cook <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/shm.c: avoid ipc_rcu_putref for failed ipc_addid()Manfred Spraul1-6/+3
Loosely based on a patch from Kees Cook <[email protected]>: - id and error can be merged - if operations before ipc_addid() fail, then use call_rcu() directly. The difference is that call_rcu is used for failures after security_shm_alloc(), to continue to guaranteed an rcu delay for security_sem_free(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Cc: Kees Cook <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/sem.c: avoid ipc_rcu_putref for failed ipc_addid()Manfred Spraul1-5/+4
Loosely based on a patch from Kees Cook <[email protected]>: - id and retval can be merged - if ipc_addid() fails, then use call_rcu() directly. The difference is that call_rcu is used for failed ipc_addid() calls, to continue to guaranteed an rcu delay for security_sem_free(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Cc: Kees Cook <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/util: drop ipc_rcu_alloc()Kees Cook2-24/+0
No callers remain for ipc_rcu_alloc(). Drop the function. [[email protected]: Rediff because the memset was temporarily inside ipc_rcu_free()] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/msg: avoid ipc_rcu_alloc()Kees Cook1-4/+14
Instead of using ipc_rcu_alloc() which only performs the refcount bump, open code it. This also allows for msg_queue structure layout to be randomized in the future. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/shm: avoid ipc_rcu_alloc()Kees Cook1-4/+14
Instead of using ipc_rcu_alloc() which only performs the refcount bump, open code it. This also allows for shmid_kernel structure layout to be randomized in the future. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/sem: avoid ipc_rcu_alloc()Kees Cook1-5/+20
Instead of using ipc_rcu_alloc() which only performs the refcount bump, open code it to perform better sem-specific checks. This also allows for sem_array structure layout to be randomized in the future. [[email protected]: Rediff, because the memset was temporarily inside ipc_rcu_alloc()] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/util: drop ipc_rcu_free()Kees Cook2-8/+0
There are no more callers of ipc_rcu_free(), so remove it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/msg: do not use ipc_rcu_free()Kees Cook1-2/+7
Avoid using ipc_rcu_free, since it just re-finds the original structure pointer. For the pre-list-init failure path, there is no RCU needed, since it was just allocated. It can be directly freed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/shm: do not use ipc_rcu_free()Kees Cook1-2/+7
Avoid using ipc_rcu_free, since it just re-finds the original structure pointer. For the pre-list-init failure path, there is no RCU needed, since it was just allocated. It can be directly freed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/sem: do not use ipc_rcu_free()Kees Cook1-2/+7
Avoid using ipc_rcu_free, since it just re-finds the original structure pointer. For the pre-list-init failure path, there is no RCU needed, since it was just allocated. It can be directly freed. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc: drop non-RCU allocationKees Cook3-33/+6
The only users of ipc_alloc() were ipc_rcu_alloc() and the on-heap sem_io fall-back memory. Better to just open-code these to make things easier to read. [[email protected]: Rediff due to inclusion of memset() into ipc_rcu_alloc()] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc: merge ipc_rcu and kern_ipc_permManfred Spraul5-61/+63
ipc has two management structures that exist for every id: - struct kern_ipc_perm, it contains e.g. the permissions. - struct ipc_rcu, it contains the rcu head for rcu handling and the refcount. The patch merges both structures. As a bonus, we may save one cacheline, because both structures are cacheline aligned. In addition, it reduces the number of casts, instead most codepaths can use container_of. To simplify code, the ipc_rcu_alloc initializes the allocation to 0. [[email protected]: really include the memset() into ipc_alloc_rcu()] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Kees Cook <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-12ipc/sem.c: remove sem_base, embed struct semManfred Spraul1-54/+34
sma->sem_base is initialized with sma->sem_base = (struct sem *) &sma[1]; The current code has four problems: - There is an unnecessary pointer dereference - sem_base is not needed. - Alignment for struct sem only works by chance. - The current code causes false positive for static code analysis. - This is a cast between different non-void types, which the future randstruct GCC plugin warns on. And, as bonus, the code size gets smaller: Before: 0 .text 00003770 After: 0 .text 0000374e [[email protected]: s/[0]/[]/, per hch] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Kees Cook <[email protected]> Cc: <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Fabian Frederick <[email protected]> Cc: Christoph Hellwig <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-07-09mqueue: fix a use-after-free in sys_mq_notify()Cong Wang1-1/+3
The retry logic for netlink_attachskb() inside sys_mq_notify() is nasty and vulnerable: 1) The sock refcnt is already released when retry is needed 2) The fd is controllable by user-space because we already release the file refcnt so we when retry but the fd has been just closed by user-space during this small window, we end up calling netlink_detachskb() on the error path which releases the sock again, later when the user-space closes this socket a use-after-free could be triggered. Setting 'sock' to NULL here should be sufficient to fix it. Reported-by: GeneBlue <[email protected]> Signed-off-by: Cong Wang <[email protected]> Cc: Andrew Morton <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
2017-07-07Merge tag 'for-linus-v4.13-1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux Pull Writeback error handling fixes from Jeff Layton: "The main rationale for all of these changes is to tighten up writeback error reporting to userland. There are many ways now that writeback errors can be lost, such that fsync/fdatasync/msync return 0 when writeback actually failed. This pile contains a small set of cleanups and writeback error handling fixes that I was able to break off from the main pile (#2). Two of the patches in this pile are trivial. The exceptions are the patch to fix up error handling in write_one_page, and the patch to make JFS pay attention to write_one_page errors" * tag 'for-linus-v4.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux: fs: remove call_fsync helper function mm: clean up error handling in write_one_page JFS: do not ignore return code from write_one_page() mm: drop "wait" parameter from write_one_page()
2017-07-05fs: remove call_fsync helper functionJeff Layton1-1/+1
Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Reviewed-by: Carlos Maiolino <[email protected]> Signed-off-by: Jeff Layton <[email protected]>
2017-07-04mqueue: move compat syscalls to native onesAl Viro3-228/+262
... and stop messing with compat_alloc_user_space() and friends [braino fix from Colin King folded in] Signed-off-by: Al Viro <[email protected]>
2017-05-08mm: introduce kv[mz]alloc helpersMichal Hocko1-6/+1
Patch series "kvmalloc", v5. There are many open coded kmalloc with vmalloc fallback instances in the tree. Most of them are not careful enough or simply do not care about the underlying semantic of the kmalloc/page allocator which means that a) some vmalloc fallbacks are basically unreachable because the kmalloc part will keep retrying until it succeeds b) the page allocator can invoke a really disruptive steps like the OOM killer to move forward which doesn't sound appropriate when we consider that the vmalloc fallback is available. As it can be seen implementing kvmalloc requires quite an intimate knowledge if the page allocator and the memory reclaim internals which strongly suggests that a helper should be implemented in the memory subsystem proper. Most callers, I could find, have been converted to use the helper instead. This is patch 6. There are some more relying on __GFP_REPEAT in the networking stack which I have converted as well and Eric Dumazet was not opposed [2] to convert them as well. [1] http://lkml.kernel.org/r/[email protected] [2] http://lkml.kernel.org/r/1485273626.16328.301.camel@edumazet-glaptop3.roam.corp.google.com This patch (of 9): Using kmalloc with the vmalloc fallback for larger allocations is a common pattern in the kernel code. Yet we do not have any common helper for that and so users have invented their own helpers. Some of them are really creative when doing so. Let's just add kv[mz]alloc and make sure it is implemented properly. This implementation makes sure to not make a large memory pressure for > PAGE_SZE requests (__GFP_NORETRY) and also to not warn about allocation failures. This also rules out the OOM killer as the vmalloc is a more approapriate fallback than a disruptive user visible action. This patch also changes some existing users and removes helpers which are specific for them. In some cases this is not possible (e.g. ext4_kvmalloc, libcfs_kvzalloc) because those seems to be broken and require GFP_NO{FS,IO} context which is not vmalloc compatible in general (note that the page table allocation is GFP_KERNEL). Those need to be fixed separately. While we are at it, document that __vmalloc{_node} about unsupported gfp mask because there seems to be a lot of confusion out there. kvmalloc_node will warn about GFP_KERNEL incompatible (which are not superset) flags to catch new abusers. Existing ones would have to die slowly. [[email protected]: f2fs fixup] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Reviewed-by: Andreas Dilger <[email protected]> [ext4 part] Acked-by: Vlastimil Babka <[email protected]> Cc: John Hubbard <[email protected]> Cc: David Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-08ipc/shm: some shmat cleanupsDavidlohr Bueso1-9/+7
Clean up early flag and address some minutia. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-05-05Merge branch 'for-linus' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "This is a set of small fixes that were mostly stumbled over during more significant development. This proc fix and the fix to posix-timers are the most significant of the lot. There is a lot of good development going on but unfortunately it didn't quite make the merge window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc: Fix unbalanced hard link numbers signal: Make kill_proc_info static rlimit: Properly call security_task_setrlimit signal: Remove unused definition of sig_user_definied ia64: Remove unused IA64_TASK_SIGHAND_OFFSET and IA64_SIGHAND_SIGLOCK_OFFSET ipc: Remove unused declaration of recompute_msgmni posix-timers: Correct sanity check in posix_cpu_nsleep sysctl: Remove dead register_sysctl_root
2017-04-17ipc: Remove unused declaration of recompute_msgmniEric W. Biederman1-2/+0
The function recompute_msgmni was removed a while ago but it is still declared in a header file remove it. Signed-off-by: "Eric W. Biederman" <[email protected]>
2017-04-02kernel-api.rst: fix a series of errors when parsing C files[email protected]1-5/+7
./lib/string.c:134: WARNING: Inline emphasis start-string without end-string. ./mm/filemap.c:522: WARNING: Inline interpreted text or phrase reference start-string without end-string. ./mm/filemap.c:1283: ERROR: Unexpected indentation. ./mm/filemap.c:3003: WARNING: Inline interpreted text or phrase reference start-string without end-string. ./mm/vmalloc.c:1544: WARNING: Inline emphasis start-string without end-string. ./mm/page_alloc.c:4245: ERROR: Unexpected indentation. ./ipc/util.c:676: ERROR: Unexpected indentation. ./drivers/pci/irq.c:35: WARNING: Block quote ends without a blank line; unexpected unindent. ./security/security.c:109: ERROR: Unexpected indentation. ./security/security.c:110: WARNING: Definition list ends without a blank line; unexpected unindent. ./block/genhd.c:275: WARNING: Inline strong start-string without end-string. ./block/genhd.c:283: WARNING: Inline strong start-string without end-string. ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string. ./include/linux/clk.h:134: WARNING: Inline emphasis start-string without end-string. ./ipc/util.c:477: ERROR: Unknown target name: "s". Signed-off-by: Mauro Carvalho Chehab <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Signed-off-by: Jonathan Corbet <[email protected]>
2017-03-03Merge branch 'WIP.sched-core-for-linus' of ↵Linus Torvalds4-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull sched.h split-up from Ingo Molnar: "The point of these changes is to significantly reduce the <linux/sched.h> header footprint, to speed up the kernel build and to have a cleaner header structure. After these changes the new <linux/sched.h>'s typical preprocessed size goes down from a previous ~0.68 MB (~22K lines) to ~0.45 MB (~15K lines), which is around 40% faster to build on typical configs. Not much changed from the last version (-v2) posted three weeks ago: I eliminated quirks, backmerged fixes plus I rebased it to an upstream SHA1 from yesterday that includes most changes queued up in -next plus all sched.h changes that were pending from Andrew. I've re-tested the series both on x86 and on cross-arch defconfigs, and did a bisectability test at a number of random points. I tried to test as many build configurations as possible, but some build breakage is probably still left - but it should be mostly limited to architectures that have no cross-compiler binaries available on kernel.org, and non-default configurations" * 'WIP.sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (146 commits) sched/headers: Clean up <linux/sched.h> sched/headers: Remove #ifdefs from <linux/sched.h> sched/headers: Remove the <linux/topology.h> include from <linux/sched.h> sched/headers, hrtimer: Remove the <linux/wait.h> include from <linux/hrtimer.h> sched/headers, x86/apic: Remove the <linux/pm.h> header inclusion from <asm/apic.h> sched/headers, timers: Remove the <linux/sysctl.h> include from <linux/timer.h> sched/headers: Remove <linux/magic.h> from <linux/sched/task_stack.h> sched/headers: Remove <linux/sched.h> from <linux/sched/init.h> sched/core: Remove unused prefetch_stack() sched/headers: Remove <linux/rculist.h> from <linux/sched.h> sched/headers: Remove the 'init_pid_ns' prototype from <linux/sched.h> sched/headers: Remove <linux/signal.h> from <linux/sched.h> sched/headers: Remove <linux/rwsem.h> from <linux/sched.h> sched/headers: Remove the runqueue_is_locked() prototype sched/headers: Remove <linux/sched.h> from <linux/sched/hotplug.h> sched/headers: Remove <linux/sched.h> from <linux/sched/debug.h> sched/headers: Remove <linux/sched.h> from <linux/sched/nohz.h> sched/headers: Remove <linux/sched.h> from <linux/sched/stat.h> sched/headers: Remove the <linux/gfp.h> include from <linux/sched.h> sched/headers: Remove <linux/rtmutex.h> from <linux/sched.h> ...
2017-03-02Merge branch 'for-linus' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile two from Al Viro: - orangefs fix - series of fs/namei.c cleanups from me - VFS stuff coming from overlayfs tree * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: orangefs: Use RCU for destroy_inode vfs: use helper for calling f_op->fsync() mm: use helper for calling f_op->mmap() vfs: use helpers for calling f_op->{read,write}_iter() vfs: pass type instead of fn to do_{loop,iter}_readv_writev() vfs: extract common parts of {compat_,}do_readv_writev() vfs: wrap write f_ops with file_{start,end}_write() vfs: deny copy_file_range() for non regular files vfs: deny fallocate() on directory vfs: create vfs helper vfs_tmpfile() namei.c: split unlazy_walk() namei.c: fold the check for DCACHE_OP_REVALIDATE into d_revalidate() lookup_fast(): clean up the logics around the fallback to non-rcu mode namei: fold unlazy_link() into its sole caller
2017-03-02Merge remote-tracking branch 'ovl/for-viro' into for-linusAl Viro1-2/+2
Overlayfs-related series from Miklos and Amir
2017-03-02sched/headers: Move the wake-queue types and interfaces from sched.h into ↵Ingo Molnar1-1/+0
<linux/sched/wake_q.h> Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare to move the task_lock()/unlock() APIs to ↵Ingo Molnar1-0/+1
<linux/sched/task.h> But first update the code that uses these facilities with the new header. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare to remove <linux/cred.h> inclusion from <linux/sched.h>Ingo Molnar1-0/+1
Add #include <linux/cred.h> dependencies to all .c files rely on sched.h doing that for them. Note that even if the count where we need to add extra headers seems high, it's still a net win, because <linux/sched.h> is included in over 2,200 files ... Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/user.h> We are going to split <linux/sched/user.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/user.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar1-0/+1
<linux/sched/signal.h> We are going to split <linux/sched/signal.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/signal.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-03-02sched/headers: Prepare for new header dependencies before moving code to ↵Ingo Molnar3-0/+3
<linux/sched/wake_q.h> We are going to split <linux/sched/wake_q.h> out of <linux/sched.h>, which will have to be picked up from other headers and a couple of .c files. Create a trivial placeholder <linux/sched/wake_q.h> file that just maps to <linux/sched.h> to make this patch obviously correct and bisectable. Include the new header in the files that are going to need it. Acked-by: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2017-02-27ipc/shm: Fix shmat mmap nil-page protectionDavidlohr Bueso1-4/+9
The issue is described here, with a nice testcase: https://bugzilla.kernel.org/show_bug.cgi?id=192931 The problem is that shmat() calls do_mmap_pgoff() with MAP_FIXED, and the address rounded down to 0. For the regular mmap case, the protection mentioned above is that the kernel gets to generate the address -- arch_get_unmapped_area() will always check for MAP_FIXED and return that address. So by the time we do security_mmap_addr(0) things get funky for shmat(). The testcase itself shows that while a regular user crashes, root will not have a problem attaching a nil-page. There are two possible fixes to this. The first, and which this patch does, is to simply allow root to crash as well -- this is also regular mmap behavior, ie when hacking up the testcase and adding mmap(... |MAP_FIXED). While this approach is the safer option, the second alternative is to ignore SHM_RND if the rounded address is 0, thus only having MAP_SHARED flags. This makes the behavior of shmat() identical to the mmap() case. The downside of this is obviously user visible, but does make sense in that it maintains semantics after the round-down wrt 0 address and mmap. Passes shm related ltp tests. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Reported-by: Gareth Evans <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-27ipc/mqueue: add missing sparse annotationLuc Van Oostenryck1-0/+1
Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Luc Van Oostenryck <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-27ipc/sem: add hysteresisManfred Spraul1-25/+61
sysv sem has two lock modes: One with per-semaphore locks, one lock mode with a single global lock for the whole array. When switching from the per-semaphore locks to the global lock, all per-semaphore locks must be scanned for ongoing operations. The patch adds a hysteresis for switching from the global lock to the per semaphore locks. This reduces how often the per-semaphore locks must be scanned. Compared to the initial patch, this is a simplified solution: Setting USE_GLOBAL_LOCK_HYSTERESIS to 1 restores the current behavior. In theory, a workload with exactly 10 simple sops and then one complex op now scales a bit worse, but this is pure theory: If there is concurrency, the it won't be exactly 10:1:10:1:10:1:... If there is no concurrency, then there is no need for scalability. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: <[email protected]> Cc: kernel test robot <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-27ipc/sem.c: avoid using spin_unlock_wait()Manfred Spraul1-22/+3
a) The ACQUIRE in spin_lock() applies to the read, not to the store, at least for powerpc. This forces to add a smp_mb() into the fast path. b) The memory barrier provided by spin_unlock_wait() is right now arch dependent. Therefore: Use spin_lock()/spin_unlock() instead of spin_unlock_wait(). Advantage: faster single op semop calls(), observed +8.9% on x86. (the other solution would be arch dependencies in ipc/sem). Disadvantage: slower complex op semop calls, if (and only if) there are no sleeping operations. The next patch adds hysteresis, this further reduces the probability that the slow path is used. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Manfred Spraul <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: H. Peter Anvin <[email protected]> Cc: <[email protected]> Cc: kernel test robot <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>