Age | Commit message | Author | Files | Lines
2016-01-20 | mm: memcontrol: account "kmem" consumers in cgroup2 memory controller | Johannes Weiner | 1 | -7/+11
The original cgroup memory controller has an extension to account slab memory (and other "kernel memory" consumers) in a separate "kmem" counter, once the user set an explicit limit on that "kmem" pool.

However, this includes various consumers whose sizes are directly linked to userspace activity. Accounting them as an optional "kmem" extension is problematic for several reasons:

1. It leaves the main memory interface with incomplete semantics. A user who puts their workload into a cgroup and configures a memory limit does not expect us to leave holes in the containment as big as the dentry and inode cache, or the kernel stack pages.
2. If the limit set on this random historical subgroup of consumers is reached, subsequent allocations will fail even when the main memory pool available to the cgroup is not yet exhausted and/or has reclaimable memory in it.
3. Calling it 'kernel memory' is misleading. The dentry and inode caches are no more 'kernel' (or no less 'user') memory than the page cache itself. Treating these consumers as different classes is a historical implementation detail that should not leak to users.

So, in addition to page cache, anonymous memory, and network socket memory, account the following memory consumers per default in the cgroup2 memory controller:

- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems.

This should give us reasonable memory isolation for most common workloads out of the box.

Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Tejun Heo <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | mm: memcontrol: move kmem accounting code to CONFIG_MEMCG | Johannes Weiner | 11 | -57/+72
The cgroup2 memory controller will account important in-kernel memory consumers per default. Move all necessary components to CONFIG_MEMCG. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Arnd Bergmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | mm: memcontrol: separate kmem code from legacy tcp accounting code | Johannes Weiner | 1 | -21/+12
The cgroup2 memory controller will include important in-kernel memory consumers per default, including socket memory, but it will no longer carry the historic tcp control interface. Separate the kmem state init from the tcp control interface init in preparation for that. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | mm: memcontrol: group kmem init and exit functions together | Johannes Weiner | 1 | -81/+76
Put all the related code to setup and teardown the kmem accounting state into the same location. No functional change intended. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | mm: memcontrol: give the kmem states more descriptive names | Johannes Weiner | 4 | -35/+38
On any given memcg, the kmem accounting feature has three separate states: not initialized, structures allocated, and actively accounting slab memory. These are represented through a combination of the kmem_acct_activated and kmem_acct_active flags, which is confusing. Convert to a kmem_state enum with the states NONE, ALLOCATED, and ONLINE. Then rename the functions to modify the state accordingly. This follows the nomenclature of css object states more closely. Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
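The three-state model described in this commit can be sketched in user space as follows. This is a minimal illustration, not the kernel code: only the state names NONE, ALLOCATED, and ONLINE come from the commit message, while the `KMEM_` prefix, `struct memcg_model`, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: only the state names come from the commit message. */
enum kmem_state {
	KMEM_NONE,      /* not initialized */
	KMEM_ALLOCATED, /* structures allocated, not actively accounting */
	KMEM_ONLINE,    /* actively accounting slab memory */
};

/* Stand-in for the per-memcg bookkeeping; not the real struct mem_cgroup. */
struct memcg_model {
	enum kmem_state kmem_state;
};

/* A zero-initialized model must start out in the NONE state. */
static_assert(KMEM_NONE == 0, "zeroed memcg_model must be KMEM_NONE");

/* One comparison replaces the old pair of boolean flags. */
static bool memcg_kmem_online(const struct memcg_model *memcg)
{
	return memcg->kmem_state == KMEM_ONLINE;
}

/* NONE -> ALLOCATED -> ONLINE: set up structures, then start accounting. */
static void memcg_model_online(struct memcg_model *memcg)
{
	if (memcg->kmem_state == KMEM_NONE)
		memcg->kmem_state = KMEM_ALLOCATED;
	memcg->kmem_state = KMEM_ONLINE;
}

/* ONLINE -> ALLOCATED: stop accounting but keep the structures around. */
static void memcg_model_offline(struct memcg_model *memcg)
{
	if (memcg->kmem_state == KMEM_ONLINE)
		memcg->kmem_state = KMEM_ALLOCATED;
}
```

With a single enum, an inconsistent combination such as "active but never activated" cannot be represented, which is the readability gain the commit message points at.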
2016-01-20 | mm: memcontrol: remove double kmem page_counter init | Johannes Weiner | 1 | -14/+10
The kmem page_counter's limit is initialized to PAGE_COUNTER_MAX inside mem_cgroup_css_online(). There is no need to repeat this from memcg_propagate_kmem(). Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tejun Heo <[email protected]> Acked-by: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | mm: memcontrol: drop unused @css argument in memcg_init_kmem | Johannes Weiner | 3 | -5/+6
This series adds accounting of the historical "kmem" memory consumers to the cgroup2 memory controller.

These consumers include the dentry cache, the inode cache, kernel stack pages, and a few others that are pointed out in patch 7/8. The footprint of these consumers is directly tied to userspace activity in common workloads, and so they have to be part of the minimally viable configuration in order to present a complete feature to our users.

The cgroup2 interface of the memory controller is far from complete, but this series, along with the socket memory accounting series, provides the final semantic changes for the existing memory knobs in the cgroup2 interface, which is scheduled for initial release in the next merge window.

This patch (of 8):

Remove the unused css argument from memcg_init_kmem().

Signed-off-by: Johannes Weiner <[email protected]>
Acked-by: Michal Hocko <[email protected]>
Cc: Tejun Heo <[email protected]>
Acked-by: Vladimir Davydov <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | memstick: use sector_div instead of do_div | Arnd Bergmann | 1 | -1/+1
do_div is the wrong way to divide a sector_t, as it is less efficient when sector_t is 32-bit wide. With the upcoming do_div optimizations, the kernel starts warning about this:

drivers/memstick/core/ms_block.c: In function 'msb_io_work':
include/asm-generic/div64.h:207:28: warning: comparison of distinct pointer types lacks a cast

This changes the code to use sector_div instead, which always produces optimal code.

Signed-off-by: Arnd Bergmann <[email protected]>
Cc: Maxim Levitsky <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
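As a rough user-space illustration of the contract involved: a sector_div-style helper divides the sector count in place and hands back the remainder, and for a 32-bit sector type a plain 32-bit division is all that is needed. The names `sector_model_t` and `sector_div_model` are hypothetical, and the kernel's sector_div() is a macro taking an lvalue rather than a pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: models a configuration where sector_t is 32 bits wide. */
typedef uint32_t sector_model_t;

static_assert(sizeof(sector_model_t) == 4, "model assumes a 32-bit sector type");

/*
 * Divide *n by base in place and return the remainder.
 * With a 32-bit sector type this is an ordinary 32-bit division,
 * so no 64-bit division helper is involved.
 */
static inline uint32_t sector_div_model(sector_model_t *n, uint32_t base)
{
	uint32_t rem = *n % base;

	*n /= base;
	return rem;
}
```

For example, dividing sector 1000003 by 512 leaves 1953 in the variable and returns a remainder of 67.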
2016-01-20 | dma-mapping: use offset_in_page macro | Geliang Tang | 1 | -2/+2
Use offset_in_page macro instead of (addr & ~PAGE_MASK). Signed-off-by: Geliang Tang <[email protected]> Acked-by: Will Deacon <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
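The equivalence behind this cleanup can be checked with a small user-space sketch. The `_MODEL` names and the 4 KiB page size are assumptions for illustration; the kernel's own PAGE_MASK and offset_in_page definitions are not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on many common configurations. */
#define PAGE_SHIFT_MODEL 12
#define PAGE_SIZE_MODEL  ((uintptr_t)1 << PAGE_SHIFT_MODEL)
#define PAGE_MASK_MODEL  (~(PAGE_SIZE_MODEL - 1))

static_assert((PAGE_SIZE_MODEL & (PAGE_SIZE_MODEL - 1)) == 0,
	      "page size must be a power of two for the mask trick to work");

/* Named helper that hides the open-coded (addr & ~PAGE_MASK) expression. */
static inline uintptr_t offset_in_page_model(uintptr_t addr)
{
	return addr & ~PAGE_MASK_MODEL;
}
```

The named form documents intent at the call site, while compiling to the same mask operation as the open-coded expression.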
2016-01-20 | dma-mapping: remove <asm-generic/dma-coherent.h> | Christoph Hellwig | 4 | -42/+29
This wasn't an asm-generic header to start with, and can be merged into dma-mapping.h trivially. Signed-off-by: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Aurelien Jacquiot <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Helge Deller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Koichi Yasutake <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Mikael Starvik <[email protected]> Cc: Steven Miao <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | dma-mapping: always provide the dma_map_ops based implementation | Christoph Hellwig | 70 | -633/+369
Move the generic implementation to <linux/dma-mapping.h> now that all architectures support it and remove the HAVE_DMA_ATTR Kconfig symbol now that everyone supports them. [[email protected]: remove leftovers in Kconfig] Signed-off-by: Christoph Hellwig <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Aurelien Jacquiot <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Helge Deller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Koichi Yasutake <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Mark Salter <[email protected]> Cc: Mikael Starvik <[email protected]> Cc: Steven Miao <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Valentin Rothberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | tile: uninline dma_set_mask | Christoph Hellwig | 2 | -28/+30
We'll soon merge <asm-generic/dma-mapping-common.h> into <linux/dma-mapping.h> and the reference to dma_capable in the tile dma_set_mask would create a circular dependency. Fix this by moving the implementation out of line. Signed-off-by: Christoph Hellwig <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | sparc: use generic dma_set_mask | Christoph Hellwig | 1 | -15/+0
Sparc already uses the same code as the generic code for the PCI implementation, but just fails the call for sbus. This moves to the generic implementation, which eventually returns -EIO due to the NULL dma_mask pointer in the device.

Signed-off-by: Christoph Hellwig <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Christian Borntraeger <[email protected]>
Cc: Joerg Roedel <[email protected]>
Cc: Sebastian Ott <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | metag: convert to dma_map_ops | Christoph Hellwig | 3 | -209/+117
Signed-off-by: Christoph Hellwig <[email protected]> Cc: James Hogan <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | m68k: convert to dma_map_ops | Christoph Hellwig | 3 | -142/+32
Signed-off-by: Christoph Hellwig <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | mn10300: convert to dma_map_ops | Christoph Hellwig | 3 | -163/+67
Signed-off-by: Christoph Hellwig <[email protected]> Cc: David Howells <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | parisc: convert to dma_map_ops | Christoph Hellwig | 6 | -270/+124
Signed-off-by: Christoph Hellwig <[email protected]> Tested-by: Helge Deller <[email protected]> Acked-by: Helge Deller <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | frv: convert to dma_map_ops | Christoph Hellwig | 4 | -179/+101
Signed-off-by: Christoph Hellwig <[email protected]> Cc: David Howells <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | nios2: convert to dma_map_ops | Christoph Hellwig | 3 | -186/+87
Signed-off-by: Christoph Hellwig <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | cris: convert to dma_map_ops | Christoph Hellwig | 3 | -165/+51
Signed-off-by: Christoph Hellwig <[email protected]> Cc: Mikael Starvik <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | c6x: convert to dma_map_ops | Christoph Hellwig | 4 | -147/+58
[[email protected]: C6X: fix build breakage] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Mark Salter <[email protected]> Cc: Aurelien Jacquiot <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | blackfin: convert to dma_map_ops | Christoph Hellwig | 3 | -137/+43
Signed-off-by: Christoph Hellwig <[email protected]> Cc: Steven Miao <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | avr32: convert to dma_map_ops | Christoph Hellwig | 3 | -373/+85
Signed-off-by: Christoph Hellwig <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | arc: convert to dma_map_ops | Christoph Hellwig | 3 | -230/+110
[[email protected]: ARC: dma mapping fixes #2] Signed-off-by: Christoph Hellwig <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Vineet Gupta <[email protected]> Cc: Carlos Palminha <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | dma-mapping: make the generic coherent dma mmap implementation optional | Christoph Hellwig | 2 | -2/+5
This series converts all remaining architectures to use dma_map_ops and the generic implementation of the DMA API. This not only simplifies the code a lot, but also prepares for possible future changes like more generic non-iommu dma_ops implementations or generic per-device dma_map_ops. This patch (of 16): We have a couple architectures that do not want to support this code, so add another Kconfig symbol that disables the code similar to what we do for the nommu case. Signed-off-by: Christoph Hellwig <[email protected]> Cc: Haavard Skinnemoen <[email protected]> Cc: Hans-Christian Egtvedt <[email protected]> Cc: Steven Miao <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: David Howells <[email protected]> Cc: Koichi Yasutake <[email protected]> Cc: Chris Metcalf <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Aurelien Jacquiot <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Helge Deller <[email protected]> Cc: James Hogan <[email protected]> Cc: Jesper Nilsson <[email protected]> Cc: Mark Salter <[email protected]> Cc: Mikael Starvik <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Sebastian Ott <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | iio: core: fix ptr_ret.cocci warnings | Fengguang Wu | 1 | -3/+1
drivers/iio/industrialio-sw-trigger.c:169:1-3: WARNING: PTR_ERR_OR_ZERO can be used

Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR

Generated by: scripts/coccinelle/api/ptr_ret.cocci

Signed-off-by: Fengguang Wu <[email protected]>
Cc: Joel Becker <[email protected]>
Cc: Lars-Peter Clausen <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Hartmut Knaack <[email protected]>
Cc: Octavian Purdila <[email protected]>
Cc: Paul Bolle <[email protected]>
Cc: Adriana Reus <[email protected]>
Cc: Daniel Baluta <[email protected]>
Cc: Cristina Opriceana <[email protected]>
Cc: Peter Meerwald <[email protected]>
Cc: Alexander Viro <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
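The pattern that the coccinelle script flags can be modelled in user space roughly as below. This is a sketch of the error-pointer convention, assuming errors are encoded as small negative values in the top of the address range; every `_model` name is hypothetical and the kernel's own definitions in include/linux/err.h are not reproduced.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumption: error codes occupy the last 4095 values of the address space. */
#define MAX_ERRNO_MODEL 4095

static_assert(sizeof(intptr_t) == sizeof(void *),
	      "model assumes pointers round-trip through intptr_t");

/* Encode a negative errno value as a pointer. */
static inline void *err_ptr_model(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer falls inside the reserved error range. */
static inline int is_err_model(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_MODEL;
}

/* Decode the errno value back out of an error pointer. */
static inline long ptr_err_model(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/*
 * Collapses the two-step "if (is_err(p)) return ptr_err(p); return 0;"
 * sequence into one call.
 */
static inline int ptr_err_or_zero_model(const void *ptr)
{
	return is_err_model(ptr) ? (int)ptr_err_model(ptr) : 0;
}
```

A caller that only needs "0 on success, negative errno on failure" can then return the helper's result directly instead of open-coding the branch.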
2016-01-20 | fs/adfs/adfs.h: tidy up comments | Andrew Morton | 1 | -14/+14
Lots of needless 80-col overflows. Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | fs/overlayfs/super.c needs pagemap.h | Andrew Morton | 1 | -0/+1
i386 allmodconfig:

In file included from fs/overlayfs/super.c:10:0:
fs/overlayfs/super.c: In function 'ovl_fill_super':
include/linux/fs.h:898:36: error: 'PAGE_CACHE_SIZE' undeclared (first use in this function)
 #define MAX_LFS_FILESIZE (((loff_t)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
 ^
fs/overlayfs/super.c:939:19: note: in expansion of macro 'MAX_LFS_FILESIZE'
 sb->s_maxbytes = MAX_LFS_FILESIZE;
 ^
include/linux/fs.h:898:36: note: each undeclared identifier is reported only once for each function it appears in
 #define MAX_LFS_FILESIZE (((loff_t)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1)
 ^
fs/overlayfs/super.c:939:19: note: in expansion of macro 'MAX_LFS_FILESIZE'
 sb->s_maxbytes = MAX_LFS_FILESIZE;
 ^

Cc: Miklos Szeredi <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | ipc/shm.c: is_file_shm_hugepages() can be boolean | Yaowei Bai | 2 | -4/+4
Make is_file_shm_hugepages() return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: Yaowei Bai <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | lz4: fix wrong compress buffer size for 64-bits | Bongkyu Kim | 1 | -2/+2
The current lz4 compress buffer is 16kb on 32-bit systems and 32kb on 64-bit systems, but lz4 needs only 16kb on both. On 64-bit systems, this wastes CPU cycles on an additional memset during every compression.

In the case of lz4hc, the current buffer size is (256kb + 8) on 32-bit systems and (512kb + 16) on 64-bit systems, but lz4hc needs only (256kb + 2 * pointer) on both.

This patch fixes these wrong compress buffer sizes for 64-bit systems.

Signed-off-by: Bongkyu Kim <[email protected]>
Cc: Chanho Min <[email protected]>
Cc: Yann Collet <[email protected]>
Cc: Kyungsik Lee <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
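The sizes quoted in this commit message can be reproduced with a little arithmetic. In the sketch below, the "old" formulas (4096 and 65538 multiplied by the pointer size) are assumptions chosen because they yield exactly the numbers stated above; the patch itself is not shown on this page, and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* On 32-bit, the assumed old formula already equals the required 16kb. */
static_assert(4096 * 4 == 16 * 1024, "old 32-bit lz4 size equals 16kb");

/* Assumed old lz4 sizing: scales with pointer width (16kb / 32kb). */
static size_t lz4_mem_old(size_t ptr_size)
{
	return 4096 * ptr_size;
}

/* Fixed lz4 sizing: 16kb regardless of pointer width. */
static size_t lz4_mem_new(void)
{
	return 16 * 1024;
}

/* Assumed old lz4hc sizing: (256kb + 8) on 32-bit, (512kb + 16) on 64-bit. */
static size_t lz4hc_mem_old(size_t ptr_size)
{
	return 65538 * ptr_size;
}

/* Fixed lz4hc sizing: 256kb plus two pointers. */
static size_t lz4hc_mem_new(size_t ptr_size)
{
	return 256 * 1024 + 2 * ptr_size;
}
```

Under these assumptions the 64-bit lz4 buffer shrinks from 32768 to 16384 bytes, and the 64-bit lz4hc buffer from 524304 to 262160 bytes, which is the memset saving the message refers to.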
2016-01-20 | proc read mm's {arg,env}_{start,end} with mmap semaphore taken. | Mateusz Guzik | 2 | -7/+22
Only functions doing more than one read are modified. Consumers happened to cope with possibly changing data, but it does not seem like a good thing to rely on.

Signed-off-by: Mateusz Guzik <[email protected]>
Acked-by: Cyrill Gorcunov <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Jarod Wilson <[email protected]>
Cc: Jan Stancek <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | prctl: take mmap sem for writing to protect against others | Mateusz Guzik | 1 | -10/+10
An unprivileged user can trigger an oops on a kernel with CONFIG_CHECKPOINT_RESTORE.

proc_pid_cmdline_read takes mmap_sem for reading and obtains args + env start/end values. These get sanity checked as follows:

BUG_ON(arg_start > arg_end);
BUG_ON(env_start > env_end);

These can be changed by prctl_set_mm. It turns out that prctl_set_mm also takes the semaphore for reading, effectively rendering it useless. This results in:

kernel BUG at fs/proc/base.c:240!
invalid opcode: 0000 [#1] SMP
Modules linked in: virtio_net
CPU: 0 PID: 925 Comm: a.out Not tainted 4.4.0-rc8-next-20160105dupa+ #71
Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
task: ffff880077a68000 ti: ffff8800784d0000 task.ti: ffff8800784d0000
RIP: proc_pid_cmdline_read+0x520/0x530
RSP: 0018:ffff8800784d3db8 EFLAGS: 00010206
RAX: ffff880077c5b6b0 RBX: ffff8800784d3f18 RCX: 0000000000000000
RDX: 0000000000000002 RSI: 00007f78e8857000 RDI: 0000000000000246
RBP: ffff8800784d3e40 R08: 0000000000000008 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000050
R13: 00007f78e8857800 R14: ffff88006fcef000 R15: ffff880077c5b600
FS: 00007f78e884a740(0000) GS:ffff88007b200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 00007f78e8361770 CR3: 00000000790a5000 CR4: 00000000000006f0
Call Trace:
 __vfs_read+0x37/0x100
 vfs_read+0x82/0x130
 SyS_read+0x58/0xd0
 entry_SYSCALL_64_fastpath+0x12/0x76
Code: 4c 8b 7d a8 eb e9 48 8b 9d 78 ff ff ff 4c 8b 7d 90 48 8b 03 48 39 45 a8 0f 87 f0 fe ff ff e9 d1 fe ff ff 4c 8b 7d 90 eb c6 0f 0b <0f> 0b 0f 0b 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
RIP proc_pid_cmdline_read+0x520/0x530
---[ end trace 97882617ae9c6818 ]---

It turns out there are instances where the code just reads the aforementioned values without any locking whatsoever, namely environ_read and get_cmdline. Interestingly, these functions look quite resilient against bogus values, but I don't believe this should be relied upon.

The first patch gets rid of the oops bug by grabbing mmap_sem for writing.

The second patch is optional and puts locking around the aforementioned consumers for safety. Consumers of other fields don't seem to benefit from similar treatment and are left untouched.

This patch (of 2):

The code was taking the semaphore for reading, which protects against neither other readers nor concurrent modifications. The problem could cause sanity checks to fail in procfs's cmdline reader, resulting in an OOPS.

Note that some functions perform an unlocked read of various mm fields, but they seem to be fine despite possible modification.

Signed-off-by: Mateusz Guzik <[email protected]>
Acked-by: Cyrill Gorcunov <[email protected]>
Cc: Alexey Dobriyan <[email protected]>
Cc: Jarod Wilson <[email protected]>
Cc: Jan Stancek <[email protected]>
Cc: Al Viro <[email protected]>
Cc: Anshuman Khandual <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | powerpc: enable UBSAN support | Daniel Axtens | 5 | -1/+11
This hooks up UBSAN support for PowerPC. So far it's found some interesting cases where we don't properly sanitise input to shifts, including one in our futex handling. Nothing critical, but interesting and worth fixing. [[email protected]: arch/powerpc/Kconfig: fix typo in select statement] Signed-off-by: Daniel Axtens <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Tested-by: Andrew Donnellan <[email protected]> Acked-by: Michael Ellerman <[email protected]> Signed-off-by: Valentin Rothberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | UBSAN: run-time undefined behavior sanity checker | Andrey Ryabinin | 17 | -1/+693
UBSAN uses compile-time instrumentation to catch undefined behavior (UB). The compiler inserts code that performs certain kinds of checks before operations that could cause UB. If a check fails (i.e. UB is detected), a __ubsan_handle_* function is called to print an error message.

So most of the work is done by the compiler. This patch just implements the ubsan handlers that print errors.

GCC has had this capability since 4.9.x [1] (see the -fsanitize=undefined option and its suboptions). However, GCC 5.x has more checkers implemented [2]. Article [3] has a bit more detail about UBSAN in GCC.

[1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html
[2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
[3] - http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

Issues which UBSAN has found thus far are:

Found bugs:
* out-of-bounds access - 97840cb67ff5 ("netfilter: nfnetlink: fix insufficient validation in nfnetlink_bind")

undefined shifts:
* d48458d4a768 ("jbd2: use a better hash function for the revoke table")
* 10632008b9e1 ("clockevents: Prevent shift out of bounds")
* 'x << -1' shift in ext4 - http://lkml.kernel.org/r/<[email protected]>
* undefined rol32(0) - http://lkml.kernel.org/r/<[email protected]>
* undefined dirty_ratelimit calculation - http://lkml.kernel.org/r/<[email protected]>
* undefined rounddown_pow_of_two(0) - http://lkml.kernel.org/r/<[email protected]>
* [WONTFIX] undefined shift in __bpf_prog_run - http://lkml.kernel.org/r/<CACT4Y+ZxoR3UjLgcNdUm4fECLMx2VdtfrENMtRRCdgHB2n0bJA@mail.gmail.com>
  WONTFIX here because it should be fixed in the bpf program, not in the kernel.

signed overflows:
* 32a8df4e0b33f ("sched: Fix odd values in effective_load() calculations")
* mul overflow in ntp - http://lkml.kernel.org/r/<[email protected]>
* incorrect conversion into rtc_time in rtc_time64_to_tm() - http://lkml.kernel.org/r/<[email protected]>
* unvalidated timespec in io_getevents() - http://lkml.kernel.org/r/<CACT4Y+bBxVYLQ6LtOKrKtnLthqLHcw-BMp3aqP3mjdAvr9FULQ@mail.gmail.com>
* [NOTABUG] signed overflow in ktime_add_safe() - http://lkml.kernel.org/r/<CACT4Y+aJ4muRnWxsUe1CMnA6P8nooO33kwG-c8YZg=0Xc8rJqw@mail.gmail.com>

[[email protected]: fix unused local warning]
[[email protected]: fix __int128 build woes]
Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Yury Gribov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Kostya Serebryany <[email protected]>
Cc: Johannes Berg <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | net/mac80211/debugfs.c: prevent build failure with CONFIG_UBSAN=y | Andrey Ryabinin | 1 | -5/+2
With the upcoming CONFIG_UBSAN, the following BUILD_BUG_ON in net/mac80211/debugfs.c starts to trigger:

BUILD_BUG_ON(hw_flag_names[NUM_IEEE80211_HW_FLAGS] != (void *)0x1);

It seems that compiler instrumentation causes some code deoptimizations. Because of that, GCC is not able to resolve the condition in BUILD_BUG_ON() at compile time.

We could leave the size of the hw_flag_names array unspecified and replace the condition in BUILD_BUG_ON() with the following:

ARRAY_SIZE(hw_flag_names) != NUM_IEEE80211_HW_FLAGS

That will have the same effect as before (adding a new flag without updating the array will trigger a build failure), except that it doesn't fail with CONFIG_UBSAN. As a bonus, this patch slightly decreases the size of the hw_flag_names array.

Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: Johannes Berg <[email protected]>
Cc: "David S. Miller" <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
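The ARRAY_SIZE-based check proposed here can be sketched in standard C11, using static_assert as a stand-in for BUILD_BUG_ON. The enum, the array contents, and every `_MODEL`/`_model` name are hypothetical; only the shape of the check (array length compared against the flag count) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

#define ARRAY_SIZE_MODEL(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical flag set; the last enumerator counts the real flags. */
enum hw_flag_model {
	HW_FLAG_A,
	HW_FLAG_B,
	HW_FLAG_C,
	NUM_HW_FLAGS_MODEL,
};

/* Size left unspecified: the compiler derives it from the initializers. */
static const char *const hw_flag_names_model[] = {
	[HW_FLAG_A] = "FLAG_A",
	[HW_FLAG_B] = "FLAG_B",
	[HW_FLAG_C] = "FLAG_C",
};

/*
 * Purely a sizeof comparison, so it is resolved at compile time no matter
 * how the surrounding code is instrumented or optimized. Adding a flag to
 * the enum without adding a name makes this fail to build.
 */
static_assert(ARRAY_SIZE_MODEL(hw_flag_names_model) == NUM_HW_FLAGS_MODEL,
	      "hw_flag_names_model is out of sync with enum hw_flag_model");

static const char *hw_flag_name_model(enum hw_flag_model flag)
{
	return hw_flag_names_model[flag];
}
```

Unlike a check that reads an array element, this condition never depends on the optimizer folding a memory load, which is why it keeps working under instrumentation.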
2016-01-20 | kernel: printk: specify alignment for struct printk_log | Andrey Ryabinin | 1 | -5/+5
On architectures that support efficient unaligned access, struct printk_log has 4-byte alignment. Specify the alignment attribute in the type declaration.

The whole point of this patch is to fix a deadlock that happens when UBSAN detects an unaligned access in printk(): UBSAN then recursively calls printk() with logbuf_lock held by the top-level printk() call.

Signed-off-by: Andrey Ryabinin <[email protected]>
Cc: Peter Zijlstra <[email protected]>
Cc: Sasha Levin <[email protected]>
Cc: Randy Dunlap <[email protected]>
Cc: Rasmus Villemoes <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Michal Marek <[email protected]>
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: "H. Peter Anvin" <[email protected]>
Cc: Yury Gribov <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: Konstantin Khlebnikov <[email protected]>
Cc: Kostya Serebryany <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
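What "specifying the alignment in the type declaration" means can be illustrated with GCC/Clang attributes. The record layout below is hypothetical and much smaller than the real struct printk_log; the point is only that a packed type advertises 1-byte alignment to the compiler unless an explicit alignment is attached to it.

```c
#include <assert.h>
#include <stdint.h>

/* Packed only: the compiler assumes objects may sit at any byte address. */
struct log_rec_packed {
	uint64_t ts_nsec;
	uint16_t len;
	uint8_t  level;
} __attribute__((packed));

/* Packed plus explicit alignment: objects are promised to be 4-byte aligned. */
struct log_rec_aligned {
	uint64_t ts_nsec;
	uint16_t len;
	uint8_t  level;
} __attribute__((packed, aligned(4)));

static_assert(_Alignof(struct log_rec_packed) == 1,
	      "packed type has byte alignment");
static_assert(_Alignof(struct log_rec_aligned) == 4,
	      "explicit attribute raises the declared alignment to 4");
```

Because the declared alignment now matches how the records are actually placed, an alignment checker has a consistent rule to validate accesses against; the sketch assumes a compiler that supports these GNU attributes.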
2016-01-20 | sysctl: enable strict writes | Kees Cook | 2 | -9/+8
SYSCTL_WRITES_WARN was added in commit f4aacea2f5d1 ("sysctl: allow for strict write position handling"), and released in v3.16 in August of 2014. Since then I can find only 1 instance of non-zero offset writing[1], and it was fixed immediately in CRIU[2]. As such, it appears safe to flip this to the strict state now. [1] https://www.google.com/search?q="when%20file%20position%20was%20not%200" [2] http://lists.openvz.org/pipermail/criu/2015-April/019819.html Signed-off-by: Kees Cook <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | rbtree: use READ_ONCE in RB_EMPTY_ROOT | Davidlohr Bueso | 1 | -1/+1
With commit d72da4a4d97 ("rbtree: Make lockless searches non-fatal"), our rbtrees provide weak guarantees that allow us to do lockless (and very speculative) reads of the tree. Such readers cannot see partial stores on nodes, i.e. left/right as well as root. As such, similar to the WRITE_ONCE semantics when doing rotations, use READ_ONCE when checking the root node in RB_EMPTY_ROOT.

Signed-off-by: Davidlohr Bueso <[email protected]>
Acked-by: Peter Zijlstra (Intel) <[email protected]>
Cc: Michel Lespinasse <[email protected]>
Cc: Rusty Russell <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
Signed-off-by: Linus Torvalds <[email protected]>
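A simplified user-space model of the idea: the emptiness check reads the root pointer exactly once through a volatile access, so the compiler can neither tear nor repeat the load. `READ_ONCE_MODEL` is a reduced stand-in for the kernel macro (it relies on the GNU `__typeof__` extension), and the struct names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down tree types. */
struct rb_node_model {
	struct rb_node_model *left, *right;
};

struct rb_root_model {
	struct rb_node_model *rb_node;
};

static_assert(sizeof(struct rb_node_model *) == sizeof(void *),
	      "the root is a single pointer-sized word");

/* Simplified READ_ONCE: force one volatile load of a scalar-sized object. */
#define READ_ONCE_MODEL(x) (*(volatile __typeof__(x) *)&(x))

/* Lockless-friendly emptiness test: one load of the root pointer. */
static inline int rb_empty_root_model(struct rb_root_model *root)
{
	return READ_ONCE_MODEL(root->rb_node) == NULL;
}
```

This only constrains the compiler's treatment of the single load; it is a sketch of the access pattern and makes no claim about the memory-ordering guarantees of the real macro.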
2016-01-20 | rapidio: use kobj_to_dev() | Geliang Tang | 1 | -4/+2
Use kobj_to_dev() instead of open-coding it. Signed-off-by: Geliang Tang <[email protected]> Acked-by: "Bounine, Alexandre" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20 | kexec: move some members and definitions within the scope of CONFIG_KEXEC_FILE | Xunlei Pang | 4 | -37/+50
Move the stuff currently only used by the kexec file code within CONFIG_KEXEC_FILE (and CONFIG_KEXEC_VERIFY_SIG). Also move internal "struct kexec_sha_region" and "struct kexec_buf" into "kexec_internal.h". Signed-off-by: Xunlei Pang <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Dave Young <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20kernel/kexec_core.c: use list_for_each_entry_safe in kimage_free_page_listGeliang Tang1-5/+2
Use list_for_each_entry_safe() instead of list_for_each_safe() to simplify the code. Signed-off-by: Geliang Tang <[email protected]> Cc: Dave Young <[email protected]> Cc: Vivek Goyal <[email protected]> Acked-by: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
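
The simplification can be sketched in userspace roughly as follows. The list helpers are minimal re-implementations of the kernel idiom, and struct page_stub and free_page_list() are hypothetical names invented for this example; they are not the kernel's struct page or kimage_free_page_list().

```c
#include <stddef.h>
#include <stdlib.h>

struct list_head {
	struct list_head *next, *prev;
};

static inline void INIT_LIST_HEAD(struct list_head *head)
{
	head->next = head;
	head->prev = head;
}

static inline void list_add(struct list_head *entry, struct list_head *head)
{
	entry->next = head->next;
	entry->prev = head;
	head->next->prev = entry;
	head->next = entry;
}

static inline void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Iterates over entries directly and caches the next one, so the
 * current entry may be unlinked and freed inside the loop body. */
#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_entry((head)->next, __typeof__(*pos), member),	\
	     n = list_entry(pos->member.next, __typeof__(*pos), member); \
	     &pos->member != (head);					\
	     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

/* Hypothetical stand-in for a page linked through an lru member. */
struct page_stub {
	int order;
	struct list_head lru;
};

/* Free every entry on the list; returns how many were freed. */
static int free_page_list(struct list_head *list)
{
	struct page_stub *page, *next;
	int freed = 0;

	list_for_each_entry_safe(page, next, list, lru) {
		list_del(&page->lru);
		free(page);
		freed++;
	}
	return freed;
}
```

With list_for_each_safe() the loop would instead walk struct list_head pointers and need an explicit list_entry() call in the body, which is the boilerplate the entry-based variant removes.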
2016-01-20kexec: set KEXEC_TYPE_CRASH before sanity_check_segment_list()Xunlei Pang1-5/+5
sanity_check_segment_list() checks KEXEC_TYPE_CRASH flag to ensure all the segments of the loaded crash kernel are within the kernel crash resource limits, so set the flag beforehand. Signed-off-by: Xunlei Pang <[email protected]> Acked-by: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Vivek Goyal <[email protected]> Acked-by: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20kernel/cpu.c: make set_cpu_* static inlinesRasmus Villemoes2-38/+39
Almost all callers of the set_cpu_* functions pass an explicit true or false. Making them static inline thus replaces the function calls with a simple set_bit/clear_bit, saving some .text. Signed-off-by: Rasmus Villemoes <[email protected]> Acked-by: Rusty Russell <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
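
A rough sketch of why inlining helps, under simplifying assumptions: the bit operations below are plain, non-atomic C rather than the kernel's set_bit/clear_bit, NR_CPUS is fixed at 64, and only the "online" mask is shown.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_CPUS		64
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct cpumask {
	unsigned long bits[BITS_TO_LONGS(NR_CPUS)];
};

/* The underlying mask, plus a read-only view of it via a macro. */
struct cpumask __cpu_online_mask;
#define cpu_online_mask ((const struct cpumask *)&__cpu_online_mask)

/*
 * With a literal true/false argument the compiler can drop the unused
 * branch, leaving a single bit set or clear at the call site.
 * Non-atomic here for brevity; that is an assumption of this sketch.
 */
static inline void set_cpu_online(unsigned int cpu, bool online)
{
	unsigned long bit = 1UL << (cpu % BITS_PER_LONG);

	if (online)
		__cpu_online_mask.bits[cpu / BITS_PER_LONG] |= bit;
	else
		__cpu_online_mask.bits[cpu / BITS_PER_LONG] &= ~bit;
}

static inline bool cpu_online(unsigned int cpu)
{
	return (cpu_online_mask->bits[cpu / BITS_PER_LONG] >>
		(cpu % BITS_PER_LONG)) & 1UL;
}
```

A call such as set_cpu_online(cpu, true) then needs no out-of-line function call at all, which is where the .text saving comes from.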
2016-01-20kernel/cpu.c: eliminate cpu_*_maskRasmus Villemoes2-12/+4
Replace the variables cpu_possible_mask, cpu_online_mask, cpu_present_mask and cpu_active_mask with macros expanding to expressions of the same type and value, eliminating some indirection. Signed-off-by: Rasmus Villemoes <[email protected]> Acked-by: Rusty Russell <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20drivers/base/cpu.c: use __cpu_*_mask directlyRasmus Villemoes1-5/+5
The only user of the lvalue-ness of the cpu_*_mask variables is in drivers/base/cpu.c, and that is mostly a work-around for the fact that not even const variables can be used in static initialization. Now that the underlying struct cpumasks are exposed we can take their address. Signed-off-by: Rasmus Villemoes <[email protected]> Acked-by: Rusty Russell <[email protected]> Acked-by: Greg Kroah-Hartman <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
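
The static-initialization point can be illustrated with a small sketch. The struct cpu_attr here is reduced to a name and a mask pointer, which is an assumption made for the example and not the layout used in drivers/base/cpu.c.

```c
/* Reduced single-word mask, for illustration only. */
struct cpumask {
	unsigned long bits[1];
};

struct cpumask __cpu_online_mask;
struct cpumask __cpu_possible_mask;

/*
 * The value of a pointer variable such as
 *     const struct cpumask *const cpu_online_mask = &...;
 * is not a constant expression in C, so it cannot appear in a static
 * initializer. The address of a global object is one, so the table can
 * point at the masks directly once they are exposed.
 */
struct cpu_attr {
	const char *name;
	const struct cpumask *map;
};

static struct cpu_attr cpu_attrs[] = {
	{ "online",   &__cpu_online_mask },
	{ "possible", &__cpu_possible_mask },
};
```

Each table entry now reaches its mask with one dereference, without going through an intermediate pointer variable.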
2016-01-20kernel/cpu.c: export __cpu_*_maskRasmus Villemoes2-5/+13
Exporting the cpumasks __cpu_possible_mask and friends will allow us to remove the extra indirection through the cpu_*_mask variables. It will also allow the set_cpu_* functions to become static inlines, which will give a .text reduction. Signed-off-by: Rasmus Villemoes <[email protected]> Acked-by: Rusty Russell <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20kernel/cpu.c: change type of cpu_possible_bits and friendsRasmus Villemoes1-22/+22
Change cpu_possible_bits and friends (online, present, active) from being bitmaps that happen to have the right size to actually being struct cpumasks. Also rename them to __cpu_xyz_mask. This is mostly a small cleanup in preparation for exporting them and, eventually, eliminating the extra indirection through the cpu_xyz_mask variables. Signed-off-by: Rasmus Villemoes <[email protected]> Acked-by: Rusty Russell <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Michael Ellerman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20powerpc/fadump: rename cpu_online_mask member of struct fadump_crash_info_headerRasmus Villemoes2-3/+3
The four cpumasks cpu_{possible,online,present,active}_bits are exposed readonly via the corresponding const variables cpu_xyz_mask. But they are also accessible for arbitrary writing via the exposed functions set_cpu_xyz. There's quite a bit of code throughout the kernel which iterates over or otherwise accesses these bitmaps, and having the access go via the cpu_xyz_mask variables is nowadays [1] simply a useless indirection.

It may be that any problem in CS can be solved by an extra level of indirection, but that doesn't mean every extra indirection solves a problem. In this case, it even necessitates some minor ugliness (see 4/6).

Patch 1/6 is new in v2, and fixes a build failure on ppc by renaming a struct member, to avoid problems when the identifier cpu_online_mask becomes a macro later in the series. The next four patches eliminate the cpu_xyz_mask variables by simply exposing the actual bitmaps, after renaming them to discourage direct access - that still happens through cpu_xyz_mask, which are now simply macros with the same type and value as they used to have.

After that, there's no longer any reason to have the setter functions be out-of-line: The boolean parameter is almost always a literal true or false, so by making them static inlines they will usually compile to one or two instructions.

For a defconfig build on x86_64, bloat-o-meter says we save ~3000 bytes. We also save a little stack (stackdelta says 127 functions have a 16 byte smaller stack frame, while two grow by that amount). Mostly because, when iterating over the mask, gcc typically loads the value of cpu_xyz_mask into a callee-saved register and from there into %rdi before each find_next_bit call - now it can just load the appropriate immediate address into %rdi before each call.

[1] See Rusty's kind explanation http://thread.gmane.org/gmane.linux.kernel/2047078/focus=2047722 for some historic context.

This patch (of 6):

As preparation for eliminating the indirect access to the various global cpu_*_bits bitmaps via the pointer variables cpu_*_mask, rename the cpu_online_mask member of struct fadump_crash_info_header to simply online_mask, thus allowing cpu_online_mask to become a macro.

Signed-off-by: Rasmus Villemoes <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Rusty Russell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20exit: remove unneeded declaration of exit_mm()Dmitry Safonov1-2/+0
Signed-off-by: Dmitry Safonov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-01-20fs/coredump: prevent "" / "." / ".." core path componentsJann Horn1-0/+20
Let %h and %e print empty values as "!", "." as "!" and ".." as "!.".

This prevents hostnames and comm values that are empty or consist of one or two dots from changing the directory level at which the corefile will be stored.

Consider the case where someone decides to sort coredumps by hostname with a core pattern like "/cores/%h/core.%e.%p.%t" or so. In this case, hostnames "" and "." would cause the coredump to land directly in /cores, which is not what the intent behind the core pattern is, and ".." would cause the coredump to land in /.

Yeah, there probably aren't many people who do that, but I still don't want this edgecase to be kind of broken.

It seems very unlikely that this caused security issues anywhere, so I'm not requesting a stable backport.

[[email protected]: tweak code comment]

Signed-off-by: Jann Horn <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Alexander Viro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
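
The substitution rule can be sketched in userspace as below. append_component() is a hypothetical helper written for this illustration; it is not the function in fs/coredump.c, and it covers only the three cases named in the message ("", "." and "..").

```c
#include <stdio.h>
#include <string.h>

/*
 * Append one expanded pattern value (e.g. a hostname for %h) to the
 * NUL-terminated path in buf. Returns 0 on success, -1 if it does not
 * fit. Hypothetical helper, for illustration only.
 */
static int append_component(char *buf, size_t size, const char *value)
{
	size_t cur = strlen(buf);
	int n = snprintf(buf + cur, size - cur, "%s", value);

	if (n < 0 || (size_t)n >= size - cur)
		return -1;

	if (n == 0) {
		/* Empty value: emit "!" so no "//" appears in the path. */
		if (size - cur < 2)
			return -1;
		buf[cur] = '!';
		buf[cur + 1] = '\0';
	} else if ((n == 1 && buf[cur] == '.') ||
		   (n == 2 && buf[cur] == '.' && buf[cur + 1] == '.')) {
		/* "." becomes "!" and ".." becomes "!.". */
		buf[cur] = '!';
	}
	return 0;
}
```

Starting from a buffer holding "/cores/", appending ".." yields "/cores/!." rather than a path that climbs out of /cores; ordinary values such as "host" pass through unchanged.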