aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2018-04-11bnxt_en: Fix ethtool -x crash when device is down.Michael Chan1-3/+8
Fix ethtool .get_rxfh() crash by checking for valid indirection table address before copying the data. Signed-off-by: Michael Chan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2018-04-11drm/amd/display: fix brightness level after resume from suspendRoman Li3-1/+18
Adding missing call to cache current backlight values. Otherwise the brightness resets to default value on resume. Signed-off-by: Roman Li <[email protected]> Reviewed-by: Charlene Liu <[email protected]> Acked-by: Harry Wentland <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-04-11drm/amd/display: HDMI has no sound after Panel power off/onCharlene Liu1-0/+2
Signed-off-by: Charlene Liu <[email protected]> Reviewed-by: Krunoslav Kovac <[email protected]> Acked-by: Harry Wentland <[email protected]> Cc: [email protected] Signed-off-by: Alex Deucher <[email protected]>
2018-04-11drm/amdgpu: add MP1 and THM hw ip base reg offsetEvan Quan2-1/+4
Signed-off-by: Evan Quan <[email protected]> Reviewed-by: Rex Zhu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-04-11drm/amdgpu: fix null pointer panic with direct fw loading on gpu resetHuang Rui1-0/+3
When system uses fw direct loading, then psp context structure won't be initiliazed. And it is also unable to execute mode reset. [ 434.601474] amdgpu 0000:0c:00.0: GPU reset begin! [ 434.694326] amdgpu 0000:0c:00.0: GPU reset [ 434.743152] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058 [ 434.838474] IP: psp_gpu_reset+0xc/0x30 [amdgpu] [ 434.893532] PGD 406ed9067 [ 434.893533] P4D 406ed9067 [ 434.926376] PUD 400b46067 [ 434.959217] PMD 0 [ 435.033379] Oops: 0000 [#1] SMP [ 435.072573] Modules linked in: amdgpu(OE) chash(OE) gpu_sched(OE) ttm(OE) drm_kms_helper(OE) drm(OE) fb_sys_fops syscopyarea sysfillrect sysimgblt rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace fscache snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm edac_mce_amd snd_seq_midi snd_seq_midi_event kvm_amd snd_rawmidi kvm irqbypass crct10dif_pclmul crc32_pclmul snd_seq ghash_clmulni_intel snd_seq_device pcbc snd_timer eeepc_wmi aesni_intel snd asus_wmi aes_x86_64 sparse_keymap crypto_simd glue_helper joydev soundcore wmi_bmof cryptd video i2c_piix4 shpchp 8250_dw i2c_designware_platform mac_hid i2c_designware_core sunrpc parport_pc ppdev lp parport autofs4 hid_generic igb usbhid dca ptp mxm_wmi pps_core ahci hid i2c_algo_bit [ 435.931754] libahci wmi Signed-off-by: Huang Rui <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2018-04-11drm/radeon: add PX quirk for Asus K73TKNico Sneck1-0/+4
With this the dGPU turns on correctly. Signed-off-by: Nico Sneck <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2018-04-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds182-2095/+4159
Merge more updates from Andrew Morton: - almost all of the rest of MM - kasan updates - lots of procfs work - misc things - lib/ updates - checkpatch - rapidio - ipc/shm updates - the start of willy's XArray conversion * emailed patches from Andrew Morton <[email protected]>: (140 commits) page cache: use xa_lock xarray: add the xa_lock to the radix_tree_root fscache: use appropriate radix tree accessors export __set_page_dirty unicore32: turn flush_dcache_mmap_lock into a no-op arm64: turn flush_dcache_mmap_lock into a no-op mac80211_hwsim: use DEFINE_IDA radix tree: use GFP_ZONEMASK bits of gfp_t for flags linux/const.h: refactor _BITUL and _BITULL a bit linux/const.h: move UL() macro to include/linux/const.h linux/const.h: prefix include guard of uapi/linux/const.h with _UAPI xen, mm: allow deferred page initialization for xen pv domains elf: enforce MAP_FIXED on overlaying elf segments fs, elf: drop MAP_FIXED usage from elf_map mm: introduce MAP_FIXED_NOREPLACE MAINTAINERS: update bouncing [email protected] addresses fs/dcache.c: add cond_resched() in shrink_dentry_list() include/linux/kfifo.h: fix comment ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param ...
2018-04-11arm64: assembler: add macros to conditionally yield the NEON under PREEMPTArd Biesheuvel2-0/+76
Add support macros to conditionally yield the NEON (and thus the CPU) that may be called from the assembler code. In some cases, yielding the NEON involves saving and restoring a non trivial amount of context (especially in the CRC folding algorithms), and so the macro is split into three, and the code in between is only executed when the yield path is taken, allowing the context to be preserved. The third macro takes an optional label argument that marks the resume path after a yield has been performed. Signed-off-by: Ard Biesheuvel <[email protected]> Reviewed-by: Dave Martin <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2018-04-11arm64: assembler: add utility macros to push/pop stack framesArd Biesheuvel1-0/+63
We are going to add code to all the NEON crypto routines that will turn them into non-leaf functions, so we need to manage the stack frames. To make this less tedious and error prone, add some macros that take the number of callee saved registers to preserve and the extra size to allocate in the stack frame (for locals) and emit the ldp/stp sequences. Signed-off-by: Ard Biesheuvel <[email protected]> Reviewed-by: Dave Martin <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2018-04-11arm64: Move the content of bpi.S to hyp-entry.SMarc Zyngier4-91/+65
bpi.S was introduced as we were starting to build the Spectre v2 mitigation framework, and it was rather unclear that it would become strictly KVM specific. Now that the picture is a lot clearer, let's move the content of that file to hyp-entry.S, where it actually belong. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2018-04-11arm64: Get rid of __smccc_workaround_1_hvc_*Marc Zyngier2-16/+5
The very existence of __smccc_workaround_1_hvc_* is a thinko, as KVM will never use a HVC call to perform the branch prediction invalidation. Even as a nested hypervisor, it would use an SMC instruction. Let's get rid of it. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2018-04-11arm64: capabilities: Rework EL2 vector hardening entryMarc Zyngier1-9/+11
Since 5e7951ce19ab ("arm64: capabilities: Clean up midr range helpers"), capabilities must be represented with a single entry. If multiple CPU types can use the same capability, then they need to be enumerated in a list. The EL2 hardening stuff (which affects both A57 and A72) managed to escape the conversion in the above patch thanks to the 4.17 merge window. Let's fix it now. Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2018-04-11arm64: KVM: Use SMCCC_ARCH_WORKAROUND_1 for Falkor BP hardeningShanker Donthineni6-86/+25
The function SMCCC_ARCH_WORKAROUND_1 was introduced as part of SMC V1.1 Calling Convention to mitigate CVE-2017-5715. This patch uses the standard call SMCCC_ARCH_WORKAROUND_1 for Falkor chips instead of Silicon provider service ID 0xC2001700. Cc: <[email protected]> # 4.14+ Signed-off-by: Shanker Donthineni <[email protected]> [maz: reworked errata framework integration] Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2018-04-11page cache: use xa_lockMatthew Wilcox39-366/+345
Remove the address_space ->tree_lock and use the xa_lock newly added to the radix_tree_root. Rename the address_space ->page_tree to ->i_pages, since we don't really care that it's a tree. [[email protected]: fix nds32, fs/dax.c] Link: http://lkml.kernel.org/r/[email protected]: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Jeff Layton <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Ryusuke Konishi <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11xarray: add the xa_lock to the radix_tree_rootMatthew Wilcox6-13/+42
This results in no change in structure size on 64-bit machines as it fits in the padding between the gfp_t and the void *. 32-bit machines will grow the structure from 8 to 12 bytes. Almost all radix trees are protected with (at least) a spinlock, so as they are converted from radix trees to xarrays, the data structures will shrink again. Initialising the spinlock requires a name for the benefit of lockdep, so RADIX_TREE_INIT() now needs to know the name of the radix tree it's initialising, and so do IDR_INIT() and IDA_INIT(). Also add the xa_lock() and xa_unlock() family of wrappers to make it easier to use the lock. If we could rely on -fplan9-extensions in the compiler, we could avoid all of this syntactic sugar, but that wasn't added until gcc 4.6. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Ryusuke Konishi <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11fscache: use appropriate radix tree accessorsMatthew Wilcox2-2/+2
Don't open-code accesses to data structure internals. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Ryusuke Konishi <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11export __set_page_dirtyMatthew Wilcox3-14/+5
XFS currently contains a copy-and-paste of __set_page_dirty(). Export it from buffer.c instead. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Jeff Layton <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Cc: Ryusuke Konishi <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11unicore32: turn flush_dcache_mmap_lock into a no-opMatthew Wilcox1-4/+2
Unicore doesn't walk the VMA tree in its flush_dcache_page() implementation, so has no need to take the tree_lock. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Ryusuke Konishi <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11arm64: turn flush_dcache_mmap_lock into a no-opMatthew Wilcox1-4/+2
ARM64 doesn't walk the VMA tree in its flush_dcache_page() implementation, so has no need to take the tree_lock. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Reviewed-by: Will Deacon <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Jeff Layton <[email protected]> Cc: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mac80211_hwsim: use DEFINE_IDAMatthew Wilcox1-1/+1
This is preferred to opencoding an IDA_INIT. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11radix tree: use GFP_ZONEMASK bits of gfp_t for flagsMatthew Wilcox4-5/+9
Patch series "XArray", v9. (First part thereof). This patchset is, I believe, appropriate for merging for 4.17. It contains the XArray implementation, to eventually replace the radix tree, and converts the page cache to use it. This conversion keeps the radix tree and XArray data structures in sync at all times. That allows us to convert the page cache one function at a time and should allow for easier bisection. Other than renaming some elements of the structures, the data structures are fundamentally unchanged; a radix tree walk and an XArray walk will touch the same number of cachelines. I have changes planned to the XArray data structure, but those will happen in future patches. Improvements the XArray has over the radix tree: - The radix tree provides operations like other trees do; 'insert' and 'delete'. But what most users really want is an automatically resizing array, and so it makes more sense to give users an API that is like an array -- 'load' and 'store'. We still have an 'insert' operation for users that really want that semantic. - The XArray considers locking as part of its API. This simplifies a lot of users who formerly had to manage their own locking just for the radix tree. It also improves code generation as we can now tell RCU that we're holding a lock and it doesn't need to generate as much fencing code. The other advantage is that tree nodes can be moved (not yet implemented). - GFP flags are now parameters to calls which may need to allocate memory. The radix tree forced users to decide what the allocation flags would be at creation time. It's much clearer to specify them at allocation time. - Memory is not preloaded; we don't tie up dozens of pages on the off chance that the slab allocator fails. Instead, we drop the lock, allocate a new node and retry the operation. We have to convert all the radix tree, IDA and IDR preload users before we can realise this benefit, but I have not yet found a user which cannot be converted. - The XArray provides a cmpxchg operation. The radix tree forces users to roll their own (and at least four have). - Iterators take a 'max' parameter. That simplifies many users and will reduce the amount of iteration done. - Iteration can proceed backwards. We only have one user for this, but since it's called as part of the pagefault readahead algorithm, that seemed worth mentioning. - RCU-protected pointers are not exposed as part of the API. There are some fun bugs where the page cache forgets to use rcu_dereference() in the current codebase. - Value entries gain an extra bit compared to radix tree exceptional entries. That gives us the extra bit we need to put huge page swap entries in the page cache. - Some iterators now take a 'filter' argument instead of having separate iterators for tagged/untagged iterations. The page cache is improved by this: - Shorter, easier to read code - More efficient iterations - Reduction in size of struct address_space - Fewer walks from the top of the data structure; the XArray API encourages staying at the leaf node and conducting operations there. This patch (of 8): None of these bits may be used for slab allocations, so we can use them as radix tree flags as long as we mask them off before passing them to the slab allocator. Move the IDR flag from the high bits to the GFP_ZONEMASK bits. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Jeff Layton <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Ryusuke Konishi <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11linux/const.h: refactor _BITUL and _BITULL a bitMasahiro Yamada1-2/+2
Minor cleanups available by _UL and _ULL. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Russell King <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11linux/const.h: move UL() macro to include/linux/const.hMasahiro Yamada5-18/+12
ARM, ARM64 and UniCore32 duplicate the definition of UL(): #define UL(x) _AC(x, UL) This is not actually arch-specific, so it will be useful to move it to a common header. Currently, we only have the uapi variant for linux/const.h, so I am creating include/linux/const.h. I also added _UL(), _ULL() and ULL() because _AC() is mostly used in the form either _AC(..., UL) or _AC(..., ULL). I expect they will be replaced in follow-up cleanups. The underscore-prefixed ones should be used for exported headers. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Guan Xuetao <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Russell King <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11linux/const.h: prefix include guard of uapi/linux/const.h with _UAPIMasahiro Yamada1-3/+3
Patch series "linux/const.h: cleanups of macros such as UL(), _BITUL(), BIT() etc", v3. ARM, ARM64, UniCore32 define UL() as a shorthand of _AC(..., UL). More architectures may introduce it in the future. UL() is arch-agnostic, and useful. So let's move it to include/linux/const.h Currently, <asm/memory.h> must be included to use UL(). It pulls in more bloats just for defining some bit macros. I posted V2 one year ago. The previous posts are: https://patchwork.kernel.org/patch/9498273/ https://patchwork.kernel.org/patch/9498275/ https://patchwork.kernel.org/patch/9498269/ https://patchwork.kernel.org/patch/9498271/ At that time, what blocked this series was a comment from David Howells: You need to be very careful doing this. Some userspace stuff depends on the guard macro names on the kernel header files. (https://patchwork.kernel.org/patch/9498275/) Looking at the code closer, I noticed this is not a problem. See the following line. https://github.com/torvalds/linux/blob/v4.16-rc2/scripts/headers_install.sh#L40 scripts/headers_install.sh rips off _UAPI prefix from guard macro names. I ran "make headers_install" and confirmed the result is what I expect. So, we can prefix the include guard of include/uapi/linux/const.h, and add a new include/linux/const.h. This patch (of 4): I am going to add include/linux/const.h for the kernel space. Add _UAPI to the include guard of include/uapi/linux/const.h to prepare for that. Please notice the guard name of the exported one will be kept as-is. So, this commit has no impact to the userspace even if some userspace stuff depends on the guard macro names. scripts/headers_install.sh processes exported headers by SED, and rips off "_UAPI" from guard macro names. #ifndef _UAPI_LINUX_CONST_H #define _UAPI_LINUX_CONST_H will be turned into #ifndef _LINUX_CONST_H #define _LINUX_CONST_H Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Cc: David Howells <[email protected]> Cc: Will Deacon <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Russell King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11xen, mm: allow deferred page initialization for xen pv domainsPavel Tatashin6-16/+31
Juergen Gross noticed that commit f7f99100d8d ("mm: stop zeroing memory during allocation in vmemmap") broke XEN PV domains when deferred struct page initialization is enabled. This is because the xen's PagePinned() flag is getting erased from struct pages when they are initialized later in boot. Juergen fixed this problem by disabling deferred pages on xen pv domains. It is desirable, however, to have this feature available as it reduces boot time. This fix re-enables the feature for pv-dmains, and fixes the problem the following way: The fix is to delay setting PagePinned flag until struct pages for all allocated memory are initialized, i.e. until after free_all_bootmem(). A new x86_init.hyper op init_after_bootmem() is called to let xen know that boot allocator is done, and hence struct pages for all the allocated memory are now initialized. If deferred page initialization is enabled, the rest of struct pages are going to be initialized later in boot once page_alloc_init_late() is called. xen_after_bootmem() walks page table's pages and marks them pinned. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Pavel Tatashin <[email protected]> Acked-by: Ingo Molnar <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Tested-by: Juergen Gross <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Alok Kataria <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Mathias Krause <[email protected]> Cc: Jinbum Park <[email protected]> Cc: Dan Williams <[email protected]> Cc: Baoquan He <[email protected]> Cc: Jia Zhang <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Stefano Stabellini <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11elf: enforce MAP_FIXED on overlaying elf segmentsMichal Hocko1-3/+10
Anshuman has reported that with "fs, elf: drop MAP_FIXED usage from elf_map" applied, some ELF binaries in his environment fail to start with [ 23.423642] 9148 (sed): Uhuuh, elf segment at 0000000010030000 requested but the memory is mapped already [ 23.423706] requested [10030000, 10040000] mapped [10030000, 10040000] 100073 anon The reason is that the above binary has overlapping elf segments: LOAD 0x0000000000000000 0x0000000010000000 0x0000000010000000 0x0000000000013a8c 0x0000000000013a8c R E 10000 LOAD 0x000000000001fd40 0x000000001002fd40 0x000000001002fd40 0x00000000000002c0 0x00000000000005e8 RW 10000 LOAD 0x0000000000020328 0x0000000010030328 0x0000000010030328 0x0000000000000384 0x00000000000094a0 RW 10000 That binary has two RW LOAD segments, the first crosses a page border into the second 0x1002fd40 (LOAD2-vaddr) + 0x5e8 (LOAD2-memlen) == 0x10030328 (LOAD3-vaddr) Handle this situation by enforcing MAP_FIXED when we establish a temporary brk VMA to handle overlapping segments. All other mappings will still use MAP_FIXED_NOREPLACE. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Anshuman Khandual <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Cc: Andrei Vagin <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Kees Cook <[email protected]> Cc: Abdul Haleem <[email protected]> Cc: Joel Stanley <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Mark Brown <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11fs, elf: drop MAP_FIXED usage from elf_mapMichal Hocko2-5/+12
Both load_elf_interp and load_elf_binary rely on elf_map to map segments on a controlled address and they use MAP_FIXED to enforce that. This is however dangerous thing prone to silent data corruption which can be even exploitable. Let's take CVE-2017-1000253 as an example. At the time (before commit eab09532d400: "binfmt_elf: use ELF_ET_DYN_BASE only for PIE") ELF_ET_DYN_BASE was at TASK_SIZE / 3 * 2 which is not that far away from the stack top on 32b (legacy) memory layout (only 1GB away). Therefore we could end up mapping over the existing stack with some luck. The issue has been fixed since then (a87938b2e246: "fs/binfmt_elf.c: fix bug in loading of PIE binaries"), ELF_ET_DYN_BASE moved moved much further from the stack (eab09532d400 and later by c715b72c1ba4: "mm: revert x86_64 and arm64 ELF_ET_DYN_BASE base changes") and excessive stack consumption early during execve fully stopped by da029c11e6b1 ("exec: Limit arg stack to at most 75% of _STK_LIM"). So we should be safe and any attack should be impractical. On the other hand this is just too subtle assumption so it can break quite easily and hard to spot. I believe that the MAP_FIXED usage in load_elf_binary (et. al) is still fundamentally dangerous. Moreover it shouldn't be even needed. We are at the early process stage and so there shouldn't be unrelated mappings (except for stack and loader) existing so mmap for a given address should succeed even without MAP_FIXED. Something is terribly wrong if this is not the case and we should rather fail than silently corrupt the underlying mapping. Address this issue by changing MAP_FIXED to the newly added MAP_FIXED_NOREPLACE. This will mean that mmap will fail if there is an existing mapping clashing with the requested one without clobbering it. [[email protected]: fix build] [[email protected]: coding-style fixes] [[email protected]: don't use the same value for MAP_FIXED_NOREPLACE and MAP_SYNC] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Signed-off-by: Andrei Vagin <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Acked-by: Michael Ellerman <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Abdul Haleem <[email protected]> Cc: Joel Stanley <[email protected]> Cc: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm: introduce MAP_FIXED_NOREPLACEMichal Hocko6-0/+16
Patch series "mm: introduce MAP_FIXED_NOREPLACE", v2. This has started as a follow up discussion [3][4] resulting in the runtime failure caused by hardening patch [5] which removes MAP_FIXED from the elf loader because MAP_FIXED is inherently dangerous as it might silently clobber an existing underlying mapping (e.g. stack). The reason for the failure is that some architectures enforce an alignment for the given address hint without MAP_FIXED used (e.g. for shared or file backed mappings). One way around this would be excluding those archs which do alignment tricks from the hardening [6]. The patch is really trivial but it has been objected, rightfully so, that this screams for a more generic solution. We basically want a non-destructive MAP_FIXED. The first patch introduced MAP_FIXED_NOREPLACE which enforces the given address but unlike MAP_FIXED it fails with EEXIST if the given range conflicts with an existing one. The flag is introduced as a completely new one rather than a MAP_FIXED extension because of the backward compatibility. We really want a never-clobber semantic even on older kernels which do not recognize the flag. Unfortunately mmap sucks wrt flags evaluation because we do not EINVAL on unknown flags. On those kernels we would simply use the traditional hint based semantic so the caller can still get a different address (which sucks) but at least not silently corrupt an existing mapping. I do not see a good way around that. Except we won't export expose the new semantic to the userspace at all. It seems there are users who would like to have something like that. Jemalloc has been mentioned by Michael Ellerman [7] Florian Weimer has mentioned the following: : glibc ld.so currently maps DSOs without hints. This means that the kernel : will map right next to each other, and the offsets between them a completely : predictable. We would like to change that and supply a random address in a : window of the address space. If there is a conflict, we do not want the : kernel to pick a non-random address. Instead, we would try again with a : random address. John Hubbard has mentioned CUDA example : a) Searches /proc/<pid>/maps for a "suitable" region of available : VA space. "Suitable" generally means it has to have a base address : within a certain limited range (a particular device model might : have odd limitations, for example), it has to be large enough, and : alignment has to be large enough (again, various devices may have : constraints that lead us to do this). : : This is of course subject to races with other threads in the process. : : Let's say it finds a region starting at va. : : b) Next it does: : p = mmap(va, ...) : : *without* setting MAP_FIXED, of course (so va is just a hint), to : attempt to safely reserve that region. If p != va, then in most cases, : this is a failure (almost certainly due to another thread getting a : mapping from that region before we did), and so this layer now has to : call munmap(), before returning a "failure: retry" to upper layers. : : IMPROVEMENT: --> if instead, we could call this: : : p = mmap(va, ... MAP_FIXED_NOREPLACE ...) : : , then we could skip the munmap() call upon failure. This : is a small thing, but it is useful here. (Thanks to Piotr : Jaroszynski and Mark Hairgrove for helping me get that detail : exactly right, btw.) : : c) After that, CUDA suballocates from p, via: : : q = mmap(sub_region_start, ... MAP_FIXED ...) : : Interestingly enough, "freeing" is also done via MAP_FIXED, and : setting PROT_NONE to the subregion. Anyway, I just included (c) for : general interest. Atomic address range probing in the multithreaded programs in general sounds like an interesting thing to me. The second patch simply replaces MAP_FIXED use in elf loader by MAP_FIXED_NOREPLACE. I believe other places which rely on MAP_FIXED should follow. Actually real MAP_FIXED usages should be docummented properly and they should be more of an exception. [1] http://lkml.kernel.org/r/[email protected] [2] http://lkml.kernel.org/r/[email protected] [3] http://lkml.kernel.org/r/[email protected] [4] http://lkml.kernel.org/r/[email protected] [5] http://lkml.kernel.org/r/[email protected] [6] http://lkml.kernel.org/r/[email protected] [7] http://lkml.kernel.org/r/[email protected] This patch (of 2): MAP_FIXED is used quite often to enforce mapping at the particular range. The main problem of this flag is, however, that it is inherently dangerous because it unmaps existing mappings covered by the requested range. This can cause silent memory corruptions. Some of them even with serious security implications. While the current semantic might be really desiderable in many cases there are others which would want to enforce the given range but rather see a failure than a silent memory corruption on a clashing range. Please note that there is no guarantee that a given range is obeyed by the mmap even when it is free - e.g. arch specific code is allowed to apply an alignment. Introduce a new MAP_FIXED_NOREPLACE flag for mmap to achieve this behavior. It has the same semantic as MAP_FIXED wrt. the given address request with a single exception that it fails with EEXIST if the requested address is already covered by an existing mapping. We still do rely on get_unmaped_area to handle all the arch specific MAP_FIXED treatment and check for a conflicting vma after it returns. The flag is introduced as a completely new one rather than a MAP_FIXED extension because of the backward compatibility. We really want a never-clobber semantic even on older kernels which do not recognize the flag. Unfortunately mmap sucks wrt. flags evaluation because we do not EINVAL on unknown flags. On those kernels we would simply use the traditional hint based semantic so the caller can still get a different address (which sucks) but at least not silently corrupt an existing mapping. I do not see a good way around that. [[email protected]: fix whitespace] [fail on clashing range with EEXIST as per Florian Weimer] [set MAP_FIXED before round_hint_to_min as per Khalid Aziz] Link: http://lkml.kernel.org/r/[email protected] Reviewed-by: Khalid Aziz <[email protected]> Signed-off-by: Michal Hocko <[email protected]> Acked-by: Michael Ellerman <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Russell King - ARM Linux <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Florian Weimer <[email protected]> Cc: John Hubbard <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Abdul Haleem <[email protected]> Cc: Joel Stanley <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Jason Evans <[email protected]> Cc: David Goldblatt <[email protected]> Cc: Edward Tomasz NapieraÅ‚a <[email protected]> Cc: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11MAINTAINERS: update bouncing [email protected] addressesJoe Perches1-2/+2
Adaptec is now part of Microsemi. Commit 2a81ffdd9da1 ("MAINTAINERS: Update email address for aacraid") updated only one of the driver maintainer addresses. Update the other two sections as the [email protected] address bounces. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joe Perches <[email protected]> Cc: Dave Carroll <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11fs/dcache.c: add cond_resched() in shrink_dentry_list()Nikolay Borisov1-3/+2
As previously reported (https://patchwork.kernel.org/patch/8642031/) it's possible to call shrink_dentry_list with a large number of dentries (> 10000). This, in turn, could trigger the softlockup detector and possibly trigger a panic. In addition to the unmount path being vulnerable to this scenario, at SuSE we've observed similar situation happening during process exit on processes that touch a lot of dentries. Here is an excerpt from a crash dump. The number after the colon are the number of dentries on the list passed to shrink_dentry_list: PID 99760: 10722 PID 107530: 215 PID 108809: 24134 PID 108877: 21331 PID 141708: 16487 So we want to kill between 15k-25k dentries without yielding. And one possible call stack looks like: 4 [ffff8839ece41db0] _raw_spin_lock at ffffffff8152a5f8 5 [ffff8839ece41db0] evict at ffffffff811c3026 6 [ffff8839ece41dd0] __dentry_kill at ffffffff811bf258 7 [ffff8839ece41df0] shrink_dentry_list at ffffffff811bf593 8 [ffff8839ece41e18] shrink_dcache_parent at ffffffff811bf830 9 [ffff8839ece41e50] proc_flush_task at ffffffff8120dd61 10 [ffff8839ece41ec0] release_task at ffffffff81059ebd 11 [ffff8839ece41f08] do_exit at ffffffff8105b8ce 12 [ffff8839ece41f78] sys_exit at ffffffff8105bd53 13 [ffff8839ece41f80] system_call_fastpath at ffffffff81532909 While some of the callers of shrink_dentry_list do use cond_resched, this is not sufficient to prevent softlockups. So just move cond_resched into shrink_dentry_list from its callers. David said: I've found hundreds of occurrences of warnings that we emit when need_resched stays set for a prolonged period of time with the stack trace that is included in the change log. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nikolay Borisov <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Goldwyn Rodrigues <[email protected]> Cc: Jeff Mahoney <[email protected]> Cc: Davidlohr Bueso <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11include/linux/kfifo.h: fix commentValentin Vidic1-4/+4
Clean up unusual formatting in the note about locking. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Valentin Vidic <[email protected]> Cc: Stefani Seibold <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Sean Young <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_opsAndrew Morton1-1/+1
This was added by the recent "ipc/shm.c: add split function to shm_vm_ops", but it is not necessary. Reviewed-by: Mike Kravetz <[email protected]> Cc: Laurent Dufour <[email protected]> Cc: Dan Williams <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_paramWaiman Long1-2/+20
Kdoc comments are added to the do_proc_dointvec_minmax_conv_param and do_proc_douintvec_minmax_conv_param structures thare are used internally for range checking. The error codes returned by proc_dointvec_minmax() and proc_douintvec_minmax() are also documented. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Waiman Long <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Luis R. Rodriguez <[email protected]> Cc: Al Viro <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Kees Cook <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11fs/proc/proc_sysctl.c: fix typo in sysctl_check_table_array()Waiman Long1-1/+1
Patch series "ipc: Clamp *mni to the real IPCMNI limit", v3. The sysctl parameters msgmni, shmmni and semmni have an inherent limit of IPC_MNI (32k). However, users may not be aware of that because they can write a value much higher than that without getting any error or notification. Reading the parameters back will show the newly written values which are not real. Enforcing the limit by failing sysctl parameter write, however, can break existing user applications. To address this delemma, a new flags field is introduced into the ctl_table. The value CTL_FLAGS_CLAMP_RANGE can be added to any ctl_table entries to enable a looser range clamping without returning any error. For example, .flags = CTL_FLAGS_CLAMP_RANGE, This flags value are now used for the range checking of shmmni, msgmni and semmni without breaking existing applications. If any out of range value is written to those sysctl parameters, the following warning will be printed instead. Kernel parameter "shmmni" was set out of range [0, 32768], clamped to 32768. Reading the values back will show 32768 instead of some fake values. This patch (of 6): Fix a typo. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Waiman Long <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Luis R. Rodriguez <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11ipc/msg: introduce msgctl(MSG_STAT_ANY)Davidlohr Bueso4-5/+15
There is a permission discrepancy when consulting msq ipc object metadata between /proc/sysvipc/msg (0444) and the MSG_STAT shmctl command. The later does permission checks for the object vs S_IRUGO. As such there can be cases where EACCESS is returned via syscall but the info is displayed anyways in the procfs files. While this might have security implications via info leaking (albeit no writing to the msq metadata), this behavior goes way back and showing all the objects regardless of the permissions was most likely an overlook - so we are stuck with it. Furthermore, modifying either the syscall or the procfs file can cause userspace programs to break (ie ipcs). Some applications require getting the procfs info (without root privileges) and can be rather slow in comparison with a syscall -- up to 500x in some reported cases for shm. This patch introduces a new MSG_STAT_ANY command such that the msq ipc object permissions are ignored, and only audited instead. In addition, I've left the lsm security hook checks in place, as if some policy can block the call, then the user has no other choice than just parsing the procfs file. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Reported-by: Robert Kettler <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Kees Cook <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11ipc/sem: introduce semctl(SEM_STAT_ANY)Davidlohr Bueso4-5/+15
There is a permission discrepancy when consulting shm ipc object metadata between /proc/sysvipc/sem (0444) and the SEM_STAT semctl command. The later does permission checks for the object vs S_IRUGO. As such there can be cases where EACCESS is returned via syscall but the info is displayed anyways in the procfs files. While this might have security implications via info leaking (albeit no writing to the sma metadata), this behavior goes way back and showing all the objects regardless of the permissions was most likely an overlook - so we are stuck with it. Furthermore, modifying either the syscall or the procfs file can cause userspace programs to break (ie ipcs). Some applications require getting the procfs info (without root privileges) and can be rather slow in comparison with a syscall -- up to 500x in some reported cases for shm. This patch introduces a new SEM_STAT_ANY command such that the sem ipc object permissions are ignored, and only audited instead. In addition, I've left the lsm security hook checks in place, as if some policy can block the call, then the user has no other choice than just parsing the procfs file. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Reported-by: Robert Kettler <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Kees Cook <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11ipc/shm: introduce shmctl(SHM_STAT_ANY)Davidlohr Bueso4-7/+23
Patch series "sysvipc: introduce STAT_ANY commands", v2. The following patches adds the discussed (see [1]) new command for shm as well as for sems and msq as they are subject to the same discrepancies for ipc object permission checks between the syscall and via procfs. These new commands are justified in that (1) we are stuck with this semantics as changing syscall and procfs can break userland; and (2) some users can benefit from performance (for large amounts of shm segments, for example) from not having to parse the procfs interface. Once merged, I will submit the necesary manpage updates. But I'm thinking something like: : diff --git a/man2/shmctl.2 b/man2/shmctl.2 : index 7bb503999941..bb00bbe21a57 100644 : --- a/man2/shmctl.2 : +++ b/man2/shmctl.2 : @@ -41,6 +41,7 @@ : .\" 2005-04-25, mtk -- noted aberrant Linux behavior w.r.t. new : .\" attaches to a segment that has already been marked for deletion. : .\" 2005-08-02, mtk: Added IPC_INFO, SHM_INFO, SHM_STAT descriptions. : +.\" 2018-02-13, dbueso: Added SHM_STAT_ANY description. : .\" : .TH SHMCTL 2 2017-09-15 "Linux" "Linux Programmer's Manual" : .SH NAME : @@ -242,6 +243,18 @@ However, the : argument is not a segment identifier, but instead an index into : the kernel's internal array that maintains information about : all shared memory segments on the system. : +.TP : +.BR SHM_STAT_ANY " (Linux-specific)" : +Return a : +.I shmid_ds : +structure as for : +.BR SHM_STAT . : +However, the : +.I shm_perm.mode : +is not checked for read access for : +.IR shmid , : +resembing the behaviour of : +/proc/sysvipc/shm. : .PP : The caller can prevent or allow swapping of a shared : memory segment with the following \fIcmd\fP values: : @@ -287,7 +300,7 @@ operation returns the index of the highest used entry in the : kernel's internal array recording information about all : shared memory segments. : (This information can be used with repeated : -.B SHM_STAT : +.B SHM_STAT/SHM_STAT_ANY : operations to obtain information about all shared memory segments : on the system.) : A successful : @@ -328,7 +341,7 @@ isn't accessible. : \fIshmid\fP is not a valid identifier, or \fIcmd\fP : is not a valid command. : Or: for a : -.B SHM_STAT : +.B SHM_STAT/SHM_STAT_ANY : operation, the index value specified in : .I shmid : referred to an array slot that is currently unused. This patch (of 3): There is a permission discrepancy when consulting shm ipc object metadata between /proc/sysvipc/shm (0444) and the SHM_STAT shmctl command. The later does permission checks for the object vs S_IRUGO. As such there can be cases where EACCESS is returned via syscall but the info is displayed anyways in the procfs files. While this might have security implications via info leaking (albeit no writing to the shm metadata), this behavior goes way back and showing all the objects regardless of the permissions was most likely an overlook - so we are stuck with it. Furthermore, modifying either the syscall or the procfs file can cause userspace programs to break (ie ipcs). Some applications require getting the procfs info (without root privileges) and can be rather slow in comparison with a syscall -- up to 500x in some reported cases. This patch introduces a new SHM_STAT_ANY command such that the shm ipc object permissions are ignored, and only audited instead. In addition, I've left the lsm security hook checks in place, as if some policy can block the call, then the user has no other choice than just parsing the procfs file. [1] https://lkml.org/lkml/2017/12/19/220 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Davidlohr Bueso <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Michael Kerrisk <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Kees Cook <[email protected]> Cc: Robert Kettler <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11kernel/params.c: downgrade warning for unsafe parametersChris Wilson1-2/+2
As using an unsafe module parameter is, by its very definition, an expected user action, emitting a warning is overkill. Nothing has yet gone wrong, and we add a taint flag for any future oops should something actually go wrong. So instead of having a user controllable pr_warn, downgrade it to a pr_notice for "a normal, but significant condition". We make use of unsafe kernel parameters in igt (https://cgit.freedesktop.org/drm/igt-gpu-tools/) (we have not yet succeeded in removing all such debugging options), which generates a warning and taints the kernel. The warning is unhelpful as we then need to filter it out again as we check that every test themselves do not provoke any kernel warnings. Link: http://lkml.kernel.org/r/[email protected] Fixes: 91f9d330cc14 ("module: make it possible to have unsafe, tainting module params") Signed-off-by: Chris Wilson <[email protected]> Acked-by: Jani Nikula <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Jean Delvare <[email protected]> Cc: Li Zhong <[email protected]> Cc: Petri Latvala <[email protected]> Cc: Daniel Vetter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11kernel/sysctl.c: fix sizeof argument to match variable nameRandy Dunlap1-1/+1
Fix sizeof argument to be the same as the data variable name. Probably a copy/paste error. Mostly harmless since both variables are unsigned int. Fixes kernel bugzilla #197371: Possible access to unintended variable in "kernel/sysctl.c" line 1339 https://bugzilla.kernel.org/show_bug.cgi?id=197371 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Reported-by: Petru Mihancea <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11rapidio: use a reference count for struct mport_dma_reqIoan Nicu1-104/+18
Once the dma request is passed to the DMA engine, the DMA subsystem would hold a pointer to this structure and could call the completion callback after do_dma_request() has timed out. The current code deals with this by putting timed out SYNC requests to a pending list and freeing them later, when the mport cdev device is released. This still does not guarantee that the DMA subsystem is really done with those transfers, so in theory dma_xfer_callback/dma_req_free could be called after mport_cdev_release_dma and could potentially access already freed memory. This patch simplifies the current handling by using a kref in the mport dma request structure, so that it gets freed only when nobody uses it anymore. This also simplifies the code a bit, as FAF transfers are now handled in the same way as SYNC and ASYNC transfers. There is no need anymore for the pending list and for the dma workqueue which was used in case of FAF transfers, so we remove them both. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ioan Nicu <[email protected]> Acked-by: Alexandre Bounine <[email protected]> Cc: Barry Wood <[email protected]> Cc: Matt Porter <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Al Viro <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: Frank Kunz <[email protected]> Cc: Alexander Sverdlin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11drivers/rapidio/rio-scan.c: fix typo in commentVasyl Gomonovych1-3/+3
Fix typo in the words 'receiver', 'specified', 'during' Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vasyl Gomonovych <[email protected]> Cc: Matt Porter <[email protected]> Cc: Alexandre Bounine <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11exec: pin stack limit during execKees Cook2-12/+17
Since the stack rlimit is used in multiple places during exec and it can be changed via other threads (via setrlimit()) or processes (via prlimit()), the assumption that the value doesn't change cannot be made. This leads to races with mm layout selection and argument size calculations. This changes the exec path to use the rlimit stored in bprm instead of in current. Before starting the thread, the bprm stack rlimit is stored back to current. Link: http://lkml.kernel.org/r/[email protected] Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec") Signed-off-by: Kees Cook <[email protected]> Reported-by: Ben Hutchings <[email protected]> Reported-by: Andy Lutomirski <[email protected]> Reported-by: Brad Spengler <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Greg KH <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: "Jason A. Donenfeld" <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Willy Tarreau <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11exec: introduce finalize_exec() before start_thread()Kees Cook6-0/+11
Provide a final callback into fs/exec.c before start_thread() takes over, to handle any last-minute changes, like the coming restoration of the stack limit. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Brad Spengler <[email protected]> Cc: Greg KH <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: "Jason A. Donenfeld" <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Willy Tarreau <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11exec: pass stack rlimit into mm layout functionsKees Cook11-58/+81
Patch series "exec: Pin stack limit during exec". Attempts to solve problems with the stack limit changing during exec continue to be frustrated[1][2]. In addition to the specific issues around the Stack Clash family of flaws, Andy Lutomirski pointed out[3] other places during exec where the stack limit is used and is assumed to be unchanging. Given the many places it gets used and the fact that it can be manipulated/raced via setrlimit() and prlimit(), I think the only way to handle this is to move away from the "current" view of the stack limit and instead attach it to the bprm, and plumb this down into the functions that need to know the stack limits. This series implements the approach. [1] 04e35f4495dd ("exec: avoid RLIMIT_STACK races with prlimit()") [2] 779f4e1c6c7c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"") [3] to [email protected], "Subject: existing rlimit races?" This patch (of 3): Since it is possible that the stack rlimit can change externally during exec (either via another thread calling setrlimit() or another process calling prlimit()), provide a way to pass the rlimit down into the per-architecture mm layout functions so that the rlimit can stay in the bprm structure instead of sitting in the signal structure until exec is finalized. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Willy Tarreau <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: "Jason A. Donenfeld" <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Greg KH <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Brad Spengler <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11seq_file: account everything to kmemcgAlexey Dobriyan1-4/+4
All it takes to open a file and read 1 byte from it. seq_file will be allocated along with any private allocations, and more importantly seq file buffer which is 1 page by default. Link: http://lkml.kernel.org/r/20180310085252.GB17121@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Al Viro <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11seq_file: allocate seq_file from kmem_cacheAlexey Dobriyan3-2/+12
For fine-grained debugging and usercopy protection. Link: http://lkml.kernel.org/r/20180310085027.GA17121@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Al Viro <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11fs/reiserfs/journal.c: add missing resierfs_warning() argAndrew Morton1-1/+1
One use of the reiserfs_warning() macro in journal_init_dev() is missing a parameter, causing the following warning: REISERFS warning (device loop0): journal_init_dev: Cannot open '%s': %i journal_init_dev: This also causes a WARN_ONCE() warning in the vsprintf code, and then a panic if panic_on_warn is set. Please remove unsupported %/ in format string WARNING: CPU: 1 PID: 4480 at lib/vsprintf.c:2138 format_decode+0x77f/0x830 lib/vsprintf.c:2138 Kernel panic - not syncing: panic_on_warn set ... Just add another string argument to the macro invocation. Addresses https://syzkaller.appspot.com/bug?id=0627d4551fdc39bf1ef5d82cd9eef587047f7718 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Reported-by: <[email protected]> Tested-by: Randy Dunlap <[email protected]> Acked-by: Jeff Mahoney <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Jan Kara <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11autofs4: use wait_event_killableMatthew Wilcox1-27/+2
This playing with signals to allow only fatal signals appears to predate the introduction of wait_event_killable(), and I'm fairly sure that wait_event_killable is what was meant to happen here. [[email protected]: use wake_up() instead of wake_up_interruptible] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Ian Kent <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11init/ramdisk: use pr_cont() at the end of ramdisk loadingAaro Koskinen1-2/+2
Use pr_cont() at the end of ramdisk loading. This will avoid the rotator and an extra newline appearing in the dmesg. Before: RAMDISK: Loading 2436KiB [1 disk] into ram disk... | done. After: RAMDISK: Loading 2436KiB [1 disk] into ram disk... done. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aaro Koskinen <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11checkpatch: whinge about bool bitfieldsJoe Perches1-0/+6
Using bool in a bitfield isn't a good idea as the alignment behavior is arch implementation defined. Suggest using unsigned int or u<8|16|32> instead. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joe Perches <[email protected]> Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>