blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2017-02-24	mm,fs,dax: change ->pmd_fault to ->huge_fault	Dave Jiang	2	-7/+9
	Patch series "1G transparent hugepage support for device dax", v2. The following series implements support for 1G trasparent hugepage on x86 for device dax. The bulk of the code was written by Mathew Wilcox a while back supporting transparent 1G hugepage for fs DAX. I have forward ported the relevant bits to 4.10-rc. The current submission has only the necessary code to support device DAX. Comments from Dan Williams: So the motivation and intended user of this functionality mirrors the motivation and users of 1GB page support in hugetlbfs. Given expected capacities of persistent memory devices an in-memory database may want to reduce tlb pressure beyond what they can already achieve with 2MB mappings of a device-dax file. We have customer feedback to that effect as Willy mentioned in his previous version of these patches [1]. [1]: https://lkml.org/lkml/2016/1/31/52 Comments from Nilesh @ Oracle: There are applications which have a process model; and if you assume 10,000 processes attempting to mmap all the 6TB memory available on a server; we are looking at the following: processes : 10,000 memory : 6TB pte @ 4k page size: 8 bytes / 4K of memory * #processes = 6TB / 4k * 8 * 10000 = 1.5GB * 80000 = 120,000GB pmd @ 2M page size: 120,000 / 512 = ~240GB pud @ 1G page size: 240GB / 512 = ~480MB As you can see with 2M pages, this system will use up an exorbitant amount of DRAM to hold the page tables; but the 1G pages finally brings it down to a reasonable level. Memory sizes will keep increasing; so this number will keep increasing. An argument can be made to convert the applications from process model to thread model, but in the real world that may not be always practical. Hopefully this helps explain the use case where this is valuable. This patch (of 3): In preparation for adding the ability to handle PUD pages, convert vm_operations_struct.pmd_fault to vm_operations_struct.huge_fault. The vm_fault structure is extended to include a union of the different page table pointers that may be needed, and three flag bits are reserved to indicate which type of pointer is in the union. [[email protected]: remove unused function ext4_dax_huge_fault()] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: clear PMD or PUD size flags when in fall through path] Link: http://lkml.kernel.org/r/148589842696.5820.16078080610311444794.stgit@djiang5-desk3.ch.intel.com Link: http://lkml.kernel.org/r/148545058784.17912.6353162518188733642.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Matthew Wilcox <[email protected]> Signed-off-by: Dave Jiang <[email protected]> Signed-off-by: Ross Zwisler <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Jan Kara <[email protected]> Cc: Dan Williams <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Nilesh Choudhury <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Dave Jiang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24	mm, fs: reduce fault, page_mkwrite, and pfn_mkwrite to take only vmf	Dave Jiang	3	-10/+8
	->fault(), ->page_mkwrite(), and ->pfn_mkwrite() calls do not need to take a vma and vmf parameter when the vma already resides in vmf. Remove the vma parameter to simplify things. [[email protected]: fix ARM build] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/148521301778.19116.10840599906674778980.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Dave Jiang <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Ross Zwisler <[email protected]> Cc: Theodore Ts'o <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jan Kara <[email protected]> Cc: Dan Williams <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24	mm: vmscan: move dirty pages out of the way until they're flushed	Johannes Weiner	1	-0/+7
	We noticed a performance regression when moving hadoop workloads from 3.10 kernels to 4.0 and 4.6. This is accompanied by increased pageout activity initiated by kswapd as well as frequent bursts of allocation stalls and direct reclaim scans. Even lowering the dirty ratios to the equivalent of less than 1% of memory would not eliminate the issue, suggesting that dirty pages concentrate where the scanner is looking. This can be traced back to recent efforts of thrash avoidance. Where 3.10 would not detect refaulting pages and continuously supply clean cache to the inactive list, a thrashing workload on 4.0+ will detect and activate refaulting pages right away, distilling used-once pages on the inactive list much more effectively. This is by design, and it makes sense for clean cache. But for the most part our workload's cache faults are refaults and its use-once cache is from streaming writes. We end up with most of the inactive list dirty, and we don't go after the active cache as long as we have use-once pages around. But waiting for writes to avoid reclaiming clean cache that might refault is a bad trade-off. Even if the refaults happen, reads are faster than writes. Before getting bogged down on writeback, reclaim should first look at all cache in the system, even active cache. To accomplish this, activate pages that are dirty or under writeback when they reach the end of the inactive LRU. The pages are marked for immediate reclaim, meaning they'll get moved back to the inactive LRU tail as soon as they're written back and become reclaimable. But in the meantime, by reducing the inactive list to only immediately reclaimable pages, we allow the scanner to deactivate and refill the inactive list with clean cache from the active list tail to guarantee forward progress. [[email protected]: update comment] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24	mm: vmscan: kick flushers when we encounter dirty pages on the LRU	Johannes Weiner	1	-1/+1
	Memory pressure can put dirty pages at the end of the LRU without anybody running into dirty limits. Don't start writing individual pages from kswapd while the flushers might be asleep. Unlike the old direct reclaim flusher wakeup (removed in the next patch) that flushes the number of pages just scanned, this patch wakes the flushers for all outstanding dirty pages. That seemed to perform better in a synthetic test that pushes dirty pages to the end of the LRU and into reclaim, because we know LRU aging outstrips writeback already, and this way we give younger dirty pages a headstart rather than wait until reclaim runs into them as well. It also means less plugging and risk of exhausting the struct request pool from reclaim. There is a concern that this will cause temporary files that used to get dirtied and truncated before writeback to now get written to disk under memory pressure. If this turns out to be a real problem, we'll have to revisit this and tame the reclaim flusher wakeups. [[email protected]: mention dirty expiration as a condition] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24	mm: vmscan: scan dirty pages even in laptop mode	Johannes Weiner	1	-2/+0
	Patch series "mm: vmscan: fix kswapd writeback regression". We noticed a regression on multiple hadoop workloads when moving from 3.10 to 4.0 and 4.6, which involves kswapd getting tangled up in page writeout, causing direct reclaim herds that also don't make progress. I tracked it down to the thrash avoidance efforts after 3.10 that make the kernel better at keeping use-once cache and use-many cache sorted on the inactive and active list, with more aggressive protection of the active list as long as there is inactive cache. Unfortunately, our workload's use-once cache is mostly from streaming writes. Waiting for writes to avoid potential reloads in the future is not a good tradeoff. These patches do the following: 1. Wake the flushers when kswapd sees a lump of dirty pages. It's possible to be below the dirty background limit and still have cache velocity push them through the LRU. So start a-flushin'. 2. Let kswapd only write pages that have been rotated twice. This makes sure we really tried to get all the clean pages on the inactive list before resorting to horrible LRU-order writeback. 3. Move rotating dirty pages off the inactive list. Instead of churning or waiting on page writeback, we'll go after clean active cache. This might lead to thrashing, but in this state memory demand outstrips IO speed anyway, and reads are faster than writes. Mel backported the series to 4.10-rc5 with one minor conflict and ran a couple of tests on it. Mix of read/write random workload didn't show anything interesting. Write-only database didn't show much difference in performance but there were slight reductions in IO -- probably in the noise. simoop did show big differences although not as big as Mel expected. This is Chris Mason's workload that similate the VM activity of hadoop. Mel won't go through the full details but over the samples measured during an hour it reported 4.10.0-rc5 4.10.0-rc5 vanilla johannes-v1r1 Amean p50-Read 21346531.56 ( 0.00%) 21697513.24 ( -1.64%) Amean p95-Read 24700518.40 ( 0.00%) 25743268.98 ( -4.22%) Amean p99-Read 27959842.13 ( 0.00%) 28963271.11 ( -3.59%) Amean p50-Write 1138.04 ( 0.00%) 989.82 ( 13.02%) Amean p95-Write 1106643.48 ( 0.00%) 12104.00 ( 98.91%) Amean p99-Write 1569213.22 ( 0.00%) 36343.38 ( 97.68%) Amean p50-Allocation 85159.82 ( 0.00%) 79120.70 ( 7.09%) Amean p95-Allocation 204222.58 ( 0.00%) 129018.43 ( 36.82%) Amean p99-Allocation 278070.04 ( 0.00%) 183354.43 ( 34.06%) Amean final-p50-Read 21266432.00 ( 0.00%) 21921792.00 ( -3.08%) Amean final-p95-Read 24870912.00 ( 0.00%) 26116096.00 ( -5.01%) Amean final-p99-Read 28147712.00 ( 0.00%) 29523968.00 ( -4.89%) Amean final-p50-Write 1130.00 ( 0.00%) 977.00 ( 13.54%) Amean final-p95-Write 1033216.00 ( 0.00%) 2980.00 ( 99.71%) Amean final-p99-Write 1517568.00 ( 0.00%) 32672.00 ( 97.85%) Amean final-p50-Allocation 86656.00 ( 0.00%) 78464.00 ( 9.45%) Amean final-p95-Allocation 211712.00 ( 0.00%) 116608.00 ( 44.92%) Amean final-p99-Allocation 287232.00 ( 0.00%) 168704.00 ( 41.27%) The latencies are actually completely horrific in comparison to 4.4 (and 4.10-rc5 is worse than 4.9 according to historical data for reasons Mel hasn't analysed yet). Still, 95% of write latency (p95-write) is halved by the series and allocation latency is way down. Direct reclaim activity is one fifth of what it was according to vmstats. Kswapd activity is higher but this is not necessarily surprising. Kswapd efficiency is unchanged at 99% (99% of pages scanned were reclaimed) but direct reclaim efficiency went from 77% to 99% In the vanilla kernel, 627MB of data was written back from reclaim context. With the series, no data was written back. With or without the patch, pages are being immediately reclaimed after writeback completes. However, with the patch, only 1/8th of the pages are reclaimed like this. This patch (of 5): We have an elaborate dirty/writeback throttling mechanism inside the reclaim scanner, but for that to work the pages have to go through shrink_page_list() and get counted for what they are. Otherwise, we mess up the LRU order and don't match reclaim speed to writeback. Especially during deactivation, there is never a reason to skip dirty pages; nothing is even trying to write them out from there. Don't mess up the LRU order for nothing, shuffle these pages along. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24	userfaultfd: non-cooperative: rename EVENT_MADVDONTNEED to EVENT_REMOVE	Mike Rapoport	1	-8/+8
	Patch series "userfaultfd: non-cooperative: add madvise() event for MADV_REMOVE request". These patches add notification of madvise(MADV_REMOVE) event to non-cooperative userfaultfd monitor. The first pacth renames EVENT_MADVDONTNEED to EVENT_REMOVE along with relevant functions and structures. Using _REMOVE instead of _MADVDONTNEED describes the event semantics more clearly and I hope it's not too late for such change in the ABI. This patch (of 3): The UFFD_EVENT_MADVDONTNEED purpose is to notify uffd monitor about removal of certain range from address space tracked by userfaultfd. Hence, UFFD_EVENT_REMOVE seems to better reflect the operation semantics. Respectively, 'madv_dn' field of uffd_msg is renamed to 'remove' and the madvise_userfault_dontneed callback is renamed to userfaultfd_remove. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Acked-by: Hillf Danton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24	memblock: embed memblock type name within struct memblock_type	Heiko Carstens	1	-0/+1
	Provide the name of each memblock type with struct memblock_type. This allows to get rid of the function memblock_type_name() and duplicating the type names in __memblock_dump_all(). The only memblock_type usage out of mm/memblock.c seems to be arch/s390/kernel/crash_dump.c. While at it, give it a name. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Heiko Carstens <[email protected]> Cc: Philipp Hachtmann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24	mm: validate device_hotplug is held for memory hotplug	Dan Williams	1	-0/+1
	mem_hotplug_begin() assumes that it can set mem_hotplug.active_writer and run the hotplug process without racing another thread. Validate this assumption with a lockdep assertion. Link: http://lkml.kernel.org/r/148693886229.16345.1770484669403334689.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Dan Williams <[email protected]> Reported-by: Ben Hutchings <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Toshi Kani <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Logan Gunthorpe <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Masayoshi Mizuma <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-24	Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md	Linus Torvalds	1	-2/+9
	Pull md updates from Shaohua Li: "Mainly fixes bugs and improves performance: - Improve scalability for raid1 from Coly - Improve raid5-cache read performance, disk efficiency and IO pattern from Song and me - Fix a race condition of disk hotplug for linear from Coly - A few cleanup patches from Ming and Byungchul - Fix a memory leak from Neil - Fix WRITE SAME IO failure from me - Add doc for raid5-cache from me" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: (23 commits) md/raid1: fix write behind issues introduced by bio_clone_bioset_partial md/raid1: handle flush request correctly md/linear: shutup lockdep warnning md/raid1: fix a use-after-free bug RAID1: avoid unnecessary spin locks in I/O barrier code RAID1: a new I/O barrier implementation to remove resync window md/raid5: Don't reinvent the wheel but use existing llist API md: fast clone bio in bio_clone_mddev() md: remove unnecessary check on mddev md/raid1: use bio_clone_bioset_partial() in case of write behind md: fail if mddev->bio_set can't be created block: introduce bio_clone_bioset_partial() md: disable WRITE SAME if it fails in underlayer disks md/raid5-cache: exclude reclaiming stripes in reclaim check md/raid5-cache: stripe reclaim only counts valid stripes MD: add doc for raid5-cache Documentation: move MD related doc into a separate dir md: ensure md devices are freed before module is unloaded. md/r5cache: improve journal device efficiency md/r5cache: enable chunk_aligned_read with write back cache ...
2017-02-24	Merge branch 'for-linus' of git://git.kernel.dk/linux-block	Linus Torvalds	4	-1/+40
	Pull block updates and fixes from Jens Axboe: - NVMe updates and fixes that missed the first pull request. This includes bug fixes, and support for autonomous power management. - Fix from Christoph for missing clear of the request payload, causing a problem with (at least) the storvsc driver. - Further fixes for the queue/bdi life time issues from Jan. - The Kconfig mq scheduler update from me. - Fixing a use-after-free in dm-rq, spotted by Bart, introduced in this merge window. - Three fixes for nbd from Josef. - Bug fix from Omar, fixing a bug in sas transport code that oopses when bsg ioctls were used. From Omar. - Improvements to the queue restart and tag wait from from Omar. - Set of fixes for the sed/opal code from Scott. - Three trivial patches to cciss from Tobin * 'for-linus' of git://git.kernel.dk/linux-block: (41 commits) dm-rq: don't dereference request payload after ending request blk-mq-sched: separate mark hctx and queue restart operations blk-mq: use sbq wait queues instead of restart for driver tags block/sed-opal: Propagate original error message to userland. nvme/pci: re-check security protocol support after reset block/sed-opal: Introduce free_opal_dev to free the structure and clean up state nvme: detect NVMe controller in recent MacBooks nvme-rdma: add support for host_traddr nvmet-rdma: Fix error handling nvmet-rdma: use nvme cm status helper nvme-rdma: move nvme cm status helper to .h file nvme-fc: don't bother to validate ioccsz and iorcsz nvme/pci: No special case for queue busy on IO nvme/core: Fix race kicking freed request_queue nvme/pci: Disable on removal when disconnected nvme: Enable autonomous power state transitions nvme: Add a quirk mechanism that uses identify_ctrl nvme: make nvmf_register_transport require a create_ctrl callback nvme: Use CNS as 8-bit field and avoid endianness conversion nvme: add semicolon in nvme_command setting ...
2017-02-24	watchdog: Introduce watchdog_stop_on_unregister helper	Guenter Roeck	1	-0/+7
	Many watchdog drivers explicitly stop the watchdog when unregistering it. While it is unclear if this is actually needed (the whatdog should not be running at that time if it can be stopped), introduce a helper to explicitly stop the watchdog in the watchdog core when unregistering it. This helps reducing driver code size while retaining functionality. Signed-off-by: Guenter Roeck <[email protected]>
2017-02-24	objtool: Improve detection of BUG() and other dead ends	Josh Poimboeuf	1	-1/+12
	The BUG() macro's use of __builtin_unreachable() via the unreachable() macro tells gcc that the instruction is a dead end, and that it's safe to assume the current code path will not execute past the previous instruction. On x86, the BUG() macro is implemented with the 'ud2' instruction. When objtool's branch analysis sees that instruction, it knows the current code path has come to a dead end. Peter Zijlstra has been working on a patch to change the WARN macros to use 'ud2'. That patch will break objtool's assumption that 'ud2' is always a dead end. Generally it's best for objtool to avoid making those kinds of assumptions anyway. The more ignorant it is of kernel code internals, the better. So create a more generic way for objtool to detect dead ends by adding an annotation to the unreachable() macro. The annotation stores a pointer to the end of the unreachable code path in an '__unreachable' section. Objtool can read that section to find the dead ends. Tested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/41a6d33971462ebd944a1c60ad4bf5be86c17b77.1487712920.git.jpoimboe@redhat.com Signed-off-by: Ingo Molnar <[email protected]>
2017-02-24	locking/refcounts: Add missing kernel.h header to have UINT_MAX defined	Elena Reshetova	1	-0/+1
	Fix header dependency. Suggested-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Elena Reshetova <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-02-24	locking/refcounts: Out-of-line everything	Peter Zijlstra	1	-265/+12
	Linus asked to please make this real C code. And since size then isn't an issue what so ever anymore, remove the debug knob and make all WARN()s unconditional. Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2017-02-23	Merge branch 'for-linus' of ↵	Linus Torvalds	7	-8/+16
	git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull namespace updates from Eric Biederman: "There is a lot here. A lot of these changes result in subtle user visible differences in kernel behavior. I don't expect anything will care but I will revert/fix things immediately if any regressions show up. From Seth Forshee there is a continuation of the work to make the vfs ready for unpriviled mounts. We had thought the previous changes prevented the creation of files outside of s_user_ns of a filesystem, but it turns we missed the O_CREAT path. Ooops. Pavel Tikhomirov and Oleg Nesterov worked together to fix a long standing bug in the implemenation of PR_SET_CHILD_SUBREAPER where only children that are forked after the prctl are considered and not children forked before the prctl. The only known user of this prctl systemd forks all children after the prctl. So no userspace regressions will occur. Holding earlier forked children to the same rules as later forked children creates a semantic that is sane enough to allow checkpoing of processes that use this feature. There is a long delayed change by Nikolay Borisov to limit inotify instances inside a user namespace. Michael Kerrisk extends the API for files used to maniuplate namespaces with two new trivial ioctls to allow discovery of the hierachy and properties of namespaces. Konstantin Khlebnikov with the help of Al Viro adds code that when a network namespace exits purges it's sysctl entries from the dcache. As in some circumstances this could use a lot of memory. Vivek Goyal fixed a bug with stacked filesystems where the permissions on the wrong inode were being checked. I continue previous work on ptracing across exec. Allowing a file to be setuid across exec while being ptraced if the tracer has enough credentials in the user namespace, and if the process has CAP_SETUID in it's own namespace. Proc files for setuid or otherwise undumpable executables are now owned by the root in the user namespace of their mm. Allowing debugging of setuid applications in containers to work better. A bug I introduced with permission checking and automount is now fixed. The big change is to mark the mounts that the kernel initiates as a result of an automount. This allows the permission checks in sget to be safely suppressed for this kind of mount. As the permission check happened when the original filesystem was mounted. Finally a special case in the mount namespace is removed preventing unbounded chains in the mount hash table, and making the semantics simpler which benefits CRIU. The vfs fix along with related work in ima and evm I believe makes us ready to finish developing and merge fully unprivileged mounts of the fuse filesystem. The cleanups of the mount namespace makes discussing how to fix the worst case complexity of umount. The stacked filesystem fixes pave the way for adding multiple mappings for the filesystem uids so that efficient and safer containers can be implemented" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: proc/sysctl: Don't grab i_lock under sysctl_lock. vfs: Use upper filesystem inode in bprm_fill_uid() proc/sysctl: prune stale dentries during unregistering mnt: Tuck mounts under others instead of creating shadow/side mounts. prctl: propagate has_child_subreaper flag to every descendant introduce the walk_process_tree() helper nsfs: Add an ioctl() to return owner UID of a userns fs: Better permission checking for submounts exit: fix the setns() && PR_SET_CHILD_SUBREAPER interaction vfs: open() with O_CREAT should not create inodes with unknown ids nsfs: Add an ioctl() to return the namespace type proc: Better ownership of files for non-dumpable tasks in user namespaces exec: Remove LSM_UNSAFE_PTRACE_CAP exec: Test the ptracer's saved cred to see if the tracee can gain caps exec: Don't reset euid and egid when the tracee has CAP_SETUID inotify: Convert to using per-namespace limits
2017-02-23	Merge tag 'drm-for-v4.11-less-shouty' of ↵	Linus Torvalds	4	-35/+312
	git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "This is the main drm pull request for v4.11. Nothing too major, the tinydrm and mmu-less support should make writing smaller drivers easier for some of the simpler platforms, and there are a bunch of documentation updates. Intel grew displayport MST audio support which is hopefully useful to people, and FBC is on by default for GEN9+ (so people know where to look for regressions). AMDGPU has a lot of fixes that would like new firmware files installed for some GPUs. Other than that it's pretty scattered all over. I may have a follow up pull request as I know BenH has a bunch of AST rework and fixes and I'd like to get those in once they've been tested by AST, and I've got at least one pull request I'm just trying to get the author to fix up. Core: - drm_mm reworked - Connector list locking and iterators - Documentation updates - Format handling rework - MMU-less support for fbdev helpers - drm_crtc_from_index helper - Core CRC API - Remove drm_framebuffer_unregister_private - Debugfs cleanup - EDID/Infoframe fixes - Release callback - Tinydrm support (smaller drivers for simple hw) panel: - Add support for some new simple panels i915: - FBC by default for gen9+ - Shared dpll cleanups and docs - GEN8 powerdomain cleanup - DMC support on GLK - DP MST audio support - HuC loading support - GVT init ordering fixes - GVT IOMMU workaround fix amdgpu/radeon: - Power/clockgating improvements - Preliminary SR-IOV support - TTM buffer priority and eviction fixes - SI DPM quirks removed due to firmware fixes - Powerplay improvements - VCE/UVD powergating fixes - Cleanup SI GFX code to match CI/VI - Support for > 2 displays on 3/5 crtc asics - SI headless fixes nouveau: - Rework securre boot code in prep for GP10x secure boot - Channel recovery improvements - Initial power budget code - MMU rework preperation vmwgfx: - Bunch of fixes and cleanups exynos: - Runtime PM support for MIC driver - Cleanups to use atomic helpers - UHD Support for TM2/TM2E boards - Trigger mode fix for Rinato board etnaviv: - Shader performance fix - Command stream validator fixes - Command buffer suballocator rockchip: - CDN DisplayPort support - IOMMU support for arm64 platform imx-drm: - Fix i.MX5 TV encoder probing - Remove lower fb size limits msm: - Support for HW cursor on MDP5 devices - DSI encoder cleanup - GPU DT bindings cleanup sti: - stih410 cleanups - Create fbdev at binding - HQVDP fixes - Remove stih416 chip functionality - DVI/HDMI mode selection fixes - FPS statistic reporting omapdrm: - IRQ code cleanup dwi-hdmi bridge: - Cleanups and fixes adv-bridge: - Updates for nexus sii8520 bridge: - Add interlace mode support - Rework HDMI and lots of fixes qxl: - probing/teardown cleanups ZTE drm: - HDMI audio via SPDIF interface - Video Layer overlay plane support - Add TV encoder output device atmel-hlcdc: - Rework fbdev creation logic tegra: - OF node fix fsl-dcu: - Minor fixes mali-dp: - Assorted fixes sunxi: - Minor fix" [ This was the "fixed" pull, that still had build warnings due to people not even having build tested the result. I'm not a happy camper I've fixed the things I noticed up in this merge. - Linus ] * tag 'drm-for-v4.11-less-shouty' of git://people.freedesktop.org/~airlied/linux: (1177 commits) lib/Kconfig: make PRIME_NUMBERS not user selectable drm/tinydrm: helpers: Properly fix backlight dependency drm/tinydrm: mipi-dbi: Fix field width specifier warning drm/tinydrm: mipi-dbi: Silence: ‘cmd’ may be used uninitialized drm/sti: fix build warnings in sti_drv.c and sti_vtg.c files drm/amd/powerplay: fix PSI feature on Polars12 drm/amdgpu: refuse to reserve io mem for split VRAM buffers drm/ttm: fix use-after-free races in vm fault handling drm/tinydrm: Add support for Multi-Inno MI0283QT display dt-bindings: Add Multi-Inno MI0283QT binding dt-bindings: display/panel: Add common rotation property of: Add vendor prefix for Multi-Inno drm/tinydrm: Add MIPI DBI support drm/tinydrm: Add helper functions drm: Add DRM support for tiny LCD displays drm/amd/amdgpu: post card if there is real hw resetting performed drm/nouveau/tmr: provide backtrace when a timeout is hit drm/nouveau/pci/g92: Fix rearm drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios drm/nouveau/hwmon: expose power_max and power_crit ..
2017-02-23	Merge tag 'armsoc-drivers' of ↵	Linus Torvalds	3	-102/+88
	git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc Pull ARM SoC driver updates from Arnd Bergmann: "Driver updates for ARM SoCs. A handful of driver changes this time around. The larger changes are: - Reset drivers for hi3660 and zx2967 - AHCI driver for Davinci, acked by Tejun and brought in here due to platform dependencies - Cleanups of atmel-ebi (External Bus Interface) - Tweaks for Rockchip GRF (General Register File) usage (kitchensink misc register range on the SoCs) - PM domains changes for support of two new ZTE SoCs (zx296718 and zx2967)" * tag 'armsoc-drivers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (53 commits) soc: samsung: pmu: Add register defines for pad retention control reset: make zx2967 explicitly non-modular reset: core: fix reset_control_put soc: samsung: pm_domains: Read domain name from the new label property soc: samsung: pm_domains: Remove message about failed memory allocation soc: samsung: pm_domains: Remove unused name field soc: samsung: pm_domains: Use full names in subdomains registration log sata: ahci-da850: un-hardcode the MPY bits sata: ahci-da850: add a workaround for controller instability sata: ahci: export ahci_do_hardreset() locally sata: ahci-da850: implement a workaround for the softreset quirk sata: ahci-da850: add device tree match table sata: ahci-da850: get the sata clock using a connection id soc: samsung: pmu: Remove duplicated define for ARM_L2_OPTION register memory: atmel-ebi: Enable the SMC clock if specified soc: samsung: pmu: Remove unused and duplicated defines memory: atmel-ebi: Properly handle multiple reference to the same CS memory: atmel-ebi: Fix the test to enable generic SMC logic soc: samsung: pm_domains: Add new Exynos5433 compatible soc: samsung: pmu: Add dummy support for Exynos5433 SoC ...
2017-02-23	Merge tag 'pci-v4.11-changes' of ↵	Linus Torvalds	3	-23/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: - add ASPM L1 substate support - enable PCIe Extended Tags when supported - configure PCIe MPS settings on iProc, Versatile, X-Gene, and Xilinx - increase VPD access timeout - add ACS quirks for Intel Union Point, Qualcomm QDF2400 and QDF2432 - use new pci_irq_alloc_vectors() in more drivers - fix MSI affinity memory leak - remove unused MSI interfaces and update documentation - remove unused AER .link_reset() callback - avoid pci_lock / p->pi_lock deadlock seen with perf - serialize sysfs enable/disable num_vfs operations - move DesignWare IP from drivers/pci/host/ to drivers/pci/dwc/ and refactor so we can support both hosts and endpoints - add DT ECAM-like support for HiSilicon Hip06/Hip07 controllers - add Rockchip system power management support - add Thunder-X cn81xx and cn83xx support - add Exynos 5440 PCIe PHY support * tag 'pci-v4.11-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (93 commits) PCI: dwc: Remove dependency of designware on CONFIG_PCI PCI: dwc: Add CONFIG_PCIE_DW_HOST to enable PCI dwc host PCI: dwc: Split pcie-designware.c into host and core files PCI: dwc: designware: Fix style errors in pcie-designware.c PCI: dwc: designware: Parse "num-lanes" property in dw_pcie_setup_rc() PCI: dwc: all: Split struct pcie_port into host-only and core structures PCI: dwc: designware: Get device pointer at the start of dw_pcie_host_init() PCI: dwc: all: Rename cfg_read/cfg_write to read/write PCI: dwc: all: Use platform_set_drvdata() to save private data PCI: dwc: designware: Move register defines to designware header file PCI: dwc: Use PTR_ERR_OR_ZERO to simplify code PCI: dra7xx: Group PHY API invocations PCI: dra7xx: Enable MSI and legacy interrupts simultaneously PCI: dra7xx: Add support to force RC to work in GEN1 mode PCI: dra7xx: Simplify probe code with devm_gpiod_get_optional() PCI: Move DesignWare IP support to new drivers/pci/dwc/ directory PCI: exynos: Support the PHY generic framework Documentation: binding: Modify the exynos5440 PCIe binding phy: phy-exynos-pcie: Add support for Exynos PCIe PHY Documentation: samsung-phy: Add exynos-pcie-phy binding ...
2017-02-23	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net	Linus Torvalds	2	-1/+11
	Pull networking fixes from David Miller: 1) Some 'const'ing in qlogic networking drivers, from Bhumika Goyal. 2) Fix scheduling while atomic in l2tp network namespace exit by deferring the work to the workqueue. From Ridge Kennedy. 3) Fix use after free in dccp timewait handling, from Andrey Ryabinin. 4) mlx5e CQE compression engine not initialized properly, from Tariq Toukan. 5) Some UAPI header fixes from Dmitry V. Levin. 6) Don't overwrite module parameter value in mlx4 driver, from Majd Dibbiny. 7) Fix divide by zero in xt_hashlimit netfilter module, from Alban Browaeys. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (35 commits) bpf: Fix bpf_xdp_event_output net/mlx4_en: Use __skb_fill_page_desc() net/mlx4_core: Use cq quota in SRIOV when creating completion EQs net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs net/mlx4: Spoofcheck and zero MAC can't coexist net/mlx4: Change ENOTSUPP to EOPNOTSUPP uapi: fix linux/rds.h userspace compilation errors uapi: fix linux/seg6.h and linux/seg6_iptunnel.h userspace compilation errors lib: Remove string from parman config selection forcedeth: Remove return from a void function bpf: fix spelling mistake: "proccessed" -> "processed" uapi: fix linux/llc.h userspace compilation error uapi: fix linux/ip6_tunnel.h userspace compilation errors net/mlx5e: Fix wrong CQE decompression net/mlx5e: Update MPWQE stride size when modifying CQE compress state net/mlx5e: Fix broken CQE compression initialization net/mlx5e: Do not reduce LRO WQE size when not using build_skb net/mlx5e: Register/unregister vport representors on interface attach/detach net/mlx5e: s390 system compilation fix tcp: account for ts offset only if tsecr not zero ...
2017-02-23	Merge tag 'for-linus' of ↵	Linus Torvalds	2	-2/+6
	git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma Pull Mellanox rdma updates from Doug Ledford: "Mellanox specific updates for 4.11 merge window Because the Mellanox code required being based on a net-next tree, I keept it separate from the remainder of the RDMA stack submission that is based on 4.10-rc3. This branch contains: - Various mlx4 and mlx5 fixes and minor changes - Support for adding a tag match rule to flow specs - Support for cvlan offload operation for raw ethernet QPs - A change to the core IB code to recognize raw eth capabilities and enumerate them (touches non-Mellanox code) - Implicit On-Demand Paging memory registration support" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (40 commits) IB/mlx5: Fix configuration of port capabilities IB/mlx4: Take source GID by index from HW GID table IB/mlx5: Fix blue flame buffer size calculation IB/mlx4: Remove unused variable from function declaration IB: Query ports via the core instead of direct into the driver IB: Add protocol for USNIC IB/mlx4: Support raw packet protocol IB/mlx5: Support raw packet protocol IB/core: Add raw packet protocol IB/mlx5: Add implicit MR support IB/mlx5: Expose MR cache for mlx5_ib IB/mlx5: Add null_mkey access IB/umem: Indicate that process is being terminated IB/umem: Update on demand page (ODP) support IB/core: Add implicit MR flag IB/mlx5: Support creation of a WQ with scatter FCS offload IB/mlx5: Enable QP creation with cvlan offload IB/mlx5: Enable WQ creation and modification with cvlan offload IB/mlx5: Expose vlan offloads capabilities IB/uverbs: Enable QP creation with cvlan offload ...
2017-02-23	blk-mq: use sbq wait queues instead of restart for driver tags	Omar Sandoval	1	-0/+2
	Commit 50e1dab86aa2 ("blk-mq-sched: fix starvation for multiple hardware queues and shared tags") fixed one starvation issue for shared tags. However, we can still get into a situation where we fail to allocate a tag because all tags are allocated but we don't have any pending requests on any hardware queue. One solution for this would be to restart all queues that share a tag map, but that really sucks. Ideally, we could just block and wait for a tag, but that isn't always possible from blk_mq_dispatch_rq_list(). However, we can still use the struct sbitmap_queue wait queues with a custom callback instead of blocking. This has a few benefits: 1. It avoids iterating over all hardware queues when completing an I/O, which the current restart code has to do. 2. It benefits from the existing rolling wakeup code. 3. It avoids punting to another thread just to have it block. Signed-off-by: Omar Sandoval <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-23	block/sed-opal: Introduce free_opal_dev to free the structure and clean up state	Scott Bauer	1	-0/+5
	Before we free the opal structure we need to clean up any saved locking ranges that the user had told us to unlock from a suspend. Signed-off-by: Scott Bauer <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2017-02-23	Merge branch 'linus' of ↵	Linus Torvalds	2	-0/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Try to catch hash output overrun in testmgr - Introduce walksize attribute for batched walking - Make crypto_xor() and crypto_inc() alignment agnostic Algorithms: - Add time-invariant AES algorithm - Add standalone CBCMAC algorithm Drivers: - Add NEON acclerated chacha20 on ARM/ARM64 - Expose AES-CTR as synchronous skcipher on ARM64 - Add scalar AES implementation on ARM64 - Improve scalar AES implementation on ARM - Improve NEON AES implementation on ARM/ARM64 - Merge CRC32 and PMULL instruction based drivers on ARM64 - Add NEON acclerated CBCMAC/CMAC/XCBC AES on ARM64 - Add IPsec AUTHENC implementation in atmel - Add Support for Octeon-tx CPT Engine - Add Broadcom SPU driver - Add MediaTek driver" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (142 commits) crypto: xts - Add ECB dependency crypto: cavium - switch to pci_alloc_irq_vectors crypto: cavium - switch to pci_alloc_irq_vectors crypto: cavium - remove dead MSI-X related define crypto: brcm - Avoid double free in ahash_finup() crypto: cavium - fix Kconfig dependencies crypto: cavium - cpt_bind_vq_to_grp could return an error code crypto: doc - fix typo hwrng: omap - update Kconfig help description crypto: ccm - drop unnecessary minimum 32-bit alignment crypto: ccm - honour alignmask of subordinate MAC cipher crypto: caam - fix state buffer DMA (un)mapping crypto: caam - abstract ahash request double buffering crypto: caam - fix error path for ctx_dma mapping failure crypto: caam - fix DMA API leaks for multiple setkey() calls crypto: caam - don't dma_map key for hash algorithms crypto: caam - use dma_map_sg() return code crypto: caam - replace sg_count() with sg_nents_for_len() crypto: caam - check sg_count() return value crypto: caam - fix HW S/G in ablkcipher_giv_edesc_alloc() ..
2017-02-23	Merge tag 'rpmsg-v4.11' of git://github.com/andersson/remoteproc	Linus Torvalds	2	-4/+15
	Pull rpmsg updates from Bjorn Andersson: "This introduces an interface for user space to directly communicate on rpmsg endpoints without the implementation of specific kernel drivers, which is useful for e.g. debug channels: * tag 'rpmsg-v4.11' of git://github.com/andersson/remoteproc: rpmsg: rpmsg_create_ept() returns NULL on error rpmsg: qcom: smd: Return positively when not enabled rpmsg: unlock on error in rpmsg_eptdev_read() rpmsg: char: add CONFIG_NET dependency rpmsg: smd: Register rpmsg user space interface for edges rpmsg: Driver for user space endpoint interface rpmsg: qcom_smd: Implement endpoint "poll" rpmsg: Introduce "poll" to endpoint ops rpmsg: qcom_smd: Add support for "label" property
2017-02-23	Merge tag 'rproc-v4.11' of git://github.com/andersson/remoteproc	Linus Torvalds	2	-3/+21
	Pull remoteproc updates from Bjorn Andersson: "This introduces support for booting the dedicated sensor core in the Qualcomm MSM8996, updates the Qualcomm ADSP and Hexagon drivers to utilize SMD subdevice helpers for properly handle shutdowns and restarts of the remoteproc, add virtio support to the ST remoteproc and refactor the Qualcomm Hexagon driver to handle variations between platforms. The support code for parsing, loading and authenticating Qualcomm firmware files (MDT) is refactored and move to drivers/soc/qcom, to allow for non-remoteproc drivers to utilize this. Finally it brings some cleanups to the remoteproc core" * tag 'rproc-v4.11' of git://github.com/andersson/remoteproc: (27 commits) remoteproc: qcom: mdt_loader: Use signed type for offset remoteproc: st: add virtio communication support remoteproc: st: correct probe error management remoteproc: Modify the function names remoteproc: Reduce asynchronous request_firmware to auto-boot only remoteproc: Drop qcom_scm_pas_supported() from adsp_probe() MAINTAINERS: Add missing rpmsg include path remoteproc: qcom: Use common SMD edge handler remoteproc: qcom: wcnss: Make SMD handling common remoteproc: Move qcom_mdt_loader into drivers/soc/qcom remoteproc: qcom: mdt_loader: Refactor MDT loader remoteproc: qcom: mdt_loader: Don't overwrite firmware object remoteproc: qcom: Extract non-mdt related helper remoteproc: qcom: q6v5: Decouple driver from MDT loader remoteproc: qcom: q6v5: Remove mss supply from 8916 remoteproc: qcom: fix initializers for qcom_mss_reg_res array remoteproc: Drop firmware_loading_complete remoteproc: Add RPROC_DELETED state remoteproc: Move rproc_delete_debug_dir() to rproc_del() remoteproc: qcom: Add SLPI rproc support to load and boot slpi proc. ...
2017-02-23	Merge tag 'sound-4.11-rc1' of ↵	Linus Torvalds	2	-4/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound updates from Takashi Iwai: "Here is the update of sound bits for 4.11: again at this time, no big changes in ALSA and ASoC core but only cosmetic changes like consitifaction. Meanwhile, quite a lot of developments are seen in a few driver side. ALSA Core: - Clean up, consitification of some ops HD-audio: - A slight behavior change of single_cmd option - Quirks for AmigaOne X1000, Samsung Ativ Book 8, Dell AiO, ALC221 HP, and fixes for Lewisburg controller - Realtek ALC299, ALC1220 codecs Others: - USB-audio: Tascam US-16x08 DSP mixer quirk - Intel HDMI LPE audio support for Baytrail / Cherrytrail; this contains some updates in drm/i915 for the new platform binding ASoC: - Lots of updates in Intel drivers, mostly for DisplayPort and HDMI on Skylake and onwards, as well as more Baytrail / Cherrytrail boards support - Channel mapping support for HDMI - Support for AllWinner A31 and A33, Everest Semiconductor ES8328, Nuvoton NAU8540. * tag 'sound-4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (323 commits) ALSA: usb-audio: Tidy up mixer_us16x08.c ALSA: usb-audio: Fix memory leak and corruption in mixer_us16x08.c ALSA: usb-audio: purge needless variable length array ALSA: x86: hdmi: select CONFIG_SND_PCM ALSA: x86: Don't enable runtime PM as default ALSA: x86: Use runtime PM autosuspend ALSA: usb-audio: localize function without external linkage ALSA: usb-audio: localize one-referrer variable ALSA: usb-audio: Tascam US-16x08 DSP mixer quirk ALSA: emu10k1: constify snd_emux_operators structure ASoC: sun4i-spdif: drop unnessary snd_soc_unregister_component() ASoC: Intel: bxt: Add jack port initialize in bxt_rt298 machine ASoC: nau8825: automatic BCLK and LRC divde in master mode ASoC: hdac_hdmi: Add device id for Geminilake ASoC: Intel: Skylake: Add Geminlake IDs ASoC: rt298: Add DMI match for Geminilake reference platform ASoC: Intel: Skylake: Check device type to get endpoint configuration ASoC: Intel: bxt: Add jack port initialize in da7219_max98357a machine ASoC: Intel: Skylake: Add jack port initialize in nau88l25_ssm4567 machine ASoC: Intel: Skylake: Add jack port initialize in nau88l25_max98357a machine ...
2017-02-23	Merge tag 'gpio-v4.11-1' of ↵	Linus Torvalds	2	-13/+57
	git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio Pull GPIO updates from Linus Walleij: "This is the bulk of GPIO changes for the v4.11 cycle Core changes: - Augment fwnode_get_named_gpiod() to configure the GPIO pin immediately after requesting it like all other APIs do. This is a treewide change also updating all users. - Pass a GPIO label down to gpiod_request() from fwnode_get_named_gpiod(). This makes debugfs and the userspace ABI correctly reflect the current in-kernel consumer of a pin taken using this abstraction. This is a treewide change also updating all users. - Rename devm_get_gpiod_from_child() to devm_fwnode_get_gpiod_from_child() to reflect the fact that this function is operating on a fwnode object. This is a treewide change also updating all users. - Make it possible to take multiple GPIOs in a single hog of device tree hogs. - The refactorings switching GPIO chips to use the .set_config() callback using standard pin control properties and providing a backend into the pin control subsystem that were also merged into the pin control tree naturally appear here too. Testing instrumentation: - A whole slew of cleanups and improvements to the mockup GPIO driver. We now have an extended userspace test exercising the subsystem, and we can inject interrupts etc from userspace to fully test the core GPIO functionality. New drivers: - New driver for the Cortina Systems Gemini GPIO controller. - New driver for the Exar XR17V352/354/358 chips. - New driver for the ACCES PCI-IDIO-16 PCI GPIO card. Driver changes: - RCAR: set the irqchip parent device, add fine-grained runtime PM support. - pca953x: support optional RESET control line on the chip. - DaVinci: cleanups and simplifications. Add support for multiple instances. - .set_multiple() and naming of lines on more or less all of the ISA/PCI GPIO controllers. - mcp23s08: refactored to use regmap as a first step to further rewrites and modernizations" * tag 'gpio-v4.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio: (61 commits) gpio: reintroduce devm_get_gpiod_from_child() gpio: pci-idio-16: Fix PCI BAR index gpio: pci-idio-16: Fix PCI device ID code gpio: mockup: implement event injecting over debugfs gpio: mockup: add a dummy irqchip gpio: mockup: implement naming the lines gpio: mockup: code shrink gpio: mockup: readability tweaks gpio: Add GPIO support for the ACCES PCI-IDIO-16 gpio: Add the devm_fwnode_get_index_gpiod_from_child() helper gpio: Rename devm_get_gpiod_from_child() gpio: mcp23s08: Select REGMAP/REGMAP_I2C to fix build error gpio: ws16c48: Add support for GPIO names gpio: gpio-mm: Add support for GPIO names gpio: 104-idio-16: Add support for GPIO names gpio: 104-idi-48: Add support for GPIO names gpio: 104-dio-48e: Add support for GPIO names gpio: ws16c48: Remove unnecessary driver_data set gpio: gpio-mm: Remove unnecessary driver_data set gpio: 104-idio-16: Remove unnecessary driver_data set ...
2017-02-23	Merge branch 'for-linus' of ↵	Linus Torvalds	4	-116/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input updates from Dmitry: - a new driver for Zeitech touchscreen controller - a new driver for Samsung "touchkeys" - touchscreen driver for Moorestown platform has been removed because platform support is gone - MPU3050 accelerometer driver was removed in favor of IIO driver - miscellaneous driver cleanup and fixes * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (88 commits) Input: zet6223 - export OF device ID as module aliases Input: tsc2004/5 - switch to using generic device properties Input: tsc2004/5 - fix regulator handling Input: tsc2005 - add OF device table Input: add driver for Zeitec ZET6223 Input: joydev - do not report stale values on first open Input: synaptics-rmi4 - forward upper mechanical buttons to PS/2 guest Input: synaptics-rmi4 - clean up F30 implementation Input: synaptics - use SERIO_OOB_DATA to handle trackstick buttons Input: psmouse - add a custom serio protocol to send extra information Input: synaptics-rmi4 - fix error return code in rmi_probe_interrupts() Input: xpad - restore LED state after device resume Input: synaptics-rmi4 - add rmi_find_function() Input: xpad - fix stuck mode button on Xbox One S pad Input: joydev - use clamp() macro Input: refuse to register absolute devices without absinfo Input: synaptics-rmi4 - add sysfs interfaces for hardware IDs Input: synaptics-rmi4 - add sysfs attribute update_fw_status Input: mousedev - stop offering PS/2 to userspace by default Input: tca8418 - switch to using generic device properties ...
2017-02-23	Merge tag 'mfd-for-linus-4.11' of ↵	Linus Torvalds	6	-12/+396
	git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd Pull MFD updates from Lee Jones: "Core Frameworks: - Add new !TOUCHSCREEN_SUN4I dependency for SUN4I_GPADC - List include/dt-bindings/mfd/* to files supported in MAINTAINERS New Drivers: - Intel Apollo Lake SPI NOR - ST STM32 Timers (Advanced, Basic and PWM) - Motorola 6556002 CPCAP (PMIC) New Device Support: - Add support for AXP221 to axp20x - Add support for Intel Gemini Lake to intel-lpss-pci - Add support for MT6323 LED to mt6397-core - Add support for COMe-bBD#, COMe-bSL6, COMe-bKL6, COMe-cAL6 and COMe-cKL6 to kempld-core New Functionality: - Add support for Analog CODAC to sun6i-prcm - Add support for Watchdog to lpc_ich Fix-ups: - Error handling improvements; axp288_charger, axp20x, ab8500-sysctrl - Adapt platform data handling; axp20x - IRQ handling improvements; arizona, axp20x - Remove superfluous code; arizona, axp20x, lpc_ich - Trivial coding style/spelling fixes; axp20x, abx500, mfd.txt - Regmap fix-ups; axp20x - DT changes; mfd.txt, aspeed-lpc, aspeed-gfx, ab8500-core, tps65912, mt6397 - Use new I2C probing mechanism; max77686 - Constification; rk808 Bug Fixes: - Stop data transfer whilst suspended; cros_ec" * tag 'mfd-for-linus-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: (43 commits) mfd: lpc_ich: Enable watchdog on Intel Apollo Lake PCH mfd: lpc_ich: Remove useless comments in core part mfd: Add support for several boards to Kontron PLD driver mfd: constify regmap_irq_chip structures MAINTAINERS: Add include/dt-bindings/mfd to MFD entry mfd: cpcap: Add minimal support mfd: mt6397: Add MT6323 LED support into MT6397 driver Documentation: devicetree: Add LED subnode binding for MT6323 PMIC mfd: tps65912: Export OF device ID table as module aliases mfd: ab8500-core: Rename clock device and compatible mfd: cros_ec: Send correct suspend/resume event to EC mfd: max77686: Remove I2C device ID table mfd: max77686: Use the struct i2c_driver .probe_new instead of .probe mfd: max77686: Use of_device_get_match_data() helper mfd: max77686: Don't attempt to get i2c_device_id .data mfd: ab8500-sysctrl: Handle probe deferral mfd: intel-lpss: Add Intel Gemini Lake PCI IDs mfd: axp20x: Fix AXP806 access errors on cold boot mfd: cros_ec: Send suspend state notification to EC mfd: cros_ec: Prevent data transfer while device is suspended ...
2017-02-23	net/mlx4: Spoofcheck and zero MAC can't coexist	Eugenia Emantayev	2	-1/+11
	Spoofcheck can't be enabled if VF MAC is zero. Vice versa, can't zero MAC if spoofcheck is on. Fixes: 8f7ba3ca12f6 ('net/mlx4: Add set VF mac address support') Signed-off-by: Eugenia Emantayev <[email protected]> Signed-off-by: Tariq Toukan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-02-22	Merge branch 'akpm' (patches from Andrew)	Linus Torvalds	17	-41/+225
	Merge updates from Andrew Morton: "142 patches: - DAX updates - various misc bits - OCFS2 updates - most of MM" * emailed patches from Andrew Morton <[email protected]>: (142 commits) mm/z3fold.c: limit first_num to the actual range of possible buddy indexes mm: fix <linux/pagemap.h> stray kernel-doc notation zram: remove obsolete sysfs attrs mm/memblock.c: remove unnecessary log and clean up oom-reaper: use madvise_dontneed() logic to decide if unmap the VMA mm: drop unused argument of zap_page_range() mm: drop zap_details::check_swap_entries mm: drop zap_details::ignore_dirty mm, page_alloc: warn_alloc nodemask is NULL when cpusets are disabled mm: help __GFP_NOFAIL allocations which do not trigger OOM killer mm, oom: do not enforce OOM killer for __GFP_NOFAIL automatically mm: consolidate GFP_NOFAIL checks in the allocator slowpath lib/show_mem.c: teach show_mem to work with the given nodemask arch, mm: remove arch specific show_mem mm, page_alloc: warn_alloc print nodemask mm, page_alloc: do not report all nodes in show_mem Revert "mm: bail out in shrink_inactive_list()" mm, vmscan: consider eligible zones in get_scan_count mm, vmscan: cleanup lru size claculations mm, vmscan: do not count freed pages as PGDEACTIVATE ...
2017-02-22	Merge tag 'devicetree-for-4.11' of ↵	Linus Torvalds	2	-1/+8
	git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull DeviceTree updates from Rob Herring: "Pretty standard stuff with dtc upstream sync being the biggest piece. - Sync dtc to upstream commit 0931cea3ba20. This picks up overlay support in dtc. - Set dma_ops for reserved memory users. - Make references to IOMMU consistent in DT bindings. - Cleanup references to pm_power_off in bindings. - Move some display bindings that snuck into the old bindings/video/ path. - Fix some wrong documentation paths caused from binding restructuring. - Vendor prefixes for Faraday and Fujitsu. - Fix an of_node ref counting leak in of_find_node_opts_by_path - Introduce new graph helper of_graph_get_remote_node() which will be used by DRM drivers in 4.12" * tag 'devicetree-for-4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (27 commits) DT: add Faraday Tec. as vendor of: introduce of_graph_get_remote_node of: Add missing space at end of pr_fmt(). of: make of_device_make_bus_id() static of: fix of_node leak caused in of_find_node_opts_by_path dt-bindings: net: remove reference to fixed link support dt-bindings: power: reset: qnap-poweroff: Drop reference to pm_power_off dt-bindings: power: reset: gpio-poweroff: Drop reference to pm_power_off dt-bindings: mfd: as3722: Drop reference to pm_power_off dt-bindings: display: move ANX7814 and SiI8620 bridge bindings of/unittest: Swap arguments of of_unittest_apply_overlay() Documentation: usb: fix wrong documentation paths serial: fsl-imx-uart.txt: Remove generic property devicetree: Add Fujitsu Ltd. vendor prefix Documentation: display: fix wrong documentation paths of: remove redundant memset in overlay bus:qcom : Fix typo in qcom,ebi2.txt dt-bindings: qman: Remove pool channel node Documentation: panel-dpi: fix path to display-timing.txt devicetree: bindings: clk: mvebu: fix description for sata1 on Armada XP ...
2017-02-22	Merge tag 'docs-4.11' of git://git.lwn.net/linux	Linus Torvalds	1	-56/+54
	Pull documentation updates from Jonathan Corbet: "A slightly quieter cycle for documentation this time around. Three more DocBook template files have been converted to RST; only 21 to go. There are various build improvements and the usual array of documentation improvements and fixes" * tag 'docs-4.11' of git://git.lwn.net/linux: (44 commits) docs / driver-api: Fix structure references in device_link.rst PM / docs: Fix structure references in device.rst Add a target to check broken external links in the Documentation Documentation: Fix linux-api list typo Documentation: DocBook/Makefile comment typo Improve sparse documentation Documentation: make Makefile.sphinx no-ops quieter Documentation: DMA-ISA-LPC.txt Documentation: input: fix path to input code definitions docs: Remove the copyright year from conf.py docs: Fix a warning in the Korean HOWTO.rst translation PM / sleep / docs: Convert PM notifiers document to reST PM / core / docs: Convert sleep states API document to reST PM / core: Update kerneldoc comments in pm.h doc-rst: Fix recursive make invocation from macros doc-rst: Delete output of failed dot-SVG conversion doc-rst: Break shell command sequences on failure Documentation/sphinx: make targets independent of Sphinx work for HAVE_SPHINX=0 doc-rst: fixed cleandoc target when used with O=dir Documentation/sphinx: prevent generation of .pyc files in the source tree ...
2017-02-22	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	2	-14/+49
	Pull KVM updates from Paolo Bonzini: "4.11 is going to be a relatively large release for KVM, with a little over 200 commits and noteworthy changes for most architectures. ARM: - GICv3 save/restore - cache flushing fixes - working MSI injection for GICv3 ITS - physical timer emulation MIPS: - various improvements under the hood - support for SMP guests - a large rewrite of MMU emulation. KVM MIPS can now use MMU notifiers to support copy-on-write, KSM, idle page tracking, swapping, ballooning and everything else. KVM_CAP_READONLY_MEM is also supported, so that writes to some memory regions can be treated as MMIO. The new MMU also paves the way for hardware virtualization support. PPC: - support for POWER9 using the radix-tree MMU for host and guest - resizable hashed page table - bugfixes. s390: - expose more features to the guest - more SIMD extensions - instruction execution protection - ESOP2 x86: - improved hashing in the MMU - faster PageLRU tracking for Intel CPUs without EPT A/D bits - some refactoring of nested VMX entry/exit code, preparing for live migration support of nested hypervisors - expose yet another AVX512 CPUID bit - host-to-guest PTP support - refactoring of interrupt injection, with some optimizations thrown in and some duct tape removed. - remove lazy FPU handling - optimizations of user-mode exits - optimizations of vcpu_is_preempted() for KVM guests generic: - alternative signaling mechanism that doesn't pound on tsk->sighand->siglock" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (195 commits) x86/kvm: Provide optimized version of vcpu_is_preempted() for x86-64 x86/paravirt: Change vcp_is_preempted() arg type to long KVM: VMX: use correct vmcs_read/write for guest segment selector/base x86/kvm/vmx: Defer TR reload after VM exit x86/asm/64: Drop __cacheline_aligned from struct x86_hw_tss x86/kvm/vmx: Simplify segment_base() x86/kvm/vmx: Get rid of segment_base() on 64-bit kernels x86/kvm/vmx: Don't fetch the TSS base from the GDT x86/asm: Define the kernel TSS limit in a macro kvm: fix page struct leak in handle_vmon KVM: PPC: Book3S HV: Disable HPT resizing on POWER9 for now KVM: Return an error code only as a constant in kvm_get_dirty_log() KVM: Return an error code only as a constant in kvm_get_dirty_log_protect() KVM: Return directly after a failed copy_from_user() in kvm_vm_compat_ioctl() KVM: x86: remove code for lazy FPU handling KVM: race-free exit from KVM_RUN without POSIX signals KVM: PPC: Book3S HV: Turn "KVM guest htab" message into a debug message KVM: PPC: Book3S PR: Ratelimit copy data failure error messages KVM: Support vCPU-based gfn->hva cache KVM: use separate generations for each address space ...
2017-02-23	Merge tag 'v4.10-rc8' into drm-next	Dave Airlie	21	-54/+123
	Linux 4.10-rc8 Backmerge Linus rc8 to fix some conflicts, but also to avoid pulling it in via a fixes pull from someone.
2017-02-22	Merge tag 'xfs-4.11-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux	Linus Torvalds	2	-11/+11
	Pull xfs updates from Darrick Wong: "Here are the XFS changes for 4.11. We aren't introducing any major features in this release cycle except for this being the first merge window I've managed on my own. :) Changes since last update: - Various cleanups - Livelock fixes for eofblocks scanning - Improved input verification for on-disk metadata - Fix races in the copy on write remap mechanism - Fix buffer io error timeout controls - Streamlining of directio copy on write - Asynchronous discard support - Fix asserts when splitting delalloc reservations - Don't bloat bmbt when right shifting extents - Inode alignment fixes for 32k block sizes" * tag 'xfs-4.11-merge-7' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (39 commits) xfs: remove XFS_ALLOCTYPE_ANY_AG and XFS_ALLOCTYPE_START_AG xfs: simplify xfs_rtallocate_extent xfs: tune down agno asserts in the bmap code xfs: Use xfs_icluster_size_fsb() to calculate inode chunk alignment xfs: don't reserve blocks for right shift transactions xfs: fix len comparison in xfs_extent_busy_trim xfs: fix uninitialized variable in _reflink_convert_cow xfs: split indlen reservations fairly when under reserved xfs: handle indlen shortage on delalloc extent merge xfs: resurrect debug mode drop buffered writes mechanism xfs: clear delalloc and cache on buffered write failure xfs: don't block the log commit handler for discards xfs: improve busy extent sorting xfs: improve handling of busy extents in the low-level allocator xfs: don't fail xfs_extent_busy allocation xfs: correct null checks and error processing in xfs_initialize_perag xfs: update ctime and mtime on clone destinatation inodes xfs: allocate direct I/O COW blocks in iomap_begin xfs: go straight to real allocations for direct I/O COW writes xfs: return the converted extent in __xfs_reflink_convert_cow ...
2017-02-22	Merge branch 'for-linus' of ↵	Linus Torvalds	1	-6/+15
	git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk Pull printk updates from Petr Mladek: - Add Petr Mladek, Sergey Senozhatsky as printk maintainers, and Steven Rostedt as the printk reviewer. This idea came up after the discussion about printk issues at Kernel Summit. It was formulated and discussed at lkml[1]. - Extend a lock-less NMI per-cpu buffers idea to handle recursive printk() calls by Sergey Senozhatsky[2]. It is the first step in sanitizing printk as discussed at Kernel Summit. The change allows to see messages that would normally get ignored or would cause a deadlock. Also it allows to enable lockdep in printk(). This already paid off. The testing in linux-next helped to discover two old problems that were hidden before[3][4]. - Remove unused parameter by Sergey Senozhatsky. Clean up after a past change. [1] http://lkml.kernel.org/r/[email protected] [2] http://lkml.kernel.org/r/[email protected] [3] http://lkml.kernel.org/r/[email protected] [4] http://lkml.kernel.org/r/[email protected] * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk: printk: drop call_console_drivers() unused param printk: convert the rest to printk-safe printk: remove zap_locks() function printk: use printk_safe buffers in printk printk: report lost messages in printk safe/nmi contexts printk: always use deferred printk when flush printk_safe lines printk: introduce per-cpu safe_print seq buffer printk: rename nmi.c and exported api printk: use vprintk_func in vprintk() MAINTAINERS: Add printk maintainers
2017-02-22	Merge tag 'modules-for-v4.11' of ↵	Linus Torvalds	1	-4/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: "Summary of modules changes for the 4.11 merge window: - A few small code cleanups - Add modules git tree url to MAINTAINERS" * tag 'modules-for-v4.11' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: MAINTAINERS: add tree for modules module: fix memory leak on early load_module() failures module: Optimize search_module_extables() modules: mark __inittest/__exittest as __maybe_unused livepatch/module: print notice of TAINT_LIVEPATCH module: Drop redundant declaration of struct module
2017-02-22	mm: fix <linux/pagemap.h> stray kernel-doc notation	Randy Dunlap	1	-1/+0
	Delete stray (second) function description in find_lock_page() kernel-doc notation. Note: scripts/kernel-doc just ignores the second function description. Fixes: 2457aec63745e ("mm: non-atomically mark page accessed during page cache allocation where possible") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Reported-by: Matthew Wilcox <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	mm: drop unused argument of zap_page_range()	Kirill A. Shutemov	1	-1/+1
	There's no users of zap_page_range() who wants non-NULL 'details'. Let's drop it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	mm: drop zap_details::check_swap_entries	Kirill A. Shutemov	1	-1/+0
	detail == NULL would give the same functionality as .check_swap_entries==true. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	mm: drop zap_details::ignore_dirty	Kirill A. Shutemov	1	-1/+0
	The only user of ignore_dirty is oom-reaper. But it doesn't really use it. ignore_dirty only has effect on file pages mapped with dirty pte. But oom-repear skips shared VMAs, so there's no way we can dirty file pte in them. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Rik van Riel <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	lib/show_mem.c: teach show_mem to work with the given nodemask	Michal Hocko	1	-3/+2
	show_mem() allows to filter out node specific data which is irrelevant to the allocation request via SHOW_MEM_FILTER_NODES. The filtering is done in skip_free_areas_node which skips all nodes which are not in the mems_allowed of the current process. This works most of the time as expected because the nodemask shouldn't be outside of the allocating task but there are some exceptions. E.g. memory hotplug might want to request allocations from outside of the allowed nodes (see new_node_page). Get rid of this hardcoded behavior and push the allocation mask down the show_mem path and use it instead of cpuset_current_mems_allowed. NULL nodemask is interpreted as cpuset_current_mems_allowed. [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	mm, page_alloc: warn_alloc print nodemask	Michal Hocko	1	-2/+2
	warn_alloc is currently used for to report an allocation failure or an allocation stall. We print some details of the allocation request like the gfp mask and the request order. We do not print the allocation nodemask which is important when debugging the reason for the allocation failure as well. We alreaddy print the nodemask in the OOM report. Add nodemask to warn_alloc and print it in warn_alloc as well. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	mm, vmscan: cleanup lru size claculations	Michal Hocko	1	-1/+1
	lruvec_lru_size returns the full size of the LRU list while we sometimes need a value reduced only to eligible zones (e.g. for lowmem requests). inactive_list_is_low is one such user. Later patches will add more of them. Add a new parameter to lruvec_lru_size and allow it filter out zones which are not eligible for the given context. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Hillf Danton <[email protected]> Acked-by: Minchan Kim <[email protected]> Acked-by: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	mm, thp: add new defer+madvise defrag option	David Rientjes	1	-0/+1
	There is no thp defrag option that currently allows MADV_HUGEPAGE regions to do direct compaction and reclaim while all other thp allocations simply trigger kswapd and kcompactd in the background and fail immediately. The "defer" setting simply triggers background reclaim and compaction for all regions, regardless of MADV_HUGEPAGE, which makes it unusable for our userspace where MADV_HUGEPAGE is being used to indicate the application is willing to wait for work for thp memory to be available. The "madvise" setting will do direct compaction and reclaim for these MADV_HUGEPAGE regions, but does not trigger kswapd and kcompactd in the background for anybody else. For reasonable usage, there needs to be a mesh between the two options. This patch introduces a fifth mode, "defer+madvise", that will do direct reclaim and compaction for MADV_HUGEPAGE regions and trigger background reclaim and compaction for everybody else so that hugepages may be available in the near future. A proposal to allow direct reclaim and compaction for MADV_HUGEPAGE regions as part of the "defer" mode, making it a very powerful setting and avoids breaking userspace, was offered: http://marc.info/?t=148236612700003 This additional mode is a compromise. A second proposal to allow both "defer" and "madvise" to be selected at the same time was also offered: http://marc.info/?t=148357345300001. This is possible, but there was a concern that it might break existing userspaces the parse the output of the defrag mode, so the fifth option was introduced instead. This patch also cleans up the helper function for storing to "enabled" and "defrag" since the former supports three modes while the latter supports five and triple_flag_store() was getting unnecessarily messy. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Rientjes <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: "Kirill A. Shutemov" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	mm/swap: skip readahead only when swap slot cache is enabled	Huang Ying	1	-0/+2
	Because during swap off, a swap entry may have swap_map[] == SWAP_HAS_CACHE (for example, just allocated). If we return NULL in __read_swap_cache_async(), the swap off will abort. So when swap slot cache is disabled, (for swap off), we will wait for page to be put into swap cache in such race condition. This should not be a problem for swap slot cache, because swap slot cache should be drained after clearing swap_slot_cache_enabled. [[email protected]: fix memory leak in __read_swap_cache_async()] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/5e2c5f6abe8e6eb0797408897b1bba80938e9b9d.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: "Huang, Ying" <[email protected]> Signed-off-by: Tim Chen <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	mm/swap: add cache for swap slots allocation	Tim Chen	2	-0/+32
	We add per cpu caches for swap slots that can be allocated and freed quickly without the need to touch the swap info lock. Two separate caches are maintained for swap slots allocated and swap slots returned. This is to allow the swap slots to be returned to the global pool in a batch so they will have a chance to be coaelesced with other slots in a cluster. We do not reuse the slots that are returned right away, as it may increase fragmentation of the slots. The swap allocation cache is protected by a mutex as we may sleep when searching for empty slots in cache. The swap free cache is protected by a spin lock as we cannot sleep in the free path. We refill the swap slots cache when we run out of slots, and we disable the swap slots cache and drain the slots if the global number of slots fall below a low watermark threshold. We re-enable the cache agian when the slots available are above a high watermark. [[email protected]: use raw_cpu_ptr over this_cpu_ptr for swap slots access] [[email protected]: add comments on locks in swap_slots.h] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/35de301a4eaa8daa2977de6e987f2c154385eb66.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: Tim Chen <[email protected]> Signed-off-by: "Huang, Ying" <[email protected]> Reviewed-by: Michal Hocko <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	mm/swap: free swap slots in batch	Tim Chen	1	-0/+1
	Add new functions that free unused swap slots in batches without the need to reacquire swap info lock. This improves scalability and reduce lock contention. Link: http://lkml.kernel.org/r/c25e0fcdfd237ec4ca7db91631d3b9f6ed23824e.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: Tim Chen <[email protected]> Signed-off-by: "Huang, Ying" <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-02-22	mm/swap: allocate swap slots in batches	Tim Chen	1	-0/+2
	Currently, the swap slots are allocated one page at a time, causing contention to the swap_info lock protecting the swap partition on every page being swapped. This patch adds new functions get_swap_pages and scan_swap_map_slots to request multiple swap slots at once. This will reduces the lock contention on the swap_info lock. Also scan_swap_map_slots can operate more efficiently as swap slots often occurs in clusters close to each other on a swap device and it is quicker to allocate them together. Link: http://lkml.kernel.org/r/9fec2845544371f62c3763d43510045e33d286a6.1484082593.git.tim.c.chen@linux.intel.com Signed-off-by: Tim Chen <[email protected]> Signed-off-by: "Huang, Ying" <[email protected]> Cc: Aaron Lu <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Huang Ying <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Jonathan Corbet <[email protected]> escreveu: Cc: Kirill A. Shutemov <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Shaohua Li <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>