aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-07-12Merge branch 'remotes/lorenzo/pci/endpoint'Bjorn Helgaas2-16/+22
- Complete PCI endpoint removal so a subsequent add doesn't fail with -EBUSY (Alan Mikhak) - Pay attention to PCI endpoint fixed-size BARs (Alan Mikhak) - Fix PCI endpoint handling of 64bit BARs (Alan Mikhak) - Clear PCI endpoint BARs before freeing space (Alan Mikhak) * remotes/lorenzo/pci/endpoint: PCI: endpoint: Clear BAR before freeing its space PCI: endpoint: Skip odd BAR when skipping 64bit BAR PCI: endpoint: Allocate enough space for fixed size BAR PCI: endpoint: Set endpoint controller pointer to NULL
2019-07-12Merge branch 'remotes/lorenzo/pci/xilinx'Bjorn Helgaas1-6/+5
- Fix Xilinx NWL multi-MSI vector aliasing issue (Bharat Kumar Gogada) * remotes/lorenzo/pci/xilinx: PCI: xilinx-nwl: Fix Multi MSI data programming
2019-07-12Merge branch 'remotes/lorenzo/pci/tegra'Bjorn Helgaas4-82/+519
- Reorganize Tegra AFI/PHY/REFCLK/etc functions (Manikanta Maddireddy) - Mask Tegra AFI_INTR in runtime suspend (Manikanta Maddireddy) - Fix Tegra AFI/PCIe powerup sequence (Manikanta Maddireddy) - Add Tegra124, Tegra132, Tegra210, and Tegra186 support for Gen2 link speed (Manikanta Maddireddy) - Advertise Tegra AER support (Manikanta Maddireddy) - Program Tegra210 UPHY settings (Manikanta Maddireddy) - Enable Tegra opportunistic UpdateFC and ACK (Manikanta Maddireddy) - Disable Tegra AFI dynamic clock gating (Manikanta Maddireddy) - Process Tegra pending DLL transactions before entering L1 or L2 to prevent receiver errors (Manikanta Maddireddy) - Enable Tegra xclk clock clamping in L1 (Manikanta Maddireddy) - Increase Tegra deskew retry time (Manikanta Maddireddy) - Work around Tegra hardware RAW erratum (Manikanta Maddireddy) - Update Tegra210 flow control timer frequency (Manikanta Maddireddy) - Work around Tegra Gen1/Gen2 link number negotiation issue (Manikanta Maddireddy) - Work around Tegra PLLE power down issue (Manikanta Maddireddy) - Program Tegra20 to support cacheable upstream transactions (Manikanta Maddireddy) - Log Tegra PRSNT_SENSE_IRQ as debug, not err (Manikanta Maddireddy) - Add register offset for third Root Port on Tegra186 and Tegra30 (Manikanta Maddireddy) - Document Tegra PCIe DPD pinctrl property (Manikanta Maddireddy) - Put Tegra PEX CLK & BIAS pads in DPD mode to reduce power usage when powergated (Manikanta Maddireddy) - Add generic DT binding for "reset-gpios" property (Manikanta Maddireddy) - Add Tegra support for GPIO-based PERST# (Manikanta Maddireddy) - Enable Relaxed Ordering only for Tegra20 & Tegra30 (Vidya Sagar) * remotes/lorenzo/pci/tegra: PCI: tegra: Enable Relaxed Ordering only for Tegra20 & Tegra30 PCI: tegra: Change link retry log level to debug PCI: tegra: Add support for GPIO based PERST# PCI: Add DT binding for "reset-gpios" property PCI: tegra: Put PEX CLK & BIAS pads in DPD mode dt-bindings: pci: tegra: Document PCIe DPD pinctrl optional prop PCI: tegra: Add AFI_PEX2_CTRL reg offset as part of SoC struct PCI: tegra: Change PRSNT_SENSE IRQ log to debug PCI: tegra: Program AFI_CACHE_BAR_{0,1}_{ST,SZ} registers only for Tegra20 PCI: tegra: Fix PLLE power down issue due to CLKREQ# signal PCI: tegra: Set target speed as Gen1 before starting LTSSM PCI: tegra: Update flow control timer frequency in Tegra210 PCI: tegra: Add SW fixup for RAW violations PCI: tegra: Increase the deskew retry time PCI: tegra: Enable PCIe xclk clock clamping PCI: tegra: Process pending DLL transactions before entering L1 or L2 PCI: tegra: Disable AFI dynamic clock gating PCI: tegra: Enable opportunistic UpdateFC and ACK PCI: tegra: Program UPHY electrical settings for Tegra210 PCI: tegra: Advertise PCIe Advanced Error Reporting (AER) capability PCI: tegra: Add PCIe Gen2 link speed support PCI: tegra: Fix PCIe host power up sequence PCI: tegra: Mask AFI_INTR in runtime suspend PCI: tegra: Rearrange Tegra PCIe driver functions PCI: tegra: Handle failure cases in tegra_pcie_power_on() soc/tegra: pmc: Export tegra_powergate_power_on()
2019-07-12Merge branch 'remotes/lorenzo/pci/rcar'Bjorn Helgaas1-0/+1
- Add R-Car device tree support for r8a774a1 (Biju Das) * remotes/lorenzo/pci/rcar: dt-bindings: PCI: rcar: Add device tree support for r8a774a1
2019-07-12Merge branch 'remotes/lorenzo/pci/qcom'Bjorn Helgaas2-63/+77
- Move qcom driver to bulk clock API (Bjorn Andersson) - Add Qualcomm QCS404 PCIe controller support (Bjorn Andersson) - Ensure Qualcomm PERST is asserted for at least 100ms (Niklas Cassel) * remotes/lorenzo/pci/qcom: PCI: qcom: Ensure that PERST is asserted for at least 100 ms PCI: qcom: Add QCS404 PCIe controller support dt-bindings: PCI: qcom: Add QCS404 to the binding PCI: qcom: Use clk bulk API for 2.4.0 controllers
2019-07-12Merge branch 'remotes/lorenzo/pci/mobiveil'Bjorn Helgaas2-213/+314
- Unify mobiveil register accessors (Hou Zhiqiang) - Remove MSI_FLAG_MULTI_PCI_MSI since mobiveil hardware doesn't support Multiple MSI (Hou Zhiqiang) - Program outbound windows with base address from DT instead of assuming zero (Hou Zhiqiang) - Skip "safe" list traversal when it's unnecessary (Hou Zhiqiang) - Initialize WIN_NUM_0 explicitly for CFG outbound transactions (Hou Zhiqiang) - Use WIN_NUM_0 for MEM inbound transactions (Hou Zhiqiang) - Fix up mobiveil Class Code to PCI_CLASS_BRIDGE_PCI (Hou Zhiqiang) - Wait for link-up before enumerating devices, not while initializing host (Hou Zhiqiang) - Move IRQ chained handler setup out of DT code (Hou Zhiqiang) - Set primary/secondary/subordinate bus numbers (Hou Zhiqiang) - Fix "valid device" check to allow root bus device 0 to be multi-function (Hou Zhiqiang) - Make DT "gpio_slave" and "apb_csr" properties optional (Hou Zhiqiang) - Refactor MEM/IO outbound window initialization (Hou Zhiqiang) - Fix validity check for inbound/outbound window programming (Hou Zhiqiang) - Initialize and preserve window control bits (Hou Zhiqiang) - Fix 64-bit outbound window setup (both CPU and PCI addresses) (Hou Zhiqiang) - Move IO port setup to host init (Hou Zhiqiang) - Fix infinite loop in INTx ISR (Hou Zhiqiang) - Fix INTx interrupt clearing to avoid missed interrupts (Hou Zhiqiang) * remotes/lorenzo/pci/mobiveil: PCI: mobiveil: Fix INTx interrupt clearing in mobiveil_pcie_isr() PCI: mobiveil: Fix infinite-loop in the INTx handling function PCI: mobiveil: Move PCIe PIO enablement out of inbound window routine PCI: mobiveil: Add upper 32-bit PCI base address setup in inbound window PCI: mobiveil: Add upper 32-bit CPU base address setup in outbound window PCI: mobiveil: Mask out hardcoded bits in inbound/outbound windows setup PCI: mobiveil: Clear the control fields before updating it PCI: mobiveil: Add configured inbound windows counter PCI: mobiveil: Fix the valid check for inbound and outbound windows PCI: mobiveil: Clean-up program_{ib/ob}_windows() PCI: mobiveil: Remove an unnecessary return value check PCI: mobiveil: Fix error return values PCI: mobiveil: Refactor the MEM/IO outbound window initialization PCI: mobiveil: Make some register updates more readable PCI: mobiveil: Reformat the code for readability dt-bindings: PCI: mobiveil: Change gpio_slave and apb_csr to optional PCI: mobiveil: Fix devfn check in mobiveil_pcie_valid_device() PCI: mobiveil: Initialize Primary/Secondary/Subordinate bus numbers PCI: mobiveil: Move IRQ chained handler setup out of DT parse PCI: mobiveil: Move the link up waiting out of mobiveil_host_init() PCI: mobiveil: Fix the Class Code field PCI: mobiveil: Use the 1st inbound window for MEM inbound transactions PCI: mobiveil: Use WIN_NUM_0 explicitly for CFG outbound window PCI: mobiveil: Update the resource list traversal function PCI: mobiveil: Fix PCI base address in MEM/IO outbound windows PCI: mobiveil: Remove the flag MSI_FLAG_MULTI_PCI_MSI PCI: mobiveil: Unify register accessors
2019-07-12Merge branch 'remotes/lorenzo/pci/hv'Bjorn Helgaas1-6/+9
- Fix Hyper-V use-after-free in eject path (Dexuan Cui) * remotes/lorenzo/pci/hv: PCI: hv: Fix a use-after-free bug in hv_eject_device_work()
2019-07-12Merge branch 'remotes/lorenzo/pci/dwc'Bjorn Helgaas5-35/+80
- Add dwc API support to de-initialize host (Vidya Sagar) - Clean up dwc DBI,ATU read and write APIs (Vidya Sagar) - Export dwc APIs to support .remove() so drivers can be modular (Vidya Sagar) - Simplify imx6 Kconfig dependencies (Leonard Crestez) - Fix dra7xx build error when !CONFIG_GPIOLIB (YueHaibing) * remotes/lorenzo/pci/dwc: PCI: dwc: pci-dra7xx: Fix compilation when !CONFIG_GPIOLIB PCI: imx6: Simplify Kconfig depends on PCI: dwc: Export APIs to support .remove() implementation PCI: dwc: Cleanup DBI,ATU read and write APIs PCI: dwc: Add API support to de-initialize host
2019-07-12Merge branch 'remotes/lorenzo/pci/armada'Bjorn Helgaas1-1/+81
- Add Armada8k PHYs support (Miquel Raynal) * remotes/lorenzo/pci/armada: PCI: armada8k: Add PHYs support
2019-07-12Merge branch 'remotes/lorenzo/pci/altera'Bjorn Helgaas3-18/+65
- Allow building Altera host bridge driver as a module (Ley Foon Tan) - Fix Altera Stratix 10 Type 1 to Type 0 config access conversion (Ley Foon Tan) * remotes/lorenzo/pci/altera: PCI: altera: Fix configuration type based on secondary number PCI: altera-msi: Allow building as module PCI: altera: Allow building as module
2019-07-12Merge branch 'pci/virtualization'Bjorn Helgaas3-17/+12
- Fix problem with caching VF config space size (Alex Williamson) * pci/virtualization: PCI/IOV: Assume SR-IOV VFs support extended config space. Revert "PCI/IOV: Use VF0 cached config space size for other VFs"
2019-07-12Merge branch 'pci/resource'Bjorn Helgaas5-32/+62
- Evaluate ACPI "PCI Boot Configuration"_DSM (Benjamin Herrenschmidt) - Don't auto-realloc if we're preserving firmware config (Benjamin Herrenschmidt) - Allow arm64 to reallocate resources if necessary (Benjamin Herrenschmidt) - Preserve firmware config on arm64 when desired (Benjamin Herrenschmidt) - Simplify resource distribution to hotplug bridges (Nicholas Johnson) * pci/resource: PCI: Skip resource distribution when no hotplug bridges PCI: Simplify pci_bus_distribute_available_resources() arm64: PCI: Preserve firmware configuration when desired arm64: PCI: Allow resource reallocation if necessary PCI: Don't auto-realloc if we're preserving firmware config PCI/ACPI: Evaluate PCI Boot Configuration _DSM
2019-07-12Merge branch 'pci/peer-to-peer'Bjorn Helgaas1-1/+9
- Prevent drivers that use dma_virt_ops from using peer-to-peer DMA (Logan Gunthorpe) * pci/peer-to-peer: PCI/P2PDMA: Fix missing check for dma_virt_ops
2019-07-12Merge branch 'pci/misc'Bjorn Helgaas3-21/+92
- Generalize multi-function power dependency device links (Abhishek Sahu) - Add NVIDIA GPU multi-function power dependencies (Abhishek Sahu) - Optimize /proc/bus/pci/devices by using seq_puts() instead of seq_printf() (Markus Elfring) - Enable NVIDIA HDA controllers if BIOS didn't (Lukas Wunner) * pci/misc: PCI: Enable NVIDIA HDA controllers PCI: Use seq_puts() instead of seq_printf() in show_device() PCI: Add NVIDIA GPU multi-function power dependencies PCI: Generalize multi-function power dependency device links
2019-07-12Merge branch 'pci/enumeration'Bjorn Helgaas7-10/+23
- If user prevents VF probing, return error instead of pretending a driver has claimed the VF (Alex Williamson) - Always allow probing with driver_override (Alex Williamson) - Decode PCIe 32 GT/s link speed (Gustavo Pimentel) - Ignore lockdep for sysfs remove to avoid lockdep false positive (Marek Vasut) * pci/enumeration: PCI: sysfs: Ignore lockdep for remove attribute PCI: Decode PCIe 32 GT/s link speed PCI: Always allow probing with driver_override PCI: Return error if cannot probe VF
2019-07-12Merge branch 'pci/docs'Bjorn Helgaas66-2426/+3084
- Convert docs to reST (Changbin Du) - Convert PM docs to reST (Mauro Carvalho Chehab) * pci/docs: docs: power: convert docs to ReST and rename to *.rst Documentation: PCI: convert endpoint/pci-test-howto.txt to reST Documentation: PCI: convert endpoint/pci-test-function.txt to reST Documentation: PCI: convert endpoint/pci-endpoint-cfs.txt to reST Documentation: PCI: convert endpoint/pci-endpoint.txt to reST Documentation: PCI: convert pcieaer-howto.txt to reST Documentation: PCI: convert pci-error-recovery.txt to reST Documentation: PCI: convert acpi-info.txt to reST Documentation: PCI: convert MSI-HOWTO.txt to reST Documentation: PCI: convert pci-iov-howto.txt to reST Documentation: PCI: convert PCIEBUS-HOWTO.txt to reST Documentation: PCI: convert pci.txt to reST Documentation: add Linux PCI to Sphinx TOC tree
2019-07-12coresight: Make the coresight_device_fwnode_match declaration's fwnode ↵Nathan Chancellor1-1/+1
parameter const Fix Linus' merge error in the parent commit, causing: drivers/hwtracing/coresight/coresight.c:1051:11: error: incompatible pointer types passing 'int (struct device *, void *)' to parameter of type 'int (*)(struct device *, const void *)' [-Werror,-Wincompatible-pointer-types] coresight_device_fwnode_match); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~ include/linux/device.h:173:17: note: passing argument to parameter 'match' here int (*match)(struct device *dev, const void *data)); ^ due to missed header file fixup. Fixes: f632a8170a6b ("Merge tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core") Signed-off-by: Nathan Chancellor <[email protected]> [ Greg even sent this patch with his pull request, but I stupidly thought it was the merge resolution fix I had already done as part of the merge. But no, this was the extra fix for the header file that goes with the definition I _had_ caught - Linus ] Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12Revert "NFS: readdirplus optimization by cache mechanism" (memleak)Max Kellermann2-86/+7
This reverts commit be4c2d4723a4a637f0d1b4f7c66447141a4b3564. That commit caused a severe memory leak in nfs_readdir_make_qstr(). When listing a directory with more than 100 files (this is how many struct nfs_cache_array_entry elements fit in one 4kB page), all allocated file name strings past those 100 leak. The root of the leakage is that those string pointers are managed in pages which are never linked into the page cache. fs/nfs/dir.c puts pages into the page cache by calling read_cache_page(); the callback function nfs_readdir_filler() will then fill the given page struct which was passed to it, which is already linked in the page cache (by do_read_cache_page() calling add_to_page_cache_lru()). Commit be4c2d4723a4 added another (local) array of allocated pages, to be filled with more data, instead of discarding excess items received from the NFS server. Those additional pages can be used by the next nfs_readdir_filler() call (from within the same nfs_readdir() call). The leak happens when some of those additional pages are never used (copied to the page cache using copy_highpage()). The pages will be freed by nfs_readdir_free_pages(), but their contents will not. The commit did not invoke nfs_readdir_clear_array() (and doing so would have been dangerous, because it did not track which of those pages were already copied to the page cache, risking double free bugs). How to reproduce the leak: - Use a kernel with CONFIG_SLUB_DEBUG_ON. - Create a directory on a NFS mount with more than 100 files with names long enough to use the "kmalloc-32" slab (so we can easily look up the allocation counts): for i in `seq 110`; do touch ${i}_0123456789abcdef; done - Drop all caches: echo 3 >/proc/sys/vm/drop_caches - Check the allocation counter: grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls 30564391 nfs_readdir_add_to_array+0x73/0xd0 age=534558/4791307/6540952 pid=370-1048386 cpus=0-47 nodes=0-1 - Request a directory listing and check the allocation counters again: ls [...] grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls 30564511 nfs_readdir_add_to_array+0x73/0xd0 age=207/4792999/6542663 pid=370-1048386 cpus=0-47 nodes=0-1 There are now 120 new allocations. - Drop all caches and check the counters again: echo 3 >/proc/sys/vm/drop_caches grep nfs_readdir /sys/kernel/slab/kmalloc-32/alloc_calls 30564401 nfs_readdir_add_to_array+0x73/0xd0 age=735/4793524/6543176 pid=370-1048386 cpus=0-47 nodes=0-1 110 allocations are gone, but 10 have leaked and will never be freed. Unhelpfully, those allocations are explicitly excluded from KMEMLEAK, that's why my initial attempts with KMEMLEAK were not successful: /* * Avoid a kmemleak false positive. The pointer to the name is stored * in a page cache page which kmemleak does not scan. */ kmemleak_not_leak(string->name); It would be possible to solve this bug without reverting the whole commit: - keep track of which pages were not used, and call nfs_readdir_clear_array() on them, or - manually link those pages into the page cache But for now I have decided to just revert the commit, because the real fix would require complex considerations, risking more dangerous (crash) bugs, which may seem unsuitable for the stable branches. Signed-off-by: Max Kellermann <[email protected]> Cc: [email protected] # v5.1+ Signed-off-by: Trond Myklebust <[email protected]>
2019-07-12Merge tag 'driver-core-5.3-rc1' of ↵Linus Torvalds201-2114/+1939
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core and debugfs updates from Greg KH: "Here is the "big" driver core and debugfs changes for 5.3-rc1 It's a lot of different patches, all across the tree due to some api changes and lots of debugfs cleanups. Other than the debugfs cleanups, in this set of changes we have: - bus iteration function cleanups - scripts/get_abi.pl tool to display and parse Documentation/ABI entries in a simple way - cleanups to Documenatation/ABI/ entries to make them parse easier due to typos and other minor things - default_attrs use for some ktype users - driver model documentation file conversions to .rst - compressed firmware file loading - deferred probe fixes All of these have been in linux-next for a while, with a bunch of merge issues that Stephen has been patient with me for" * tag 'driver-core-5.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (102 commits) debugfs: make error message a bit more verbose orangefs: fix build warning from debugfs cleanup patch ubifs: fix build warning after debugfs cleanup patch driver: core: Allow subsystems to continue deferring probe drivers: base: cacheinfo: Ensure cpu hotplug work is done before Intel RDT arch_topology: Remove error messages on out-of-memory conditions lib: notifier-error-inject: no need to check return value of debugfs_create functions swiotlb: no need to check return value of debugfs_create functions ceph: no need to check return value of debugfs_create functions sunrpc: no need to check return value of debugfs_create functions ubifs: no need to check return value of debugfs_create functions orangefs: no need to check return value of debugfs_create functions nfsd: no need to check return value of debugfs_create functions lib: 842: no need to check return value of debugfs_create functions debugfs: provide pr_fmt() macro debugfs: log errors when something goes wrong drivers: s390/cio: Fix compilation warning about const qualifiers drivers: Add generic helper to match by of_node driver_find_device: Unify the match function with class_find_device() bus_find_device: Unify the match callback with class_find_device ...
2019-07-12Merge branch 'akpm' (patches from Andrew)Linus Torvalds173-3381/+3586
Merge updates from Andrew Morton: "Am experimenting with splitting MM up into identifiable subsystems perhaps with a view to gitifying it in complex ways. Also with more verbose "incoming" emails. Most of MM is here and a few other trees. Subsystems affected by this patch series: - hotfixes - iommu - scripts - arch/sh - ocfs2 - mm:slab-generic - mm:slub - mm:kmemleak - mm:kasan - mm:cleanups - mm:debug - mm:pagecache - mm:swap - mm:memcg - mm:gup - mm:pagemap - mm:infrastructure - mm:vmalloc - mm:initialization - mm:pagealloc - mm:vmscan - mm:tools - mm:proc - mm:ras - mm:oom-kill hotfixes: mm: vmscan: scan anonymous pages on file refaults mm/nvdimm: add is_ioremap_addr and use that to check ioremap address mm/memcontrol: fix wrong statistics in memory.stat mm/z3fold.c: lock z3fold page before __SetPageMovable() nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi header MAINTAINERS: nilfs2: update email address iommu: include/linux/dmar.h: replace single-char identifiers in macros scripts: scripts/decode_stacktrace: match basepath using shell prefix operator, not regex scripts/decode_stacktrace: look for modules with .ko.debug extension scripts/spelling.txt: drop "sepc" from the misspelling list scripts/spelling.txt: add spelling fix for prohibited scripts/decode_stacktrace: Accept dash/underscore in modules scripts/spelling.txt: add more spellings to spelling.txt arch/sh: arch/sh/configs/sdk7786_defconfig: remove CONFIG_LOGFS sh: config: remove left-over BACKLIGHT_LCD_SUPPORT sh: prevent warnings when using iounmap ocfs2: fs: ocfs: fix spelling mistake "hearbeating" -> "heartbeat" ocfs2/dlm: use struct_size() helper ocfs2: add last unlock times in locking_state ocfs2: add locking filter debugfs file ocfs2: add first lock wait time in locking_state ocfs: no need to check return value of debugfs_create functions fs/ocfs2/dlmglue.c: unneeded variable: "status" ocfs2: use kmemdup rather than duplicating its implementation mm:slab-generic: Patch series "mm/slab: Improved sanity checking": mm/slab: validate cache membership under freelist hardening mm/slab: sanity-check page type when looking up cache lkdtm/heap: add tests for freelist hardening mm:slub: mm/slub.c: avoid double string traverse in kmem_cache_flags() slub: don't panic for memcg kmem cache creation failure mm:kmemleak: mm/kmemleak.c: fix check for softirq context mm/kmemleak.c: change error at _write when kmemleak is disabled docs: kmemleak: add more documentation details mm:kasan: mm/kasan: print frame description for stack bugs Patch series "Bitops instrumentation for KASAN", v5: lib/test_kasan: add bitops tests x86: use static_cpu_has in uaccess region to avoid instrumentation asm-generic, x86: add bitops instrumentation for KASAN Patch series "mm/kasan: Add object validation in ksize()", v3: mm/kasan: introduce __kasan_check_{read,write} mm/kasan: change kasan_check_{read,write} to return boolean lib/test_kasan: Add test for double-kzfree detection mm/slab: refactor common ksize KASAN logic into slab_common.c mm/kasan: add object validation in ksize() mm:cleanups: include/linux/pfn_t.h: remove pfn_t_to_virt() Patch series "remove ARCH_SELECT_MEMORY_MODEL where it has no effect": arm: remove ARCH_SELECT_MEMORY_MODEL s390: remove ARCH_SELECT_MEMORY_MODEL sparc: remove ARCH_SELECT_MEMORY_MODEL mm/gup.c: make follow_page_mask() static mm/memory.c: trivial clean up in insert_page() mm: make !CONFIG_HUGE_PAGE wrappers into static inlines include/linux/mm_types.h: ifdef struct vm_area_struct::swap_readahead_info mm: remove the account_page_dirtied export mm/page_isolation.c: change the prototype of undo_isolate_page_range() include/linux/vmpressure.h: use spinlock_t instead of struct spinlock mm: remove the exporting of totalram_pages include/linux/pagemap.h: document trylock_page() return value mm:debug: mm/failslab.c: by default, do not fail allocations with direct reclaim only Patch series "debug_pagealloc improvements": mm, debug_pagelloc: use static keys to enable debugging mm, page_alloc: more extensive free page checking with debug_pagealloc mm, debug_pagealloc: use a page type instead of page_ext flag mm:pagecache: Patch series "fix filler_t callback type mismatches", v2: mm/filemap.c: fix an overly long line in read_cache_page mm/filemap: don't cast ->readpage to filler_t for do_read_cache_page jffs2: pass the correct prototype to read_cache_page 9p: pass the correct prototype to read_cache_page mm/filemap.c: correct the comment about VM_FAULT_RETRY mm:swap: mm, swap: fix race between swapoff and some swap operations mm/swap_state.c: simplify total_swapcache_pages() with get_swap_device() mm, swap: use rbtree for swap_extent mm/mincore.c: fix race between swapoff and mincore mm:memcg: memcg, oom: no oom-kill for __GFP_RETRY_MAYFAIL memcg, fsnotify: no oom-kill for remote memcg charging mm, memcg: introduce memory.events.local mm: memcontrol: dump memory.stat during cgroup OOM Patch series "mm: reparent slab memory on cgroup removal", v7: mm: memcg/slab: postpone kmem_cache memcg pointer initialization to memcg_link_cache() mm: memcg/slab: rename slab delayed deactivation functions and fields mm: memcg/slab: generalize postponed non-root kmem_cache deactivation mm: memcg/slab: introduce __memcg_kmem_uncharge_memcg() mm: memcg/slab: unify SLAB and SLUB page accounting mm: memcg/slab: don't check the dying flag on kmem_cache creation mm: memcg/slab: synchronize access to kmem_cache dying flag using a spinlock mm: memcg/slab: rework non-root kmem_cache lifecycle management mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages mm: memcg/slab: reparent memcg kmem_caches on cgroup removal mm, memcg: add a memcg_slabinfo debugfs file mm:gup: Patch series "switch the remaining architectures to use generic GUP", v4: mm: use untagged_addr() for get_user_pages_fast addresses mm: simplify gup_fast_permitted mm: lift the x86_32 PAE version of gup_get_pte to common code MIPS: use the generic get_user_pages_fast code sh: add the missing pud_page definition sh: use the generic get_user_pages_fast code sparc64: add the missing pgd_page definition sparc64: define untagged_addr() sparc64: use the generic get_user_pages_fast code mm: rename CONFIG_HAVE_GENERIC_GUP to CONFIG_HAVE_FAST_GUP mm: reorder code blocks in gup.c mm: consolidate the get_user_pages* implementations mm: validate get_user_pages_fast flags mm: move the powerpc hugepd code to mm/gup.c mm: switch gup_hugepte to use try_get_compound_head mm: mark the page referenced in gup_hugepte mm/gup: speed up check_and_migrate_cma_pages() on huge page mm/gup.c: remove some BUG_ONs from get_gate_page() mm/gup.c: mark undo_dev_pagemap as __maybe_unused mm:pagemap: asm-generic, x86: introduce generic pte_{alloc,free}_one[_kernel] alpha: switch to generic version of pte allocation arm: switch to generic version of pte allocation arm64: switch to generic version of pte allocation csky: switch to generic version of pte allocation m68k: sun3: switch to generic version of pte allocation mips: switch to generic version of pte allocation nds32: switch to generic version of pte allocation nios2: switch to generic version of pte allocation parisc: switch to generic version of pte allocation riscv: switch to generic version of pte allocation um: switch to generic version of pte allocation unicore32: switch to generic version of pte allocation mm/pgtable: drop pgtable_t variable from pte_fn_t functions mm/memory.c: fail when offset == num in first check of __vm_map_pages() mm:infrastructure: mm/mmu_notifier: use hlist_add_head_rcu() mm:vmalloc: Patch series "Some cleanups for the KVA/vmalloc", v5: mm/vmalloc.c: remove "node" argument mm/vmalloc.c: preload a CPU with one object for split purpose mm/vmalloc.c: get rid of one single unlink_va() when merge mm/vmalloc.c: switch to WARN_ON() and move it under unlink_va() mm/vmalloc.c: spelling> s/informaion/information/ mm:initialization: mm/large system hash: use vmalloc for size > MAX_ORDER when !hashdist mm/large system hash: clear hashdist when only one node with memory is booted mm:pagealloc: arm64: move jump_label_init() before parse_early_param() Patch series "add init_on_alloc/init_on_free boot options", v10: mm: security: introduce init_on_alloc=1 and init_on_free=1 boot options mm: init: report memory auto-initialization features at boot time mm:vmscan: mm: vmscan: remove double slab pressure by inc'ing sc->nr_scanned mm: vmscan: correct some vmscan counters for THP swapout mm:tools: tools/vm/slabinfo: order command line options tools/vm/slabinfo: add partial slab listing to -X tools/vm/slabinfo: add option to sort by partial slabs tools/vm/slabinfo: add sorting info to help menu mm:proc: proc: use down_read_killable mmap_sem for /proc/pid/maps proc: use down_read_killable mmap_sem for /proc/pid/smaps_rollup proc: use down_read_killable mmap_sem for /proc/pid/pagemap proc: use down_read_killable mmap_sem for /proc/pid/clear_refs proc: use down_read_killable mmap_sem for /proc/pid/map_files mm: use down_read_killable for locking mmap_sem in access_remote_vm mm: smaps: split PSS into components mm: vmalloc: show number of vmalloc pages in /proc/meminfo mm:ras: mm/memory-failure.c: clarify error message mm:oom-kill: mm: memcontrol: use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks() mm, oom: refactor dump_tasks for memcg OOMs mm, oom: remove redundant task_in_mem_cgroup() check oom: decouple mems_allowed from oom_unkillable_task mm/oom_kill.c: remove redundant OOM score normalization in select_bad_process()" * akpm: (147 commits) mm/oom_kill.c: remove redundant OOM score normalization in select_bad_process() oom: decouple mems_allowed from oom_unkillable_task mm, oom: remove redundant task_in_mem_cgroup() check mm, oom: refactor dump_tasks for memcg OOMs mm: memcontrol: use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks() mm/memory-failure.c: clarify error message mm: vmalloc: show number of vmalloc pages in /proc/meminfo mm: smaps: split PSS into components mm: use down_read_killable for locking mmap_sem in access_remote_vm proc: use down_read_killable mmap_sem for /proc/pid/map_files proc: use down_read_killable mmap_sem for /proc/pid/clear_refs proc: use down_read_killable mmap_sem for /proc/pid/pagemap proc: use down_read_killable mmap_sem for /proc/pid/smaps_rollup proc: use down_read_killable mmap_sem for /proc/pid/maps tools/vm/slabinfo: add sorting info to help menu tools/vm/slabinfo: add option to sort by partial slabs tools/vm/slabinfo: add partial slab listing to -X tools/vm/slabinfo: order command line options mm: vmscan: correct some vmscan counters for THP swapout mm: vmscan: remove double slab pressure by inc'ing sc->nr_scanned ...
2019-07-12net/mlx5e: Convert single case statement switch statements into if statementsNathan Chancellor1-23/+11
During the review of commit 1ff2f0fa450e ("net/mlx5e: Return in default case statement in tx_post_resync_params"), Leon and Nick pointed out that the switch statements can be converted to single if statements that return early so that the code is easier to follow. Suggested-by: Leon Romanovsky <[email protected]> Suggested-by: Nick Desaulniers <[email protected]> Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Leon Romanovsky <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2019-07-12Merge branches 'clk-bcm63xx', 'clk-silabs', 'clk-lochnagar' and ↵Stephen Boyd21-175/+1997
'clk-rockchip' into clk-next - Support gated clk controller on MIPS based BCM63XX SoCs - Small frequency support for SiLabs Si544 chips - Support SiLabs Si5341 and Si5340 chips * clk-bcm63xx: clk: add BCM63XX gated clock controller driver devicetree: document the BCM63XX gated clock bindings * clk-silabs: clk: Add Si5341/Si5340 driver dt-bindings: clock: Add silabs,si5341 clk: clk-si544: Implement small frequency change support * clk-lochnagar: clk: lochnagar: Update DT binding doc to include the primary SPDIF MCLK clk: lochnagar: Use new parent_data approach to register clock parents * clk-rockchip: clk: rockchip: export HDMIPHY clock on rk3228 clk: rockchip: add watchdog pclk on rk3328 clk: rockchip: add clock id for hdmi_phy special clock on rk3228 clk: rockchip: add clock id for watchdog pclk on rk3328 clk: rockchip: convert pclk_wdt boilerplat to new SGRF_GATE macro clk: rockchip: add a type from SGRF-controlled gate clocks clk: rockchip: Remove 48 MHz PLL rate from rk3288 clk: rockchip: add 1.464GHz cpu-clock rate to rk3228 clk: rockchip: Slightly more accurate math in rockchip_mmc_get_phase() clk: rockchip: Don't yell about bad mmc phases when getting clk: rockchip: Use clk_hw_get_rate() in MMC phase calculation
2019-07-12Merge branches 'clk-rpi-cpufreq', 'clk-tegra', 'clk-simplify-provider.h', ↵Stephen Boyd19-116/+592
'clk-sprd' and 'clk-at91' into clk-next - Support for CPU clks on Raspberry Pi devices - Slow clk support for AT91 SAM9X60 SoCs * clk-rpi-cpufreq: clk: raspberrypi: register platform device for raspberrypi-cpufreq firmware: raspberrypi: register clk device clk: bcm283x: add driver interfacing with Raspberry Pi's firmware clk: bcm2835: remove pllb * clk-tegra: clk: tegra: Do not enable PLL_RE_VCO on Tegra210 clk: tegra: Warn if an enabled PLL is in IDDQ clk: tegra: Do not warn unnecessarily clk: tegra210: fix PLLU and PLLU_OUT1 * clk-simplify-provider.h: clk: consoldiate the __clk_get_hw() declarations clk: Unexport __clk_of_table clk: Remove ifdef for COMMON_CLK in clk-provider.h * clk-sprd: clk: sprd: Add check for return value of sprd_clk_regmap_init() clk: sprd: Check error only for devm_regmap_init_mmio() clk: sprd: Switch from of_iomap() to devm_ioremap_resource() * clk-at91: clk: at91: sckc: use dedicated functions to unregister clock clk: at91: sckc: improve error path for sama5d4 sck registration clk: at91: sckc: remove unnecessary line clk: at91: sckc: improve error path for sam9x5 sck register clk: at91: sckc: add support to free slow clock osclillator clk: at91: sckc: add support to free slow rc oscillator clk: at91: sckc: add support to free slow oscillator clk: at91: sckc: add support for SAM9X60 dt-bindings: clk: at91: add bindings for SAM9X60's slow clock controller clk: at91: sckc: add support to specify registers bit offsets clk: at91: sckc: sama5d4 has no bypass support
2019-07-12Merge branches 'clk-debugfs', 'clk-unused', 'clk-refactor' and 'clk-qoriq' ↵Stephen Boyd9-194/+30
into clk-next - Add a 'clk_parent' file in clk debugfs - Remove dead code in various clk drivers * clk-debugfs: clk: Add clk_parent entry in debugfs * clk-unused: clk: qcom: Fix -Wunused-const-variable clk: mmp: frac: Remove set but not used variable 'prev_rate' clk: ti: Remove unused functions clk: mediatek: mt8516: Remove unused variable * clk-refactor: clk: clk-cdce706: simplify getting the adapter of a client clk: Simplify clk_core_can_round() * clk-qoriq: clk: qoriq: add support for lx2160a
2019-07-12Merge branches 'clk-bulk-optional', 'clk-kirkwood', 'clk-socfpga' and ↵Stephen Boyd9-12/+120
'clk-docs' into clk-next - Add a clk_bulk_get_optional() API (with devm too) - Support for Marvell 98DX1135 SoCs * clk-bulk-optional: clk: Document some devm_clk_bulk*() APIs clk: Add devm_clk_bulk_get_optional() function clk: Add clk_bulk_get_optional() function * clk-kirkwood: clk: kirkwood: Add support for MV98DX1135 dt-bindings: clock: mvebu: Add compatible string for 98dx1135 core clock * clk-socfpga: clk: socfpga: stratix10: fix divider entry for the emac clocks clk: socfpga: stratix10: add additional clocks needed for the NAND IP * clk-docs: clk: Grammar missing "and", Spelling s/statisfied/satisfied/
2019-07-12Merge branches 'clk-ti', 'clk-samsung', 'clk-imx' and 'clk-allwinner' into ↵Stephen Boyd52-2387/+3349
clk-next * clk-ti: clk: ti: Use int to check return value from of_property_count_elems_of_size() firmware: ti_sci: extend clock identifiers from u8 to u32 clk: keystone: sci-clk: extend clock IDs to 32 bits clk: keystone: sci-clk: probe clocks from DT instead of firmware clk: keystone: sci-clk: split out the fw clock parsing to own function clk: keystone: sci-clk: cut down the clock name length * clk-samsung: clk: samsung: Add bus clock for GPU/G3D on Exynos4412 clk: samsung: add new clocks for DMC for Exynos5422 SoC clk: samsung: add BPLL rate table for Exynos 5422 SoC clk: samsung: add needed IDs for DMC clocks in Exynos5420 clk: samsung: exynos5433: Use of_clk_get_parent_count() * clk-imx: (38 commits) clk: imx8mq: Keep uart clocks on during system boot clk: imx: Remove __init for imx_register_uart_clocks() API clk: imx6q: fix section mismatch warning clk: imx8mq: Use devm_platform_ioremap_resource() instead of of_iomap() clk: imx8mq: Use imx_check_clocks() API directly clk: imx: Remove __init for imx_check_clocks() API clk: imx6sll: Switch to clk_hw based API clk: imx7d: Switch to clk_hw based API clk: imx6ul: Switch to clk_hw based API clk: imx6sx: Switch to clk_hw based API clk: imx6q: Switch to clk_hw based API clk: imx6sl: Switch to clk_hw based API clk: imx: Switch wrappers to clk_hw based API clk: imx: clk-fixup-mux: Switch to clk_hw based API clk: imx: clk-fixup-div: Switch to clk_hw based API clk: imx: clk-gate-exclusive: Switch to clk_hw based API clk: imx: clk-pfd: Switch to clk_hw based API clk: imx: clk-pllv3: Switch to clk_hw based API clk: imx: clk-gate2: Switch to clk_hw based API clk: imx: clk-cpu: Switch to clk_hw based API ... * clk-allwinner: (29 commits) clk: Simplify debugfs printing and add a newline clk: sunxi-ng: sun8i-r: Use local parent references for SUNXI_CCU_GATE clk: sunxi-ng: a80-usb: Use local parent references for SUNXI_CCU_GATE clk: sunxi-ng: gate: Add macros for referencing local clock parents clk: sunxi-ng: h6-r: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: h6: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: a64: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: f1c100s: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: sun8i-r: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: v3s: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: r40: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: h3: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: a33: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: a23: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: a31: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: sun5i: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: a10: Use local parent references for CLK_FIXED_FACTOR clk: sunxi-ng: sun8i-r: Use local parent references for CLK_HW_INIT_* clk: sunxi-ng: switch to of_clk_hw_register() for registering clks clk: fixed-factor: Add CLK_FIXED_FACTOR_FW_NAME for DT clock-names parent ...
2019-07-12Merge branches 'clk-qcom-gdsc-warn', 'clk-ingenic', 'clk-qcom-qcs404-reset', ↵Stephen Boyd31-208/+1298
'clk-xgene-limit' and 'clk-meson' into clk-next * clk-qcom-gdsc-warn: clk: qcom: gdsc: WARN when failing to toggle * clk-ingenic: MIPS: Remove dead code clk: ingenic: Remove unused functions MIPS: jz4740: PM: Let CGU driver suspend clocks and set sleep mode clk: ingenic: Handle setting the Low-Power Mode bit clk: ingenic: Add missing header in cgu.h clk: ingenic/jz4725b: Fix "pll half" divider not read/written properly clk: ingenic/jz4725b: Fix incorrect dividers for main clocks clk: ingenic/jz4770: Fix incorrect dividers for main clocks clk: ingenic/jz4740: Fix incorrect dividers for main clocks clk: ingenic: Add support for divider tables * clk-qcom-qcs404-reset: clk: gcc-qcs404: Add PCIe resets * clk-xgene-limit: clk: xgene: Don't build COMMON_CLK_XGENE by default * clk-meson: clk: meson: g12a: mark fclk_div3 as critical clk: meson: g12a: Add support for G12B CPUB clocks dt-bindings: clk: meson: add g12b periph clock controller bindings clk: meson-g12a: add temperature sensor clocks dt-bindings: clk: g12a-clkc: add Temperature Sensor clock IDs clk: meson: meson8b: add the cts_i958 clock clk: meson: meson8b: add the cts_mclk_i958 clocks clk: meson: meson8b: add the cts_amclk clocks dt-bindings: clock: meson8b: add the audio clocks clk: meson: g12a: add controller register init clk: meson: eeclk: add init regs clk: meson: g12a: add mpll register init sequences clk: meson: mpll: add init callback and regs clk: meson: axg: spread spectrum is on mpll2 clk: meson: gxbb: no spread spectrum on mpll0 clk: meson: mpll: properly handle spread spectrum clk: meson: meson8b: fix a typo in the VPU parent names array variable clk: meson: fix MPLL 50M binding id typo
2019-07-12Merge branches 'clk-pwm-duty', 'clk-bcm', 'clk-mtk', 'clk-qcom-msm8998-gpu' ↵Stephen Boyd21-144/+343
and 'clk-renesas' into clk-next - Add support to get duty cycle of generic pwm clks * clk-pwm-duty: clk: pwm: implement the .get_duty_cycle callback * clk-bcm: clk: bcm: Allow CLK_BCM2835 for ARCH_BRCMSTB clk: bcm: Make BCM2835 clock drivers selectable * clk-mtk: clk: mediatek: Remove MT8183 unused clock clk: mediatek: add audsys clock driver for MT8516 dt-bindings: mediatek: audsys: add support for MT8516 * clk-qcom-msm8998-gpu: dt-bindings: clock: Document gpucc for msm8998 * clk-renesas: clk: renesas: cpg-mssr: Use [] to denote a flexible array member clk: renesas: cpg-mssr: Combine driver-private and clock array allocation clk: renesas: mstp: Combine group-private and clock array allocation clk: renesas: div6: Combine clock-private and parent array allocation clk: renesas: cpg-mssr: Update kerneldoc for struct cpg_mssr_priv clk: renesas: r8a774a1: Add TMU clock clk: renesas: r8a77995: Add CMM clocks clk: renesas: r8a77990: Add CMM clocks clk: renesas: r8a77965: Add CMM clocks clk: renesas: r8a7795: Add CMM clocks clk: renesas: r9a06g032: Add clock domain support dt-bindings: clock: renesas: r9a06g032-sysctrl: Document power Domains clk: renesas: mstp: Remove error messages on out-of-memory conditions clk: renesas: cpg-mssr: Remove error messages on out-of-memory conditions clk: renesas: cpg-mssr: Use genpd of_node instead of local copy clk: renesas: r8a7796: Add CMM clocks clk: renesas: r8a779{5|6|65}: Add TPU clock
2019-07-12Merge tag 'v5.3-rockchip-clk1' of ↵Stephen Boyd10-46/+29
git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-rockchip Pull Rockchip clk driver updates from Heiko Stuebner: - New clock-ids+exports for two clocks - Cleanup for some boilerplate code for clocks we cannot really control from the kernel, but want to define separately to match the hardware-description (watchdog in secure-grf) - Improvement in mmc phase calculation and cleanup of some rate defintions * tag 'v5.3-rockchip-clk1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: clk: rockchip: export HDMIPHY clock on rk3228 clk: rockchip: add watchdog pclk on rk3328 clk: rockchip: add clock id for hdmi_phy special clock on rk3228 clk: rockchip: add clock id for watchdog pclk on rk3328 clk: rockchip: convert pclk_wdt boilerplat to new SGRF_GATE macro clk: rockchip: add a type from SGRF-controlled gate clocks clk: rockchip: Remove 48 MHz PLL rate from rk3288 clk: rockchip: add 1.464GHz cpu-clock rate to rk3228 clk: rockchip: Slightly more accurate math in rockchip_mmc_get_phase() clk: rockchip: Don't yell about bad mmc phases when getting clk: rockchip: Use clk_hw_get_rate() in MMC phase calculation
2019-07-12mm/oom_kill.c: remove redundant OOM score normalization in select_bad_process()Tetsuo Handa1-2/+0
Since commit bbbe48029720 ("mm, oom: remove 'prefer children over parent' heuristic") removed the "%s: Kill process %d (%s) score %u or sacrifice child\n" line, oc->chosen_points is no longer used after select_bad_process(). Link: http://lkml.kernel.org/r/1560853435-15575-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12oom: decouple mems_allowed from oom_unkillable_taskShakeel Butt3-28/+33
Commit ef08e3b4981a ("[PATCH] cpusets: confine oom_killer to mem_exclusive cpuset") introduces a heuristic where a potential oom-killer victim is skipped if the intersection of the potential victim and the current (the process triggered the oom) is empty based on the reason that killing such victim most probably will not help the current allocating process. However the commit 7887a3da753e ("[PATCH] oom: cpuset hint") changed the heuristic to just decrease the oom_badness scores of such potential victim based on the reason that the cpuset of such processes might have changed and previously they may have allocated memory on mems where the current allocating process can allocate from. Unintentionally 7887a3da753e ("[PATCH] oom: cpuset hint") introduced a side effect as the oom_badness is also exposed to the user space through /proc/[pid]/oom_score, so, readers with different cpusets can read different oom_score of the same process. Later, commit 6cf86ac6f36b ("oom: filter tasks not sharing the same cpuset") fixed the side effect introduced by 7887a3da753e by moving the cpuset intersection back to only oom-killer context and out of oom_badness. However the combination of ab290adbaf8f ("oom: make oom_unkillable_task() helper function") and 26ebc984913b ("oom: /proc/<pid>/oom_score treat kernel thread honestly") unintentionally brought back the cpuset intersection check into the oom_badness calculation function. Other than doing cpuset/mempolicy intersection from oom_badness, the memcg oom context is also doing cpuset/mempolicy intersection which is quite wrong and is caught by syzcaller with the following report: kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 28426 Comm: syz-executor.5 Not tainted 5.2.0-rc3-next-20190607 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline] RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline] RIP: 0010:oom_unkillable_task mm/oom_kill.c:168 [inline] RIP: 0010:oom_unkillable_task+0x180/0x400 mm/oom_kill.c:155 Code: c1 ea 03 80 3c 02 00 0f 85 80 02 00 00 4c 8b a3 10 07 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8d 74 24 10 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 67 02 00 00 49 8b 44 24 10 4c 8d a0 68 fa ff ff RSP: 0018:ffff888000127490 EFLAGS: 00010a03 RAX: dffffc0000000000 RBX: ffff8880a4cd5438 RCX: ffffffff818dae9c RDX: 100000000c3cc602 RSI: ffffffff818dac8d RDI: 0000000000000001 RBP: ffff8880001274d0 R08: ffff888000086180 R09: ffffed1015d26be0 R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: 8000000061e63007 R13: 0000000000000000 R14: 8000000061e63017 R15: 1ffff11000024ea6 FS: 00005555561f5940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000607304 CR3: 000000009237e000 CR4: 00000000001426f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 Call Trace: oom_evaluate_task+0x49/0x520 mm/oom_kill.c:321 mem_cgroup_scan_tasks+0xcc/0x180 mm/memcontrol.c:1169 select_bad_process mm/oom_kill.c:374 [inline] out_of_memory mm/oom_kill.c:1088 [inline] out_of_memory+0x6b2/0x1280 mm/oom_kill.c:1035 mem_cgroup_out_of_memory+0x1ca/0x230 mm/memcontrol.c:1573 mem_cgroup_oom mm/memcontrol.c:1905 [inline] try_charge+0xfbe/0x1480 mm/memcontrol.c:2468 mem_cgroup_try_charge+0x24d/0x5e0 mm/memcontrol.c:6073 mem_cgroup_try_charge_delay+0x1f/0xa0 mm/memcontrol.c:6088 do_huge_pmd_wp_page_fallback+0x24f/0x1680 mm/huge_memory.c:1201 do_huge_pmd_wp_page+0x7fc/0x2160 mm/huge_memory.c:1359 wp_huge_pmd mm/memory.c:3793 [inline] __handle_mm_fault+0x164c/0x3eb0 mm/memory.c:4006 handle_mm_fault+0x3b7/0xa90 mm/memory.c:4053 do_user_addr_fault arch/x86/mm/fault.c:1455 [inline] __do_page_fault+0x5ef/0xda0 arch/x86/mm/fault.c:1521 do_page_fault+0x71/0x57d arch/x86/mm/fault.c:1552 page_fault+0x1e/0x30 arch/x86/entry/entry_64.S:1156 RIP: 0033:0x400590 Code: 06 e9 49 01 00 00 48 8b 44 24 10 48 0b 44 24 28 75 1f 48 8b 14 24 48 8b 7c 24 20 be 04 00 00 00 e8 f5 56 00 00 48 8b 74 24 08 <89> 06 e9 1e 01 00 00 48 8b 44 24 08 48 8b 14 24 be 04 00 00 00 8b RSP: 002b:00007fff7bc49780 EFLAGS: 00010206 RAX: 0000000000000001 RBX: 0000000000760000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 000000002000cffc RDI: 0000000000000001 RBP: fffffffffffffffe R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000075 R11: 0000000000000246 R12: 0000000000760008 R13: 00000000004c55f2 R14: 0000000000000000 R15: 00007fff7bc499b0 Modules linked in: ---[ end trace a65689219582ffff ]--- RIP: 0010:__read_once_size include/linux/compiler.h:194 [inline] RIP: 0010:has_intersects_mems_allowed mm/oom_kill.c:84 [inline] RIP: 0010:oom_unkillable_task mm/oom_kill.c:168 [inline] RIP: 0010:oom_unkillable_task+0x180/0x400 mm/oom_kill.c:155 Code: c1 ea 03 80 3c 02 00 0f 85 80 02 00 00 4c 8b a3 10 07 00 00 48 b8 00 00 00 00 00 fc ff df 4d 8d 74 24 10 4c 89 f2 48 c1 ea 03 <80> 3c 02 00 0f 85 67 02 00 00 49 8b 44 24 10 4c 8d a0 68 fa ff ff RSP: 0018:ffff888000127490 EFLAGS: 00010a03 RAX: dffffc0000000000 RBX: ffff8880a4cd5438 RCX: ffffffff818dae9c RDX: 100000000c3cc602 RSI: ffffffff818dac8d RDI: 0000000000000001 RBP: ffff8880001274d0 R08: ffff888000086180 R09: ffffed1015d26be0 R10: ffffed1015d26bdf R11: ffff8880ae935efb R12: 8000000061e63007 R13: 0000000000000000 R14: 8000000061e63017 R15: 1ffff11000024ea6 FS: 00005555561f5940(0000) GS:ffff8880ae800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2f823000 CR3: 000000009237e000 CR4: 00000000001426f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000600 The fix is to decouple the cpuset/mempolicy intersection check from oom_unkillable_task() and make sure cpuset/mempolicy intersection check is only done in the global oom context. [[email protected]: change function name and update comment] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Shakeel Butt <[email protected]> Reported-by: [email protected] Acked-by: Roman Gushchin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm, oom: remove redundant task_in_mem_cgroup() checkShakeel Butt5-47/+9
oom_unkillable_task() can be called from three different contexts i.e. global OOM, memcg OOM and oom_score procfs interface. At the moment oom_unkillable_task() does a task_in_mem_cgroup() check on the given process. Since there is no reason to perform task_in_mem_cgroup() check for global OOM and oom_score procfs interface, those contexts provide NULL memcg and skips the task_in_mem_cgroup() check. However for memcg OOM context, the oom_unkillable_task() is always called from mem_cgroup_scan_tasks() and thus task_in_mem_cgroup() check becomes redundant and effectively dead code. So, just remove the task_in_mem_cgroup() check altogether. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Shakeel Butt <[email protected]> Signed-off-by: Tetsuo Handa <[email protected]> Acked-by: Roman Gushchin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: David Rientjes <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Nick Piggin <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm, oom: refactor dump_tasks for memcg OOMsShakeel Butt1-28/+40
dump_tasks() traverses all the existing processes even for the memcg OOM context which is not only unnecessary but also wasteful. This imposes a long RCU critical section even from a contained context which can be quite disruptive. Change dump_tasks() to be aligned with select_bad_process and use mem_cgroup_scan_tasks to selectively traverse only processes of the target memcg hierarchy during memcg OOM. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Shakeel Butt <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: David Rientjes <[email protected]> Cc: KOSAKI Motohiro <[email protected]> Cc: Paul Jackson <[email protected]> Cc: Nick Piggin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: memcontrol: use CSS_TASK_ITER_PROCS at mem_cgroup_scan_tasks()Tetsuo Handa2-4/+1
Since commit c03cd7738a83 ("cgroup: Include dying leaders with live threads in PROCS iterations") corrected how CSS_TASK_ITER_PROCS works, mem_cgroup_scan_tasks() can use CSS_TASK_ITER_PROCS in order to check only one thread from each thread group. [[email protected]: remove thread group leader check in oom_evaluate_task()] Link: http://lkml.kernel.org/r/1560853257-14934-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tetsuo Handa <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Tejun Heo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/memory-failure.c: clarify error messageJane Chu1-1/+1
Some user who install SIGBUS handler that does longjmp out therefore keeping the process alive is confused by the error message "[188988.765862] Memory failure: 0x1840200: Killing cellsrv:33395 due to hardware memory corruption" Slightly modify the error message to improve clarity. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jane Chu <[email protected]> Acked-by: Naoya Horiguchi <[email protected]> Acked-by: Pankaj Gupta <[email protected]> Reviewed-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: vmalloc: show number of vmalloc pages in /proc/meminfoRoman Gushchin3-1/+13
Vmalloc() is getting more and more used these days (kernel stacks, bpf and percpu allocator are new top users), and the total % of memory consumed by vmalloc() can be pretty significant and changes dynamically. /proc/meminfo is the best place to display this information: its top goal is to show top consumers of the memory. Since the VmallocUsed field in /proc/meminfo is not in use for quite a long time (it has been defined to 0 by a5ad88ce8c7f ("mm: get rid of 'vmalloc_info' from /proc/meminfo")), let's reuse it for showing the actual physical memory consumption of vmalloc(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Roman Gushchin <[email protected]> Acked-by: Johannes Weiner <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: smaps: split PSS into componentsLuigi Semenzato3-42/+105
Report separate components (anon, file, and shmem) for PSS in smaps_rollup. This helps understand and tune the memory manager behavior in consumer devices, particularly mobile devices. Many of them (e.g. chromebooks and Android-based devices) use zram for anon memory, and perform disk reads for discarded file pages. The difference in latency is large (e.g. reading a single page from SSD is 30 times slower than decompressing a zram page on one popular device), thus it is useful to know how much of the PSS is anon vs. file. All the information is already present in /proc/pid/smaps, but much more expensive to obtain because of the large size of that procfs entry. This patch also removes a small code duplication in smaps_account, which would have gotten worse otherwise. Also updated Documentation/filesystems/proc.txt (the smaps section was a bit stale, and I added a smaps_rollup section) and Documentation/ABI/testing/procfs-smaps_rollup. [[email protected]: v5] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Luigi Semenzato <[email protected]> Acked-by: Yu Zhao <[email protected]> Cc: Sonny Rao <[email protected]> Cc: Yu Zhao <[email protected]> Cc: Brian Geffon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: use down_read_killable for locking mmap_sem in access_remote_vmKonstantin Khlebnikov2-2/+5
This function is used by ptrace and proc files like /proc/pid/cmdline and /proc/pid/environ. Access_remote_vm never returns error codes, all errors are ignored and only size of successfully read data is returned. So, if current task was killed we'll simply return 0 (bytes read). Mmap_sem could be locked for a long time or forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation. Link: http://lkml.kernel.org/r/156007494202.3335.16782303099589302087.stgit@buzz Signed-off-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Michal Koutný <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Cyrill Gorcunov <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Al Viro <[email protected]> Cc: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12proc: use down_read_killable mmap_sem for /proc/pid/map_filesKonstantin Khlebnikov1-6/+22
Do not remain stuck forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation. It seems ->d_revalidate() could return any error (except ECHILD) to abort validation and pass error as result of lookup sequence. [[email protected]: fix proc_map_files_lookup() return value, per Andrei] Link: http://lkml.kernel.org/r/156007493995.3335.9595044802115356911.stgit@buzz Signed-off-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Reviewed-by: Kirill Tkhai <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Al Viro <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12proc: use down_read_killable mmap_sem for /proc/pid/clear_refsKonstantin Khlebnikov1-1/+4
Do not remain stuck forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation. Replace the only unkillable mmap_sem lock in clear_refs_write(). Link: http://lkml.kernel.org/r/156007493826.3335.5424884725467456239.stgit@buzz Signed-off-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Reviewed-by: Kirill Tkhai <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Al Viro <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12proc: use down_read_killable mmap_sem for /proc/pid/pagemapKonstantin Khlebnikov1-1/+3
Do not remain stuck forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation. Link: http://lkml.kernel.org/r/156007493638.3335.4872164955523928492.stgit@buzz Signed-off-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Reviewed-by: Kirill Tkhai <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Al Viro <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12proc: use down_read_killable mmap_sem for /proc/pid/smaps_rollupKonstantin Khlebnikov1-2/+6
Do not remain stuck forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation. Link: http://lkml.kernel.org/r/156007493429.3335.14666825072272692455.stgit@buzz Signed-off-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Reviewed-by: Kirill Tkhai <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Al Viro <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12proc: use down_read_killable mmap_sem for /proc/pid/mapsKonstantin Khlebnikov2-2/+10
Do not remain stuck forever if something goes wrong. Using a killable lock permits cleanup of stuck tasks and simplifies investigation. This function is also used for /proc/pid/smaps. Link: http://lkml.kernel.org/r/156007493160.3335.14447544314127417266.stgit@buzz Signed-off-by: Konstantin Khlebnikov <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Reviewed-by: Cyrill Gorcunov <[email protected]> Reviewed-by: Kirill Tkhai <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Al Viro <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Koutný <[email protected]> Cc: Oleg Nesterov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12tools/vm/slabinfo: add sorting info to help menuTobin C. Harding1-0/+2
Passing more than one sorting option has undefined behaviour. Add an explicit statement as such to the help menu, this also has the advantage of highlighting all the sorting options. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tobin C. Harding <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Brendan Gregg <[email protected]>, Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12tools/vm/slabinfo: add option to sort by partial slabsTobin C. Harding1-2/+7
We would like to get a better view of the level of fragmentation within the SLUB allocator. Total number of partial slabs is an indicator of fragmentation. Add a command line option (-P | --partial) to sort the slab list by total number of partial slabs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tobin C. Harding <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Brendan Gregg <[email protected]>, Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12tools/vm/slabinfo: add partial slab listing to -XTobin C. Harding1-13/+28
We would like to see how fragmented the SLUB allocator is, one window into fragmentation is the total number of partial slabs. Currently `slabinfo -X` shows slabs sorted by loss and by size. We can use this option to also show slabs sorted by number of partial slabs. Option '-X' can be used in conjunction with '-N' to control the number of slabs shown e.g. list of top 5 slabs: slabinfo -X -N5 Add list of slabs ordered by number of partial slabs to output of `slabinfo -X`. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tobin C. Harding <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Brendan Gregg <[email protected]>, Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Qian Cai <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12tools/vm/slabinfo: order command line optionsTobin C. Harding1-35/+35
During recent discussion on LKML over SLAB vs SLUB it was suggested by Jesper that it would be nice to have a tool to view the current fragmentation of the slab allocators. CC list for this set is taken from that thread. For SLUB we have all the information for this already exposed by the kernel and also we have a userspace tool for displaying this info: tools/vm/slabinfo.c Extend slabinfo to improve the fragmentation information by enabling sorting of caches by number of partial slabs. Also add cache list sorted in this manner to the output of `slabinfo -X`. This patch (of 4): get_opt() has a spurious character within the option string. Remove it and reorder the options in alphabetic order so that it is easier to keep the options correct. Use the same ordering for command help output and long option handling code. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Tobin C. Harding <[email protected]> Cc: Jesper Dangaard Brouer <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Qian Cai <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Brendan Gregg <[email protected]>, Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: vmscan: correct some vmscan counters for THP swapoutYang Shi1-14/+49
Commit bd4c82c22c36 ("mm, THP, swap: delay splitting THP after swapped out"), THP can be swapped out in a whole. But, nr_reclaimed and some other vm counters still get inc'ed by one even though a whole THP (512 pages) gets swapped out. This doesn't make too much sense to memory reclaim. For example, direct reclaim may just need reclaim SWAP_CLUSTER_MAX pages, reclaiming one THP could fulfill it. But, if nr_reclaimed is not increased correctly, direct reclaim may just waste time to reclaim more pages, SWAP_CLUSTER_MAX * 512 pages in worst case. And, it may cause pgsteal_{kswapd|direct} is greater than pgscan_{kswapd|direct}, like the below: pgsteal_kswapd 122933 pgsteal_direct 26600225 pgscan_kswapd 174153 pgscan_direct 14678312 nr_reclaimed and nr_scanned must be fixed in parallel otherwise it would break some page reclaim logic, e.g. vmpressure: this looks at the scanned/reclaimed ratio so it won't change semantics as long as scanned & reclaimed are fixed in parallel. compaction/reclaim: compaction wants a certain number of physical pages freed up before going back to compacting. kswapd priority raising: kswapd raises priority if we scan fewer pages than the reclaim target (which itself is obviously expressed in order-0 pages). As a result, kswapd can falsely raise its aggressiveness even when it's making great progress. Other than nr_scanned and nr_reclaimed, some other counters, e.g. pgactivate, nr_skipped, nr_ref_keep and nr_unmap_fail need to be fixed too since they are user visible via cgroup, /proc/vmstat or trace points, otherwise they would be underreported. When isolating pages from LRUs, nr_taken has been accounted in base page, but nr_scanned and nr_skipped are still accounted in THP. It doesn't make too much sense too since this may cause trace point underreport the numbers as well. So accounting those counters in base page instead of accounting THP as one page. nr_dirty, nr_unqueued_dirty, nr_congested and nr_writeback are used by file cache, so they are not impacted by THP swap. This change may result in lower steal/scan ratio in some cases since THP may get split during page reclaim, then a part of tail pages get reclaimed instead of the whole 512 pages, but nr_scanned is accounted by 512, particularly for direct reclaim. But, this should be not a significant issue. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Reviewed-by: "Huang, Ying" <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mel Gorman <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: vmscan: remove double slab pressure by inc'ing sc->nr_scannedYang Shi1-5/+0
Commit 9092c71bb724 ("mm: use sc->priority for slab shrink targets") has broken up the relationship between sc->nr_scanned and slab pressure. The sc->nr_scanned can't double slab pressure anymore. So, it sounds no sense to still keep sc->nr_scanned inc'ed. Actually, it would prevent from adding pressure on slab shrink since excessive sc->nr_scanned would prevent from scan->priority raise. The bonnie test doesn't show this would change the behavior of slab shrinkers. w/ w/o /sec %CP /sec %CP Sequential delete: 3960.6 94.6 3997.6 96.2 Random delete: 2518 63.8 2561.6 64.6 The slight increase of "/sec" without the patch would be caused by the slight increase of CPU usage. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yang Shi <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Mel Gorman <[email protected]> Cc: "Kirill A . Shutemov" <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Hillf Danton <[email protected]> Cc: "Huang, Ying" <[email protected]> Cc: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: init: report memory auto-initialization features at boot timeAlexander Potapenko1-0/+24
Print the currently enabled stack and heap initialization modes. Stack initialization is enabled by a config flag, while heap initialization is configured at boot time with defaults being set in the config. It's more convenient for the user to have all information about these hardening measures in one place at boot, so the user can reason about the expected behavior of the running system. The possible options for stack are: - "all" for CONFIG_INIT_STACK_ALL; - "byref_all" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL; - "byref" for CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF; - "__user" for CONFIG_GCC_PLUGIN_STRUCTLEAK_USER; - "off" otherwise. Depending on the values of init_on_alloc and init_on_free boottime options we also report "heap alloc" and "heap free" as "on"/"off". In the init_on_free mode initializing pages at boot time may take a while, so print a notice about that as well. This depends on how much memory is installed, the memory bandwidth, etc. On a relatively modern x86 system, it takes about 0.75s/GB to wipe all memory: [ 0.418722] mem auto-init: stack:byref_all, heap alloc:off, heap free:on [ 0.419765] mem auto-init: clearing system memory may take some time... [ 12.376605] Memory: 16408564K/16776672K available (14339K kernel code, 1397K rwdata, 3756K rodata, 1636K init, 11460K bss, 368108K reserved, 0K cma-reserved) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexander Potapenko <[email protected]> Suggested-by: Kees Cook <[email protected]> Acked-by: Kees Cook <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: James Morris <[email protected]> Cc: Jann Horn <[email protected]> Cc: Kostya Serebryany <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Masahiro Yamada <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Sandeep Patil <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Cc: Souptick Joarder <[email protected]> Cc: Marco Elver <[email protected]> Cc: Kaiwan N Billimoria <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>