aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-07-12mm/slab: sanity-check page type when looking up cacheKees Cook2-11/+22
This avoids any possible type confusion when looking up an object. For example, if a non-slab were to be passed to kfree(), the invalid slab_cache pointer (i.e. overlapped with some other value from the struct page union) would be used for subsequent slab manipulations that could lead to further memory corruption. Since the page is already in cache, adding the PageSlab() check will have nearly zero cost, so add a check and WARN() to virt_to_cache(). Additionally replaces an open-coded virt_to_cache(). To support the failure mode this also updates all callers of virt_to_cache() and cache_from_obj() to handle a NULL cache pointer return value (though note that several already handle this case gracefully). [[email protected]: restore IRQs in kfree()] Link: http://lkml.kernel.org/r/20190613065637.GE16334@mwanda Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Dan Carpenter <[email protected]> Cc: Alexander Popov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: David Rientjes <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Pekka Enberg <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/slab: validate cache membership under freelist hardeningKees Cook1-8/+6
Patch series "mm/slab: Improved sanity checking". This adds defenses against slab cache confusion (as seen in real-world exploits[1]) and gracefully handles type confusions when trying to look up slab caches from an arbitrary page. (Also is patch 3: new LKDTM tests for these defenses as well as for the existing double-free detection. This patch (of 3): When building under CONFIG_SLAB_FREELIST_HARDENING, it makes sense to perform sanity-checking on the assumed slab cache during kmem_cache_free() to make sure the kernel doesn't mix freelists across slab caches and corrupt memory (as seen in the exploitation of flaws like CVE-2018-9568[1]). Note that the prior code might WARN() but still corrupt memory (i.e. return the assumed cache instead of the owned cache). There is no noticeable performance impact (changes are within noise). Measuring parallel kernel builds, I saw the following with CONFIG_SLAB_FREELIST_HARDENED, before and after this patch: before: Run times: 288.85 286.53 287.09 287.07 287.21 Min: 286.53 Max: 288.85 Mean: 287.35 Std Dev: 0.79 after: Run times: 289.58 287.40 286.97 287.20 287.01 Min: 286.97 Max: 289.58 Mean: 287.63 Std Dev: 0.99 Delta: 0.1% which is well below the standard deviation [1] https://github.com/ThomasKing2014/slides/raw/master/Building%20universal%20Android%20rooting%20with%20a%20type%20confusion%20vulnerability.pdf Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Alexander Popov <[email protected]> Cc: Alexander Potapenko <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12ocfs2: use kmemdup rather than duplicating its implementationFuqian Huang2-9/+7
kmemdup is introduced to duplicate a region of memory in a neat way. Rather than kmalloc/kzalloc + memcpy, which the programmer needs to write the size twice (sometimes lead to mistakes), kmemdup improves readability, leads to smaller code and also reduce the chances of mistakes. Suggestion to use kmemdup rather than using kmalloc/kzalloc + memcpy. [[email protected]: coding style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Fuqian Huang <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12fs/ocfs2/dlmglue.c: unneeded variable: "status"Hariprasad Kelam1-2/+1
fix below issue reported by coccicheck fs/ocfs2/dlmglue.c:4410:5-11: Unneeded variable: "status". Return "0" on line 4428 We can not change return type of ocfs2_downconvert_thread as its registered as callback of kthread_create. Link: http://lkml.kernel.org/r/20190702183237.GA13975@hari-Inspiron-1545 Signed-off-by: Hariprasad Kelam <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12ocfs: no need to check return value of debugfs_create functionsGreg Kroah-Hartman13-261/+73
When calling debugfs functions, there is no need to ever check the return value. The function can work or not, but the code logic should never do something different based on this. Also, because there is no need to save the file dentry, remove all of the variables that were being saved, and just recursively delete the whole directory when shutting down, saving a lot of logic and local variables. [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Jia Guo <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12ocfs2: add first lock wait time in locking_stateGang He2-3/+30
ocfs2 file system uses locking_state file under debugfs to dump each ocfs2 file system's dlm lock resources, but the users ever encountered some hang(deadlock) problems in ocfs2 file system. I'd like to add first lock wait time in locking_state file, which can help the upper scripts detect these deadlock problems via comparing the first lock wait time with the current time. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Gang He <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12ocfs2: add locking filter debugfs fileGang He2-0/+40
Add locking filter debugfs file, which is used to filter lock resources dump from locking_state debugfs file. We use d_filter_secs field to filter lock resources dump, the default d_filter_secs(0) value filters nothing, otherwise, only dump the last N seconds active lock resources. This enhancement can avoid dumping lots of old records. The d_filter_secs value can be changed via locking_filter file. [[email protected]: fix undefined reference to `__udivdi3'] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Gang He <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Acked-by: Randy Dunlap <[email protected]> [build-tested] Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12ocfs2: add last unlock times in locking_stateGang He2-3/+16
ocfs2 file system uses locking_state file under debugfs to dump each ocfs2 file system's dlm lock resources, but the dlm lock resources in memory are becoming more and more after the files were touched by the user. it will become a bit difficult to analyze these dlm lock resource records in locking_state file by the upper scripts, though some files are not active for now, which were accessed long time ago. Then, I'd like to add last pr/ex unlock times in locking_state file for each dlm lock resource record, the the upper scripts can use last unlock time to filter inactive dlm lock resource record. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Gang He <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12ocfs2/dlm: use struct_size() helperGustavo A. R. Silva1-5/+3
One of the more common cases of allocation size calculations is finding the size of a structure that has a zero-sized array at the end, along with memory for some number of elements for that array. For example: struct dlm_migratable_lockres { ... struct dlm_migratable_lock ml[0]; // 16 bytes each, begins at byte 112 }; Make use of the struct_size() helper instead of an open-coded version in order to avoid any potential type mistakes. So, replace the following form: sizeof(struct dlm_migratable_lockres) + (mres->num_locks * sizeof(struct dlm_migratable_lock)) with: struct_size(mres, ml, mres->num_locks) Notice that, in this case, variable sz is not necessary, hence it is removed. This code was detected with the help of Coccinelle. Link: http://lkml.kernel.org/r/20190605204926.GA24467@embeddedor Signed-off-by: Gustavo A. R. Silva <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12fs: ocfs: fix spelling mistake "hearbeating" -> "heartbeat"ChenGang4-4/+4
There are some spelling mistakes in ocfs, fix it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: ChenGang <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Joseph Qi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12sh: prevent warnings when using iounmapSam Ravnborg1-1/+5
When building drm/exynos for sh, as part of an allmodconfig build, the following warning triggered: exynos7_drm_decon.c: In function `decon_remove': exynos7_drm_decon.c:769:24: warning: unused variable `ctx' struct decon_context *ctx = dev_get_drvdata(&pdev->dev); The ctx variable is only used as argument to iounmap(). In sh - allmodconfig CONFIG_MMU is not defined so it ended up in: \#define __iounmap(addr) do { } while (0) \#define iounmap __iounmap Fix the warning by introducing a static inline function for iounmap. This is similar to several other architectures. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Sam Ravnborg <[email protected]> Reviewed-by: Geert Uytterhoeven <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Cc: Will Deacon <[email protected]> Cc: Mark Brown <[email protected]> Cc: Inki Dae <[email protected]> Cc: Krzysztof Kozlowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12sh: config: remove left-over BACKLIGHT_LCD_SUPPORTKrzysztof Kozlowski2-2/+0
CONFIG_BACKLIGHT_LCD_SUPPORT was removed in 8c5dc8d9f19c ("video: backlight: Remove useless BACKLIGHT_LCD_SUPPORT kernel symbol"). Options protected by CONFIG_BACKLIGHT_LCD_SUPPORT are now available directly. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Krzysztof Kozlowski <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Rich Felker <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12arch/sh/configs/sdk7786_defconfig: remove CONFIG_LOGFSKefeng Wang1-1/+0
After commit 1d0fd57a50aa ("logfs: remove from tree"), logfs was removed, drop CONFIG_LOGFS from all defconfigs. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kefeng Wang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12scripts/spelling.txt: add more spellings to spelling.txtColin Ian King1-0/+33
Here are some of the more common spelling mistakes and typos that I've found while fixing up spelling mistakes in the kernel over the past few months. Developers keep on coming up with more inventive ways to spell words. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12scripts/decode_stacktrace: Accept dash/underscore in modulesEvan Green1-1/+1
The manpage for modprobe mentions that dashes and underscores are treated interchangeably in module names. The stack trace dumps seem to print module names with underscores. Use bash to replace _ with the pattern [-_] so that file names with dashes or underscores can be found. For example, this line: [ 27.919759] hda_widget_sysfs_init+0x2b8/0x3a5 [snd_hda_core] should find a module named snd-hda-core.ko. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Evan Green <[email protected]> Reviewed-by: Douglas Anderson <[email protected]> Acked-by: Konstantin Khlebnikov <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Douglas Anderson <[email protected]> Cc: Evan Green <[email protected]> Cc: Nicolas Boichat <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Manuel Traut <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12scripts/spelling.txt: add spelling fix for prohibitedChris Paterson2-1/+3
Misspelling 'prohibited' is quite common in the real world, although surprisingly not so much in the Linux Kernel. In addition to fixing the typo we may as well add it to the spelling checker. Also adding the present participle (prohibiting). Link: http://lkml.kernel.org/r/[email protected] Fixes: 5bf2fbbef50c ("clk: renesas: cpg-mssr: Add r8a77470 support") Signed-off-by: Chris Paterson <[email protected]> Acked-by: Stephen Boyd <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12scripts/spelling.txt: drop "sepc" from the misspelling listPaul Walmsley6-6/+5
The RISC-V architecture has a register named the "Supervisor Exception Program Counter", or "sepc". This abbreviation triggers checkpatch.pl's misspelling detector, resulting in noise in the checkpatch output. The risk that this noise could cause more useful warnings to be missed seems to outweigh the harm of an occasional misspelling of "spec". Thus drop the "sepc" entry from the misspelling list. [[email protected]: fix existing "sepc" instances, per Joe] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Paul Walmsley <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12scripts/decode_stacktrace: look for modules with .ko.debug extensionNicolas Boichat1-1/+1
In Chromium OS kernel builds, we split the debug information as .ko.debug files, and that's what decode_stacktrace.sh needs to use. Relax objfile matching rule to allow any .ko* file to be matched. [[email protected]: add quotes around name pattern] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nicolas Boichat <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Konstantin Khlebnikov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12scripts/decode_stacktrace: match basepath using shell prefix operator, not regexNicolas Boichat1-1/+1
The basepath may contain special characters, which would confuse the regex matcher. ${var#prefix} does the right thing. Link: http://lkml.kernel.org/r/[email protected] Fixes: 67a28de47faa8358 ("scripts/decode_stacktrace: only strip base path when a prefix of the path") Signed-off-by: Nicolas Boichat <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12include/linux/dmar.h: replace single-char identifiers in macrosQian Cai1-6/+8
There are a few macros in IOMMU have single-char identifiers make the code hard to read and debug. Replace them with meaningful names. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Qian Cai <[email protected]> Suggested-by: Andrew Morton <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Robin Murphy <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12MAINTAINERS: nilfs2: update email addressRyusuke Konishi1-1/+1
Change my email since lab.ntt.co.jp email domain has been deprecated due to company policy. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ryusuke Konishi <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12nilfs2: do not use unexported cpu_to_le32()/le32_to_cpu() in uapi headerMasahiro Yamada1-12/+12
cpu_to_le32/le32_to_cpu is defined in include/linux/byteorder/generic.h, which is not exported to user-space. UAPI headers must use the ones prefixed with double-underscore. Detected by compile-testing exported headers: include/linux/nilfs2_ondisk.h: In function `nilfs_checkpoint_set_snapshot': include/linux/nilfs2_ondisk.h:536:17: error: implicit declaration of function `cpu_to_le32' [-Werror=implicit-function-declaration] cp->cp_flags = cpu_to_le32(le32_to_cpu(cp->cp_flags) | \ ^ include/linux/nilfs2_ondisk.h:552:1: note: in expansion of macro `NILFS_CHECKPOINT_FNS' NILFS_CHECKPOINT_FNS(SNAPSHOT, snapshot) ^~~~~~~~~~~~~~~~~~~~ include/linux/nilfs2_ondisk.h:536:29: error: implicit declaration of function `le32_to_cpu' [-Werror=implicit-function-declaration] cp->cp_flags = cpu_to_le32(le32_to_cpu(cp->cp_flags) | \ ^ include/linux/nilfs2_ondisk.h:552:1: note: in expansion of macro `NILFS_CHECKPOINT_FNS' NILFS_CHECKPOINT_FNS(SNAPSHOT, snapshot) ^~~~~~~~~~~~~~~~~~~~ include/linux/nilfs2_ondisk.h: In function `nilfs_segment_usage_set_clean': include/linux/nilfs2_ondisk.h:622:19: error: implicit declaration of function `cpu_to_le64' [-Werror=implicit-function-declaration] su->su_lastmod = cpu_to_le64(0); ^~~~~~~~~~~ Link: http://lkml.kernel.org/r/[email protected] Fixes: e63e88bc53ba ("nilfs2: move ioctl interface and disk layout to uapi separately") Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Ryusuke Konishi <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Greg KH <[email protected]> Cc: Joe Perches <[email protected]> Cc: <[email protected]> [4.9+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/z3fold.c: lock z3fold page before __SetPageMovable()Henry Burns1-1/+11
Following zsmalloc.c's example we call trylock_page() and unlock_page(). Also make z3fold_page_migrate() assert that newpage is passed in locked, as per the documentation. [[email protected]: fix trylock_page return value test, per Shakeel] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Henry Burns <[email protected]> Suggested-by: Vitaly Wool <[email protected]> Acked-by: Vitaly Wool <[email protected]> Acked-by: David Rientjes <[email protected]> Reviewed-by: Shakeel Butt <[email protected]> Cc: Vitaly Vul <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Xidong Wang <[email protected]> Cc: Jonathan Adams <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/memcontrol: fix wrong statistics in memory.statYafang Shao1-2/+3
When we calculate total statistics for memcg1_stats and memcg1_events, we use the the index 'i' in the for loop as the events index. Actually we should use memcg1_stats[i] and memcg1_events[i] as the events index. Link: http://lkml.kernel.org/r/[email protected] Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty"). Signed-off-by: Yafang Shao <[email protected] Reviewed-by: Shakeel Butt <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Yafang Shao <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm/nvdimm: add is_ioremap_addr and use that to check ioremap addressAneesh Kumar K.V3-1/+20
Architectures like powerpc use different address range to map ioremap and vmalloc range. The memunmap() check used by the nvdimm layer was wrongly using is_vmalloc_addr() to check for ioremap range which fails for ppc64. This result in ppc64 not freeing the ioremap mapping. The side effect of this is an unbind failure during module unload with papr_scm nvdimm driver Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Aneesh Kumar K.V <[email protected]> Fixes: b5beae5e224f ("powerpc/pseries: Add driver for PAPR SCM regions") Cc: Dan Williams <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12mm: vmscan: scan anonymous pages on file refaultsKuo-Hsin Yang1-3/+3
When file refaults are detected and there are many inactive file pages, the system never reclaim anonymous pages, the file pages are dropped aggressively when there are still a lot of cold anonymous pages and system thrashes. This issue impacts the performance of applications with large executable, e.g. chrome. With this patch, when file refault is detected, inactive_list_is_low() always returns true for file pages in get_scan_count() to enable scanning anonymous pages. The problem can be reproduced by the following test program. ---8<--- void fallocate_file(const char *filename, off_t size) { struct stat st; int fd; if (!stat(filename, &st) && st.st_size >= size) return; fd = open(filename, O_WRONLY | O_CREAT, 0600); if (fd < 0) { perror("create file"); exit(1); } if (posix_fallocate(fd, 0, size)) { perror("fallocate"); exit(1); } close(fd); } long *alloc_anon(long size) { long *start = malloc(size); memset(start, 1, size); return start; } long access_file(const char *filename, long size, long rounds) { int fd, i; volatile char *start1, *end1, *start2; const int page_size = getpagesize(); long sum = 0; fd = open(filename, O_RDONLY); if (fd == -1) { perror("open"); exit(1); } /* * Some applications, e.g. chrome, use a lot of executable file * pages, map some of the pages with PROT_EXEC flag to simulate * the behavior. */ start1 = mmap(NULL, size / 2, PROT_READ | PROT_EXEC, MAP_SHARED, fd, 0); if (start1 == MAP_FAILED) { perror("mmap"); exit(1); } end1 = start1 + size / 2; start2 = mmap(NULL, size / 2, PROT_READ, MAP_SHARED, fd, size / 2); if (start2 == MAP_FAILED) { perror("mmap"); exit(1); } for (i = 0; i < rounds; ++i) { struct timeval before, after; volatile char *ptr1 = start1, *ptr2 = start2; gettimeofday(&before, NULL); for (; ptr1 < end1; ptr1 += page_size, ptr2 += page_size) sum += *ptr1 + *ptr2; gettimeofday(&after, NULL); printf("File access time, round %d: %f (sec) ", i, (after.tv_sec - before.tv_sec) + (after.tv_usec - before.tv_usec) / 1000000.0); } return sum; } int main(int argc, char *argv[]) { const long MB = 1024 * 1024; long anon_mb, file_mb, file_rounds; const char filename[] = "large"; long *ret1; long ret2; if (argc != 4) { printf("usage: thrash ANON_MB FILE_MB FILE_ROUNDS "); exit(0); } anon_mb = atoi(argv[1]); file_mb = atoi(argv[2]); file_rounds = atoi(argv[3]); fallocate_file(filename, file_mb * MB); printf("Allocate %ld MB anonymous pages ", anon_mb); ret1 = alloc_anon(anon_mb * MB); printf("Access %ld MB file pages ", file_mb); ret2 = access_file(filename, file_mb * MB, file_rounds); printf("Print result to prevent optimization: %ld ", *ret1 + ret2); return 0; } ---8<--- Running the test program on 2GB RAM VM with kernel 5.2.0-rc5, the program fills ram with 2048 MB memory, access a 200 MB file for 10 times. Without this patch, the file cache is dropped aggresively and every access to the file is from disk. $ ./thrash 2048 200 10 Allocate 2048 MB anonymous pages Access 200 MB file pages File access time, round 0: 2.489316 (sec) File access time, round 1: 2.581277 (sec) File access time, round 2: 2.487624 (sec) File access time, round 3: 2.449100 (sec) File access time, round 4: 2.420423 (sec) File access time, round 5: 2.343411 (sec) File access time, round 6: 2.454833 (sec) File access time, round 7: 2.483398 (sec) File access time, round 8: 2.572701 (sec) File access time, round 9: 2.493014 (sec) With this patch, these file pages can be cached. $ ./thrash 2048 200 10 Allocate 2048 MB anonymous pages Access 200 MB file pages File access time, round 0: 2.475189 (sec) File access time, round 1: 2.440777 (sec) File access time, round 2: 2.411671 (sec) File access time, round 3: 1.955267 (sec) File access time, round 4: 0.029924 (sec) File access time, round 5: 0.000808 (sec) File access time, round 6: 0.000771 (sec) File access time, round 7: 0.000746 (sec) File access time, round 8: 0.000738 (sec) File access time, round 9: 0.000747 (sec) Checked the swap out stats during the test [1], 19006 pages swapped out with this patch, 3418 pages swapped out without this patch. There are more swap out, but I think it's within reasonable range when file backed data set doesn't fit into the memory. $ ./thrash 2000 100 2100 5 1 # ANON_MB FILE_EXEC FILE_NOEXEC ROUNDS PROCESSES Allocate 2000 MB anonymous pages active_anon: 1613644, inactive_anon: 348656, active_file: 892, inactive_file: 1384 (kB) pswpout: 7972443, pgpgin: 478615246 Access 100 MB executable file pages Access 2100 MB regular file pages File access time, round 0: 12.165, (sec) active_anon: 1433788, inactive_anon: 478116, active_file: 17896, inactive_file: 24328 (kB) File access time, round 1: 11.493, (sec) active_anon: 1430576, inactive_anon: 477144, active_file: 25440, inactive_file: 26172 (kB) File access time, round 2: 11.455, (sec) active_anon: 1427436, inactive_anon: 476060, active_file: 21112, inactive_file: 28808 (kB) File access time, round 3: 11.454, (sec) active_anon: 1420444, inactive_anon: 473632, active_file: 23216, inactive_file: 35036 (kB) File access time, round 4: 11.479, (sec) active_anon: 1413964, inactive_anon: 471460, active_file: 31728, inactive_file: 32224 (kB) pswpout: 7991449 (+ 19006), pgpgin: 489924366 (+ 11309120) With 4 processes accessing non-overlapping parts of a large file, 30316 pages swapped out with this patch, 5152 pages swapped out without this patch. The swapout number is small comparing to pgpgin. [1]: https://github.com/vovo/testing/blob/master/mem_thrash.c Link: http://lkml.kernel.org/r/[email protected] Fixes: e9868505987a ("mm,vmscan: only evict file pages when we have plenty") Fixes: 7c5bd705d8f9 ("mm: memcg: only evict file pages when we have plenty") Signed-off-by: Kuo-Hsin Yang <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Sonny Rao <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Minchan Kim <[email protected]> Cc: <[email protected]> [4.12+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-07-12clk: consoldiate the __clk_get_hw() declarationsStephen Rothwell8-4/+13
Without this we were getting errors like: In file included from drivers/clk/clkdev.c:22:0: drivers/clk/clk.h:36:23: error: static declaration of '__clk_get_hw' follows non-static declaration include/linux/clk-provider.h:808:16: note: previous declaration of '__clk_get_hw' was here Fixes: 59fcdce425b7 ("clk: Remove ifdef for COMMON_CLK in clk-provider.h") fixes: 73e0e496afda ("clkdev: Always allocate a struct clk and call __clk_get() w/ CCF") Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Stephen Boyd <[email protected]>
2019-07-12SUNRPC: Fix transport accounting when caller specifies an rpc_xprtTrond Myklebust3-23/+24
Ensure that we do the required accounting for the round robin queue when the caller to rpc_init_task() has passed in a transport to be used. Reported-by: Olga Kornievskaia <[email protected]> Reported-by: Neil Brown <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2019-07-12perf vendor events s390: Add JSON files for machine type 8561Thomas Richter5-0/+576
Add CPU measurement counter facility event description files (JSON) for IBM machine types 8561 and 8562. Signed-off-by: Thomas Richter <[email protected]> Reviewed-by: Vasily Gorbik <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hendrik Brueckner <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2019-07-12Merge tag 'nfs-rdma-for-5.3-1' of ↵Trond Myklebust16-503/+837
git://git.linux-nfs.org/projects/anna/linux-nfs NFSoRDMA client updates for 5.3 New features: - Add a way to place MRs back on the free list - Reduce context switching - Add new trace events Bugfixes and cleanups: - Fix a BUG when tracing is enabled with NFSv4.1 - Fix a use-after-free in rpcrdma_post_recvs - Replace use of xdr_stream_pos in rpcrdma_marshal_req - Fix occasional transport deadlock - Fix show_nfs_errors macros, other tracing improvements - Remove RPCRDMA_REQ_F_PENDING and fr_state - Various simplifications and refactors
2019-07-12x86/vdso: Fix flip/flop vdso build bugNaohiro Aota1-6/+6
Two consecutive "make" on an already compiled kernel tree will show different behavior: $ make CALL scripts/checksyscalls.sh CALL scripts/atomic/check-atomics.sh DESCEND objtool CHK include/generated/compile.h VDSOCHK arch/x86/entry/vdso/vdso64.so.dbg VDSOCHK arch/x86/entry/vdso/vdso32.so.dbg Kernel: arch/x86/boot/bzImage is ready (#3) Building modules, stage 2. MODPOST 12 modules $ make make CALL scripts/checksyscalls.sh CALL scripts/atomic/check-atomics.sh DESCEND objtool CHK include/generated/compile.h VDSO arch/x86/entry/vdso/vdso64.so.dbg OBJCOPY arch/x86/entry/vdso/vdso64.so VDSO2C arch/x86/entry/vdso/vdso-image-64.c CC arch/x86/entry/vdso/vdso-image-64.o VDSO arch/x86/entry/vdso/vdso32.so.dbg OBJCOPY arch/x86/entry/vdso/vdso32.so VDSO2C arch/x86/entry/vdso/vdso-image-32.c CC arch/x86/entry/vdso/vdso-image-32.o AR arch/x86/entry/vdso/built-in.a AR arch/x86/entry/built-in.a AR arch/x86/built-in.a GEN .version CHK include/generated/compile.h UPD include/generated/compile.h CC init/version.o AR init/built-in.a LD vmlinux.o <snip> This is causing "LD vmlinux" once every two times even without any modifications. This is the same bug fixed in commit 92a4728608a8 ("x86/boot: Fix if_changed build flip/flop bug"). Two "if_changed" cannot be used in one target. Fix this merging two commands into one function. Fixes: 7ac870747988 ("x86/vdso: Switch to generic vDSO implementation") Signed-off-by: Naohiro Aota <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Vincenzo Frascino <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Reviewed-by: Masahiro Yamada <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2019-07-12MAINTAINERS: add entry for block io cgroupKonstantin Khlebnikov1-0/+13
This links mailing list [email protected] with related files. $ ./scripts/get_maintainer.pl -f block/blk-cgroup.c Jens Axboe <[email protected]> (maintainer:BLOCK LAYER) [email protected] (open list:CONTROL GROUP - BLOCK IO CONTROLLER (BLKIO)) [email protected] (open list:BLOCK LAYER) [email protected] (open list) Signed-off-by: Konstantin Khlebnikov <[email protected]> Added git tree/maintainer entries from Tejun. Signed-off-by: Jens Axboe <[email protected]>
2019-07-12RMDA/siw: Require a 64 bit archJason Gunthorpe1-1/+1
The new siw driver fails to build on i386 with drivers/infiniband/sw/siw/siw_qp.c:1025:3: error: invalid output size for constraint '+q' smp_store_mb(*cq->notify, SIW_NOTIFY_NOT); As it is using 64 bit values with the smp_store_mb. Since the entire scheme here seems questionable, and we are in the merge window, fix the compile failures by disabling 32 bit support on this driver. A proper fix will be reviewed post merge window. Fixes: c0cf5bdde46c ("rdma/siw: addition to kernel build environment") Reported-by: Arnd Bergmann <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-07-12null_blk: fixup ->report_zones() for !CONFIG_BLK_DEV_ZONEDJens Axboe1-1/+1
A previous commit changed the prototype, but didn't adjust the function for when zoned device support is disabled. Fix it up. Fixes: bd976e527259 ("block: Kill gfp_t argument of blkdev_report_zones()") Signed-off-by: Jens Axboe <[email protected]>
2019-07-12dm bufio: fix deadlock with loop deviceJunxiao Bi1-3/+1
When thin-volume is built on loop device, if available memory is low, the following deadlock can be triggered: One process P1 allocates memory with GFP_FS flag, direct alloc fails, memory reclaim invokes memory shrinker in dm_bufio, dm_bufio_shrink_scan() runs, mutex dm_bufio_client->lock is acquired, then P1 waits for dm_buffer IO to complete in __try_evict_buffer(). But this IO may never complete if issued to an underlying loop device that forwards it using direct-IO, which allocates memory using GFP_KERNEL (see: do_blockdev_direct_IO()). If allocation fails, memory reclaim will invoke memory shrinker in dm_bufio, dm_bufio_shrink_scan() will be invoked, and since the mutex is already held by P1 the loop thread will hang, and IO will never complete. Resulting in ABBA deadlock. Cc: [email protected] Signed-off-by: Junxiao Bi <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2019-07-12dm snapshot: add optional discard support featuresMike Snitzer2-21/+181
discard_zeroes_cow - a discard issued to the snapshot device that maps to entire chunks to will zero the corresponding exception(s) in the snapshot's exception store. discard_passdown_origin - a discard to the snapshot device is passed down to the snapshot-origin's underlying device. This doesn't cause copy-out to the snapshot exception store because the snapshot-origin target is bypassed. The discard_passdown_origin feature depends on the discard_zeroes_cow feature being enabled. When these 2 features are enabled they allow a temporarily read-only device that has completely exhausted its free space to recover space. To do so dm-snapshot provides temporary buffer to accommodate writes that the temporarily read-only device cannot handle yet. Once the upper layer frees space (e.g. fstrim to XFS) the discards issued to the dm-snapshot target will be issued to underlying read-only device whose free space was exhausted. In addition those discards will also cause zeroes to be written to the snapshot exception store if corresponding exceptions exist. If the underlying origin device provides deduplication for zero blocks then if/when the snapshot is merged backed to the origin those blocks will become unused. Once the origin has gained adequate space, merging the snapshot back to the thinly provisioned device will permit continued use of that device without the temporary space provided by the snapshot. Requested-by: John Dorminy <[email protected]> Signed-off-by: Mike Snitzer <[email protected]>
2019-07-12selftests/bpf: fix compiling loop{1, 2, 3}.c on s390Ilya Leoshkevich3-3/+3
Use PT_REGS_RC(ctx) instead of ctx->rax, which is not present on s390. Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Tested-by: Stanislav Fomichev <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-07-12selftests/bpf: make PT_REGS_* work in userspaceIlya Leoshkevich1-20/+55
Right now, on certain architectures, these macros are usable only with kernel headers. This patch makes it possible to use them with userspace headers and, as a consequence, not only in BPF samples, but also in BPF selftests. On s390, provide the forward declaration of struct pt_regs and cast it to user_pt_regs in PT_REGS_* macros. This is necessary, because instead of the full struct pt_regs, s390 exposes only its first member user_pt_regs to userspace, and bpf_helpers.h is used with both userspace (in selftests) and kernel (in samples) headers. It was added in commit 466698e654e8 ("s390/bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type"). Ditto on arm64. On x86, provide userspace versions of PT_REGS_* macros. Unlike s390 and arm64, x86 provides struct pt_regs to both userspace and kernel, however, with different member names. Signed-off-by: Ilya Leoshkevich <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-07-12selftests/bpf: fix s930 -> s390 typoIlya Leoshkevich1-5/+5
Also check for __s390__ instead of __s390x__, just in case bpf_helpers.h is ever used by 32-bit userspace. Signed-off-by: Ilya Leoshkevich <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-07-12selftests/bpf: compile progs with -D__TARGET_ARCH_$(SRCARCH)Ilya Leoshkevich1-1/+3
This opens up the possibility of accessing registers in an arch-independent way. Signed-off-by: Ilya Leoshkevich <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-07-12selftests/bpf: do not ignore clang failuresIlya Leoshkevich1-6/+7
When compiling an eBPF prog fails, make still returns 0, because failing clang command's output is piped to llc and therefore its exit status is ignored. When clang fails, pipe the string "clang failed" to llc. This will make llc fail with an informative error message. This solution was chosen over using pipefail, having separate targets or getting rid of llc invocation due to its simplicity. In addition, pull Kbuild.include in order to get .DELETE_ON_ERROR target, which would cause partial .o files to be removed. Signed-off-by: Ilya Leoshkevich <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-07-12tools: bpftool: add raw_tracepoint_writable prog type to headerDaniel T. Lee1-0/+1
From commit 9df1c28bb752 ("bpf: add writable context for raw tracepoints"), a new type of BPF_PROG, RAW_TRACEPOINT_WRITABLE has been added. Since this BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE is not listed at bpftool's header, it causes a segfault when executing 'bpftool feature'. This commit adds BPF_PROG_TYPE_RAW_TRACEPOINT_WRITABLE entry to prog_type_name enum, and will eventually fixes the segfault issue. Fixes: 9df1c28bb752 ("bpf: add writable context for raw tracepoints") Signed-off-by: Daniel T. Lee <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-07-12bpf: verifier: avoid fall-through warningsGustavo A. R. Silva1-0/+2
In preparation to enabling -Wimplicit-fallthrough, this patch silences the following warning: kernel/bpf/verifier.c: In function ‘check_return_code’: kernel/bpf/verifier.c:6106:6: warning: this statement may fall through [-Wimplicit-fallthrough=] if (env->prog->expected_attach_type == BPF_CGROUP_UDP4_RECVMSG || ^ kernel/bpf/verifier.c:6109:2: note: here case BPF_PROG_TYPE_CGROUP_SKB: ^~~~ Warning level 3 was used: -Wimplicit-fallthrough=3 Notice that is much clearer to explicitly add breaks in each case statement (that actually contains some code), rather than letting the code to fall through. This patch is part of the ongoing efforts to enable -Wimplicit-fallthrough. Signed-off-by: Gustavo A. R. Silva <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-07-12selftests/bpf: fix bpf_target_sparc checkIlya Leoshkevich1-2/+2
bpf_helpers.h fails to compile on sparc: the code should be checking for defined(bpf_target_sparc), but checks simply for bpf_target_sparc. Also change #ifdef bpf_target_powerpc to #if defined() for consistency. Signed-off-by: Ilya Leoshkevich <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Acked-by: David S. Miller <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-07-12xdp: fix potential deadlock on socket mutexIlya Maximets2-10/+8
There are 2 call chains: a) xsk_bind --> xdp_umem_assign_dev b) unregister_netdevice_queue --> xsk_notifier with the following locking order: a) xs->mutex --> rtnl_lock b) rtnl_lock --> xdp.lock --> xs->mutex Different order of taking 'xs->mutex' and 'rtnl_lock' could produce a deadlock here. Fix that by moving the 'rtnl_lock' before 'xs->lock' in the bind call chain (a). Reported-by: [email protected] Fixes: 455302d1c9ae ("xdp: fix hang while unregistering device bound to xdp socket") Signed-off-by: Ilya Maximets <[email protected]> Acked-by: Jonathan Lemon <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]>
2019-07-12platform/x86: Fix PCENGINES_APU2 Kconfig warningYueHaibing1-1/+1
Fix Kconfig warning for PCENGINES_APU2 symbol: WARNING: unmet direct dependencies detected for GPIO_AMD_FCH Depends on [n]: GPIOLIB [=n] && HAS_IOMEM [=y] Selected by [y]: - PCENGINES_APU2 [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && INPUT [=y] && INPUT_KEYBOARD [=y] && LEDS_CLASS [=y] WARNING: unmet direct dependencies detected for KEYBOARD_GPIO_POLLED Depends on [n]: !UML && INPUT [=y] && INPUT_KEYBOARD [=y] && GPIOLIB [=n] Selected by [y]: - PCENGINES_APU2 [=y] && X86 [=y] && X86_PLATFORM_DEVICES [=y] && INPUT [=y] && INPUT_KEYBOARD [=y] && LEDS_CLASS [=y] Add GPIOLIB dependency to fix it. Reported-by: Hulk Robot <[email protected]> Fixes: f8eb0235f659 ("x86: pcengines apuv2 gpio/leds/keys platform driver") Signed-off-by: YueHaibing <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]>
2019-07-12tools/power/x86/intel-speed-select: Add .gitignore filePrarit Bhargava1-0/+2
Add a .gitignore file for build include/ and final binary. Signed-off-by: Prarit Bhargava <[email protected]> Acked-by: Rafael J. Wysocki <[email protected]> Acked-by: Srinivas Pandruvada <[email protected]> Cc: Srinivas Pandruvada <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: David Arcari <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]>
2019-07-12platform/x86: mlx-platform: Fix error handling in mlxplat_init()Wei Yongjun1-1/+1
Add the missing platform_device_unregister() before return from mlxplat_init() in the error handling case. Fixes: 6b266e91a071 ("platform/x86: mlx-platform: Move regmap initialization before all drivers activation") Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: Andy Shevchenko <[email protected]>
2019-07-12drm/amd/powerplay: add pstate mclk(uclk) support for navi10Kevin Wang2-1/+8
add pstate mclk(uclk) support. Signed-off-by: Kevin Wang <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2019-07-12drm/amd/powerplay: fix smu clock type change miss errorKevin Wang1-2/+2
in the smu module, use the smu_xxxclk type to identify the CLK type use SMU_SCLK, SMU_MCLK to replace PP_SCLK, PP_MCLK. Signed-off-by: Kevin Wang <[email protected]> Reviewed-by: Feifei Xu <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>