aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-11-04clocksource/drivers/renesas-ostm: Use unique device name instead of ostmGeert Uytterhoeven1-5/+5
Currently all OSTM devices are called "ostm", also in kernel messages. As there can be multiple instances in an SoC, this can confuse the user. Hence construct a unique name from the DT node name, like is done for platform devices. On RSK+RZA1, the boot log changes like: -clocksource: ostm: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 57352151442 ns +clocksource: timer@fcfec000: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 57352151442 ns sched_clock: 32 bits at 33MHz, resolution 30ns, wraps every 64440619504ns -ostm: used for clocksource -ostm: used for clock events +/soc/timer@fcfec000: used for clocksource +/soc/timer@fcfec400: used for clock events ... -clocksource: Switched to clocksource ostm +clocksource: Switched to clocksource timer@fcfec000 Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2019-11-04clocksource/drivers/renesas-ostm: Convert to timer_ofGeert Uytterhoeven2-117/+73
Convert the Renesas OSTM driver to use the timer_of framework. This reduces the driver object size by 367 bytes (with gcc 7.4.0). Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2019-11-04clocksource/drivers/timer-of: Use unique device name instead of timerGeert Uytterhoeven1-1/+1
If a hardware-specific driver does not provide a name, the timer-of core falls back to device_node.name. Due to generic DT node naming policies, that name is almost always "timer", and thus doesn't identify the actual timer used. Fix this by using device_node.full_name instead, which includes the unit addrees. Example impact on /proc/timer_list: -Clock Event Device: timer +Clock Event Device: timer@fcfec400 Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Rob Herring <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2019-11-04clocksource/drivers/timer-of: Convert last full_name to %pOFGeert Uytterhoeven1-2/+2
Commit 469869d18a886e04 ("clocksource: Convert to using %pOF instead of full_name") converted all but the one just added before by commit 32f2fea6e77e64cd ("clocksource/drivers/timer-of: Handle of_irq_get_byname() result correctly"). Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Daniel Lezcano <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2019-09-27tick: broadcast-hrtimer: Fix a race in bc_set_nextBalasubramani Vivekanandan1-33/+29
When a cpu requests broadcasting, before starting the tick broadcast hrtimer, bc_set_next() checks if the timer callback (bc_handler) is active using hrtimer_try_to_cancel(). But hrtimer_try_to_cancel() does not provide the required synchronization when the callback is active on other core. The callback could have already executed tick_handle_oneshot_broadcast() and could have also returned. But still there is a small time window where the hrtimer_try_to_cancel() returns -1. In that case bc_set_next() returns without doing anything, but the next_event of the tick broadcast clock device is already set to a timeout value. In the race condition diagram below, CPU #1 is running the timer callback and CPU #2 is entering idle state and so calls bc_set_next(). In the worst case, the next_event will contain an expiry time, but the hrtimer will not be started which happens when the racing callback returns HRTIMER_NORESTART. The hrtimer might never recover if all further requests from the CPUs to subscribe to tick broadcast have timeout greater than the next_event of tick broadcast clock device. This leads to cascading of failures and finally noticed as rcu stall warnings Here is a depiction of the race condition CPU #1 (Running timer callback) CPU #2 (Enter idle and subscribe to tick broadcast) --------------------- --------------------- __run_hrtimer() tick_broadcast_enter() bc_handler() __tick_broadcast_oneshot_control() tick_handle_oneshot_broadcast() raw_spin_lock(&tick_broadcast_lock); dev->next_event = KTIME_MAX; //wait for tick_broadcast_lock //next_event for tick broadcast clock set to KTIME_MAX since no other cores subscribed to tick broadcasting raw_spin_unlock(&tick_broadcast_lock); if (dev->next_event == KTIME_MAX) return HRTIMER_NORESTART // callback function exits without restarting the hrtimer //tick_broadcast_lock acquired raw_spin_lock(&tick_broadcast_lock); tick_broadcast_set_event() clockevents_program_event() dev->next_event = expires; bc_set_next() hrtimer_try_to_cancel() //returns -1 since the timer callback is active. Exits without restarting the timer cpu_base->running = NULL; The comment that hrtimer cannot be armed from within the callback is wrong. It is fine to start the hrtimer from within the callback. Also it is safe to start the hrtimer from the enter/exit idle code while the broadcast handler is active. The enter/exit idle code and the broadcast handler are synchronized using tick_broadcast_lock. So there is no need for the existing try to cancel logic. All this can be removed which will eliminate the race condition as well. Fixes: 5d1638acb9f6 ("tick: Introduce hrtimer based broadcast") Originally-by: Thomas Gleixner <[email protected]> Signed-off-by: Balasubramani Vivekanandan <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
2019-09-26Merge branch 'timers-urgent-for-linus' of ↵Linus Torvalds1-3/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fix from Ingo Molnar: "Fix a timer expiry bug that would cause spurious delay of timers" * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timer: Read jiffies once when forwarding base clk
2019-09-26Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds268-3617/+4801
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more perf updates from Ingo Molnar: "The only kernel change is comment typo fixes. The rest is mostly tooling fixes, but also new vendor event additions and updates, a bigger libperf/libtraceevent library and a header files reorganization that came in a bit late" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (108 commits) perf unwind: Fix libunwind build failure on i386 systems perf parser: Remove needless include directives perf build: Add detection of java-11-openjdk-devel package perf jvmti: Include JVMTI support for s390 perf vendor events: Remove P8 HW events which are not supported perf evlist: Fix access of freed id arrays perf stat: Fix free memory access / memory leaks in metrics perf tools: Replace needless mmap.h with what is needed, event.h perf evsel: Move config terms to a separate header perf evlist: Remove unused perf_evlist__fprintf() method perf evsel: Introduce evsel_fprintf.h perf evsel: Remove need for symbol_conf in evsel_fprintf.c perf copyfile: Move copyfile routines to separate files libperf: Add perf_evlist__poll() function libperf: Add perf_evlist__add_pollfd() function libperf: Add perf_evlist__alloc_pollfd() function libperf: Add libperf_init() call to the tests libperf: Merge libperf_set_print() into libperf_init() libperf: Add libperf dependency for tests targets libperf: Use sys/types.h to get ssize_t, not unistd.h ...
2019-09-26Merge tag 'trace-v5.4-2' of ↵Linus Torvalds2-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Srikar Dronamraju fixed a bug in the newmulti probe code" * tag 'trace-v5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/probe: Fix same probe event argument matching
2019-09-26perf unwind: Fix libunwind build failure on i386 systemsArnaldo Carvalho de Melo1-1/+1
Naresh Kamboju reported, that on the i386 build pr_err() doesn't get defined properly due to header ordering: perf-in.o: In function `libunwind__x86_reg_id': tools/perf/util/libunwind/../../arch/x86/util/unwind-libunwind.c:109: undefined reference to `pr_err' Reported-by: Naresh Kamboju <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Signed-off-by: Ingo Molnar <[email protected]>
2019-09-26Merge tag 'usercopy-v5.4-rc1' of ↵Linus Torvalds1-1/+7
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull usercopy fix from Kees Cook: "Fix hardened usercopy under CONFIG_DEBUG_VIRTUAL" * tag 'usercopy-v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: usercopy: Avoid HIGHMEM pfn warning
2019-09-26Merge tag 'linux-kselftest-5.4-rc1.1' of ↵Linus Torvalds6-26/+48
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest updates from Shuah Khan: "Fixes to existing tests" * tag 'linux-kselftest-5.4-rc1.1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: tpm2: install python files selftests: livepatch: add missing fragments to config selftests: watchdog: cleanup whitespace in usage options selftest/ftrace: Fix typo in trigger-snapshot.tc selftests: watchdog: Add optional file argument selftests/seccomp: fix build on older kernels selftests: use "$(MAKE)" instead of "make"
2019-09-26Merge tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds29-580/+835
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - Dequeue the request from the receive queue while we're re-encoding # v4.20+ - Fix buffer handling of GSS MIC without slack # 5.1 Features: - Increase xprtrdma maximum transport header and slot table sizes - Add support for nfs4_call_sync() calls using a custom rpc_task_struct - Optimize the default readahead size - Enable pNFS filelayout LAYOUTGET on OPEN Other bugfixes and cleanups: - Fix possible null-pointer dereferences and memory leaks - Various NFS over RDMA cleanups - Various NFS over RDMA comment updates - Don't receive TCP data into a reset request buffer - Don't try to parse incomplete RPC messages - Fix congestion window race with disconnect - Clean up pNFS return-on-close error handling - Fixes for NFS4ERR_OLD_STATEID handling" * tag 'nfs-for-5.4-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (53 commits) pNFS/filelayout: enable LAYOUTGET on OPEN NFS: Optimise the default readahead size NFSv4: Handle NFS4ERR_OLD_STATEID in LOCKU NFSv4: Handle NFS4ERR_OLD_STATEID in CLOSE/OPEN_DOWNGRADE NFSv4: Fix OPEN_DOWNGRADE error handling pNFS: Handle NFS4ERR_OLD_STATEID on layoutreturn by bumping the state seqid NFSv4: Add a helper to increment stateid seqids NFSv4: Handle RPC level errors in LAYOUTRETURN NFSv4: Handle NFS4ERR_DELAY correctly in return-on-close NFSv4: Clean up pNFS return-on-close error handling pNFS: Ensure we do clear the return-on-close layout stateid on fatal errors NFS: remove unused check for negative dentry NFSv3: use nfs_add_or_obtain() to create and reference inodes NFS: Refactor nfs_instantiate() for dentry referencing callers SUNRPC: Fix congestion window race with disconnect SUNRPC: Don't try to parse incomplete RPC messages SUNRPC: Rename xdr_buf_read_netobj to xdr_buf_read_mic SUNRPC: Fix buffer handling of GSS MIC without slack SUNRPC: RPC level errors should always set task->tk_rpc_status SUNRPC: Don't receive TCP data into a request buffer that has been reset ...
2019-09-26binfmt_elf: Do not move brk for INTERP-less ET_EXECKees Cook1-1/+2
When brk was moved for binaries without an interpreter, it should have been limited to ET_DYN only. In other words, the special case was an ET_DYN that lacks an INTERP, not just an executable that lacks INTERP. The bug manifested for giant static executables, where the brk would end up in the middle of the text area on 32-bit architectures. Reported-and-tested-by: Richard Kojedzinszky <[email protected]> Fixes: bbdc6076d2e5 ("binfmt_elf: move brk out of mmap when doing direct loader exec") Cc: [email protected] Signed-off-by: Kees Cook <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-26Merge tag 'xfs-5.4-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds5-21/+17
Pull xfs fixes from Darrick Wong: "There are a couple of bug fixes and some small code cleanups that came in recently: - Minor code cleanups - Fix a superblock logging error - Ensure that collapse range converts the data fork to extents format when necessary - Revert the ALLOC_USERDATA cleanup because it caused subtle behavior regressions" * tag 'xfs-5.4-merge-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: avoid unused to_mp() function warning xfs: log proper length of superblock xfs: revert 1baa2800e62d ("xfs: remove the unused XFS_ALLOC_USERDATA flag") xfs: removed unneeded variable xfs: convert inode to extent format after extent merge due to shift
2019-09-26Merge branch 'work.mount3' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull jffs2 fix from Al Viro: "braino fix for mount API conversion for jffs2" * 'work.mount3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: jffs2: Fix mounting under new mount API
2019-09-26Merge tag 's390-5.4-2' of ↵Linus Torvalds14-82/+334
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Vasily Gorbik: - Fix three kasan findings - Add PERF_EVENT_IOC_PERIOD ioctl support - Add Crypto Express7S support and extend sysfs attributes for pkey - Minor common I/O layer documentation corrections * tag 's390-5.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/cio: exclude subchannels with no parent from pseudo check s390/cio: avoid calling strlen on null pointer s390/topology: avoid firing events before kobjs are created s390/cpumf: Remove mixed white space s390/cpum_sf: Support ioctl PERF_EVENT_IOC_PERIOD s390/zcrypt: CEX7S exploitation support s390/cio: fix intparm documentation s390/pkey: Add sysfs attributes to emit AES CIPHER key blobs
2019-09-26Merge tag 'for-linus-5.4-rc1-tag' of ↵Linus Torvalds2-9/+17
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen update from Juergen Gross: "Only two small patches this time: - a small cleanup for swiotlb-xen - a fix for PCI initialization for some platforms" * tag 'for-linus-5.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pci: reserve MCFG areas earlier swiotlb-xen: Convert to use macro
2019-09-26Merge branch 'akpm' (patches from Andrew)Linus Torvalds114-564/+1005
Merge more updates from Andrew Morton: - almost all of the rest of -mm - various other subsystems Subsystems affected by this patch series: memcg, misc, core-kernel, lib, checkpatch, reiserfs, fat, fork, cpumask, kexec, uaccess, kconfig, kgdb, bug, ipc, lzo, kasan, madvise, cleanups, pagemap * emailed patches from Andrew Morton <[email protected]>: (77 commits) arch/sparc/include/asm/pgtable_64.h: fix build mm: treewide: clarify pgtable_page_{ctor,dtor}() naming ntfs: remove (un)?likely() from IS_ERR() conditions IB/hfi1: remove unlikely() from IS_ERR*() condition xfs: remove unlikely() from WARN_ON() condition wimax/i2400m: remove unlikely() from WARN*() condition fs: remove unlikely() from WARN_ON() condition xen/events: remove unlikely() from WARN() condition checkpatch: check for nested (un)?likely() calls hexagon: drop empty and unused free_initrd_mem mm: factor out common parts between MADV_COLD and MADV_PAGEOUT mm: introduce MADV_PAGEOUT mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM mm: introduce MADV_COLD mm: untag user pointers in mmap/munmap/mremap/brk vfio/type1: untag user pointers in vaddr_get_pfn tee/shm: untag user pointers in tee_shm_register media/v4l2-core: untag user pointers in videobuf_dma_contig_user_get drm/radeon: untag user pointers in radeon_gem_userptr_ioctl drm/amdgpu: untag user pointers ...
2019-09-26arch/sparc/include/asm/pgtable_64.h: fix buildAndrew Morton1-1/+1
A last-minute fixlet which I'd failed to merge at the appropriate time had the predictable effect. Fixes: f672e2c217e2d4b2 ("lib: untag user pointers in strn*_user") Cc: Andrey Konovalov <[email protected]> Cc: David Miller <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-26mm: treewide: clarify pgtable_page_{ctor,dtor}() namingMark Rutland26-48/+48
The naming of pgtable_page_{ctor,dtor}() seems to have confused a few people, and until recently arm64 used these erroneously/pointlessly for other levels of page table. To make it incredibly clear that these only apply to the PTE level, and to align with the naming of pgtable_pmd_page_{ctor,dtor}(), let's rename them to pgtable_pte_page_{ctor,dtor}(). These changes were generated with the following shell script: ---- git grep -lw 'pgtable_page_.tor' | while read FILE; do sed -i '{s/pgtable_page_ctor/pgtable_pte_page_ctor/}' $FILE; sed -i '{s/pgtable_page_dtor/pgtable_pte_page_dtor/}' $FILE; done ---- ... with the documentation re-flowed to remain under 80 columns, and whitespace fixed up in macros to keep backslashes aligned. There should be no functional change as a result of this patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mark Rutland <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> [m68k] Cc: Anshuman Khandual <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Yu Zhao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-26ntfs: remove (un)?likely() from IS_ERR() conditionsDenis Efremov4-9/+9
"likely(!IS_ERR(x))" is excessive. IS_ERR() already uses unlikely() internally. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Denis Efremov <[email protected]> Cc: Anton Altaparmakov <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-26IB/hfi1: remove unlikely() from IS_ERR*() conditionDenis Efremov1-1/+1
"unlikely(IS_ERR_OR_NULL(x))" is excessive. IS_ERR_OR_NULL() already uses unlikely() internally. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Denis Efremov <[email protected]> Cc: Mike Marciniszyn <[email protected]> Cc: Joe Perches <[email protected]> Acked-by: Dennis Dalessandro <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-26xfs: remove unlikely() from WARN_ON() conditionDenis Efremov1-2/+2
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely() internally. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Denis Efremov <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-26wimax/i2400m: remove unlikely() from WARN*() conditionDenis Efremov1-2/+1
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely() internally. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Denis Efremov <[email protected]> Cc: Inaky Perez-Gonzalez <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-26fs: remove unlikely() from WARN_ON() conditionDenis Efremov1-1/+1
"unlikely(WARN_ON(x))" is excessive. WARN_ON() already uses unlikely() internally. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Denis Efremov <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Joe Perches <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-26xen/events: remove unlikely() from WARN() conditionDenis Efremov1-1/+1
"unlikely(WARN(x))" is excessive. WARN() already uses unlikely() internally. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Denis Efremov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Joe Perches <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-26jffs2: Fix mounting under new mount APIDavid Howells1-2/+0
The mounting of jffs2 is broken due to the changes from the new mount API because it specifies a "source" operation, but then doesn't actually process it. But because it specified it, it doesn't return -ENOPARAM and the caller doesn't process it either and the source gets lost. Fix this by simply removing the source parameter from jffs2 and letting the VFS deal with it in the default manner. To test it, enable CONFIG_MTD_MTDRAM and allow the default size and erase block size parameters, then try and mount the /dev/mtdblock<N> file that that creates as jffs2. No need to initialise it. Fixes: ec10a24f10c8 ("vfs: Convert jffs2 to use the new mount API") Reported-by: Al Viro <[email protected]> Signed-off-by: David Howells <[email protected]> cc: David Woodhouse <[email protected]> cc: Richard Weinberger <[email protected]> cc: [email protected] Signed-off-by: Al Viro <[email protected]>
2019-09-26Merge tag 'perf-core-for-mingo-5.5-20190925' of ↵Ingo Molnar128-1321/+1941
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf record: Stephane Eranian: - Fix priv level with branch sampling for paranoid=2, i.e. the kernel checks if perf_event_attr_attr.exclude_hv is set in addition to .exclude_kernel, so reset both to zero. Arnaldo Carvalho de Melo: - Don't warn about not being able to read kernel maps (kallsyms, etc) when kernel samples aren't being collected. perf list: Kim Phillips: - Allow plurals for metric, metricgroup., i.e.: $ perf list metrics was showing nothing, which is very confusing, make it work like: $ perf stat metric perf stat: Andi Kleen: - Free memory access/leaks detected via valgrind, related to metrics. Libraries: libperf: Jiri Olsa: - Move more stuff from tools/perf, this time a first stab at moving perf_mmap methods. libtracevent: Steven Rostedt (VMware): - Round up in tep_print_event() time precision. Tzvetomir Stoyanov (VMware): - Man pages for event print and related and plugins APIs. - Move traceevent plugins in its own subdirectory. Feature detection: Thomas Richter: - Add detection of java-11-openjdk-devel package, in addition to the older versions supported. Architecture specific: S/390: Thomas Richter (2): - Include JVMTI support for s390 Vendor events: AMD: Kim Phillips: - Add L3 cache events for Family 17h. - Remove redundant '['. PowerPC: Mamatha Inamdar: - Remove P8 HW events which are not supported. Cleanups: Arnaldo Carvalho de Melo: - Remove needless headers, add needed ones, move things around to reduce the headers dependency tree, speeding up builds by not doing needless compiles when unrelated stuff gets changed. - Ditch unused code that was dragging headers. Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2019-09-25checkpatch: check for nested (un)?likely() callsDenis Efremov1-0/+6
IS_ERR(), IS_ERR_OR_NULL(), IS_ERR_VALUE() and WARN*() already contain unlikely() optimization internally. Thus, there is no point in calling these functions and defines under likely()/unlikely(). This check is based on the coccinelle rule developed by Enrico Weigelt https://lore.kernel.org/lkml/[email protected]/ Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Denis Efremov <[email protected]> Cc: Joe Perches <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Anton Altaparmakov <[email protected]> Cc: Boris Ostrovsky <[email protected]> Cc: Boris Pismenny <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Denis Efremov <[email protected]> Cc: Dennis Dalessandro <[email protected]> Cc: Inaky Perez-Gonzalez <[email protected]> Cc: Juergen Gross <[email protected]> Cc: Leon Romanovsky <[email protected]> Cc: Mike Marciniszyn <[email protected]> Cc: Rob Clark <[email protected]> Cc: Saeed Mahameed <[email protected]> Cc: Sean Paul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25hexagon: drop empty and unused free_initrd_memMike Rapoport1-13/+0
hexagon never reserves or initializes initrd and the only mention of it is the empty free_initrd_mem() function. As we have a generic implementation of free_initrd_mem(), there is no need to define an empty stub for the hexagon implementation and it can be dropped. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Cc: Richard Kuo <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25mm: factor out common parts between MADV_COLD and MADV_PAGEOUTMinchan Kim1-147/+45
There are many common parts between MADV_COLD and MADV_PAGEOUT. This patch factor them out to save code duplication. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Suggested-by: Johannes Weiner <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: kbuild test robot <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sonny Rao <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tim Murray <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25mm: introduce MADV_PAGEOUTMinchan Kim8-0/+251
When a process expects no accesses to a certain memory range for a long time, it could hint kernel that the pages can be reclaimed instantly but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_PAGEOUT hint to madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time so that kernel reclaims *any LRU* pages instantly. The hint can help kernel in deciding which pages to evict proactively. A note: It doesn't apply SWAP_CLUSTER_MAX LRU page isolation limit intentionally because it's automatically bounded by PMD size. If PMD size(e.g., 256) makes some trouble, we could fix it later by limit it to SWAP_CLUSTER_MAX[1]. - man-page material MADV_PAGEOUT (since Linux x.x) Do not expect access in the near future so pages in the specified regions could be reclaimed instantly regardless of memory pressure. Thus, access in the range after successful operation could cause major page fault but never lose the up-to-date contents unlike MADV_DONTNEED. Pages belonging to a shared mapping are only processed if a write access is allowed for the calling process. MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [1] https://lore.kernel.org/lkml/[email protected]/ [[email protected]: clear PG_active on MADV_PAGEOUT] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Reported-by: kbuild test robot <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sonny Rao <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tim Murray <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIMMinchan Kim1-3/+3
The local variable references in shrink_page_list is PAGEREF_RECLAIM_CLEAN as default. It is for preventing to reclaim dirty pages when CMA try to migrate pages. Strictly speaking, we don't need it because CMA didn't allow to write out by .may_writepage = 0 in reclaim_clean_pages_from_list. Moreover, it has a problem to prevent anonymous pages's swap out even though force_reclaim = true in shrink_page_list on upcoming patch. So this patch makes references's default value to PAGEREF_RECLAIM and rename force_reclaim with ignore_references to make it more clear. This is a preparatory work for next patch. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: kbuild test robot <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sonny Rao <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tim Murray <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25mm: introduce MADV_COLDMinchan Kim10-4/+232
Patch series "Introduce MADV_COLD and MADV_PAGEOUT", v7. - Background The Android terminology used for forking a new process and starting an app from scratch is a cold start, while resuming an existing app is a hot start. While we continually try to improve the performance of cold starts, hot starts will always be significantly less power hungry as well as faster so we are trying to make hot start more likely than cold start. To increase hot start, Android userspace manages the order that apps should be killed in a process called ActivityManagerService. ActivityManagerService tracks every Android app or service that the user could be interacting with at any time and translates that into a ranked list for lmkd(low memory killer daemon). They are likely to be killed by lmkd if the system has to reclaim memory. In that sense they are similar to entries in any other cache. Those apps are kept alive for opportunistic performance improvements but those performance improvements will vary based on the memory requirements of individual workloads. - Problem Naturally, cached apps were dominant consumers of memory on the system. However, they were not significant consumers of swap even though they are good candidate for swap. Under investigation, swapping out only begins once the low zone watermark is hit and kswapd wakes up, but the overall allocation rate in the system might trip lmkd thresholds and cause a cached process to be killed(we measured performance swapping out vs. zapping the memory by killing a process. Unsurprisingly, zapping is 10x times faster even though we use zram which is much faster than real storage) so kill from lmkd will often satisfy the high zone watermark, resulting in very few pages actually being moved to swap. - Approach The approach we chose was to use a new interface to allow userspace to proactively reclaim entire processes by leveraging platform information. This allowed us to bypass the inaccuracy of the kernel’s LRUs for pages that are known to be cold from userspace and to avoid races with lmkd by reclaiming apps as soon as they entered the cached state. Additionally, it could provide many chances for platform to use much information to optimize memory efficiency. To achieve the goal, the patchset introduce two new options for madvise. One is MADV_COLD which will deactivate activated pages and the other is MADV_PAGEOUT which will reclaim private pages instantly. These new options complement MADV_DONTNEED and MADV_FREE by adding non-destructive ways to gain some free memory space. MADV_PAGEOUT is similar to MADV_DONTNEED in a way that it hints the kernel that memory region is not currently needed and should be reclaimed immediately; MADV_COLD is similar to MADV_FREE in a way that it hints the kernel that memory region is not currently needed and should be reclaimed when memory pressure rises. This patch (of 5): When a process expects no accesses to a certain memory range, it could give a hint to kernel that the pages can be reclaimed when memory pressure happens but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_COLD hint to madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help kernel in deciding which pages to evict early during memory pressure. It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves active file page -> inactive file LRU active anon page -> inacdtive anon LRU Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file LRU's head because MADV_COLD is a little bit different symantic. MADV_FREE means it's okay to discard when the memory pressure because the content of the page is *garbage* so freeing such pages is almost zero overhead since we don't need to swap out and access afterward causes just minor fault. Thus, it would make sense to put those freeable pages in inactive file LRU to compete other used-once pages. It makes sense for implmentaion point of view, too because it's not swapbacked memory any longer until it would be re-dirtied. Even, it could give a bonus to make them be reclaimed on swapless system. However, MADV_COLD doesn't mean garbage so reclaiming them requires swap-out/in in the end so it's bigger cost. Since we have designed VM LRU aging based on cost-model, anonymous cold pages would be better to position inactive anon's LRU list, not file LRU. Furthermore, it would help to avoid unnecessary scanning if system doesn't have a swap device. Let's start simpler way without adding complexity at this moment. However, keep in mind, too that it's a caveat that workloads with a lot of pages cache are likely to ignore MADV_COLD on anonymous memory because we rarely age anonymous LRU lists. * man-page material MADV_COLD (since Linux x.x) Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages. MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. [[email protected]: resolve conflicts with hmm.git] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Reported-by: kbuild test robot <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: James E.J. Bottomley <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Daniel Colascione <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Hillf Danton <[email protected]> Cc: Joel Fernandes (Google) <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Oleksandr Natalenko <[email protected]> Cc: Shakeel Butt <[email protected]> Cc: Sonny Rao <[email protected]> Cc: Suren Baghdasaryan <[email protected]> Cc: Tim Murray <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25mm: untag user pointers in mmap/munmap/mremap/brkCatalin Marinas2-5/+6
There isn't a good reason to differentiate between the user address space layout modification syscalls and the other memory permission/attributes ones (e.g. mprotect, madvise) w.r.t. the tagged address ABI. Untag the user addresses on entry to these functions. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Catalin Marinas <[email protected]> Acked-by: Will Deacon <[email protected]> Acked-by: Andrey Konovalov <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Szabolcs Nagy <[email protected]> Cc: Kevin Brodsky <[email protected]> Cc: Dave P Martin <[email protected]> Cc: Dave Hansen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25vfio/type1: untag user pointers in vaddr_get_pfnAndrey Konovalov1-0/+2
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. vaddr_get_pfn() uses provided user pointers for vma lookups, which can only by done with untagged pointers. Untag user pointers in this function. Link: http://lkml.kernel.org/r/87422b4d72116a975896f2b19b00f38acbd28f33.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Eric Auger <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Will Deacon <[email protected]> Cc: Al Viro <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Jens Wiklander <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25tee/shm: untag user pointers in tee_shm_registerAndrey Konovalov1-0/+1
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. tee_shm_register()->optee_shm_unregister()->check_mem_type() uses provided user pointers for vma lookups (via __check_mem_type()), which can only by done with untagged pointers. Untag user pointers in this function. Link: http://lkml.kernel.org/r/4b993f33196b3566ac81285ff8453219e2079b45.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Jens Wiklander <[email protected]> Cc: Al Viro <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Auger <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25media/v4l2-core: untag user pointers in videobuf_dma_contig_user_getAndrey Konovalov1-4/+5
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. videobuf_dma_contig_user_get() uses provided user pointers for vma lookups, which can only by done with untagged pointers. Untag the pointers in this function. Link: http://lkml.kernel.org/r/100436d5f8e4349a78f27b0bbb27e4801fcb946b.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Mauro Carvalho Chehab <[email protected]> Cc: Al Viro <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Auger <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Jens Wiklander <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25drm/radeon: untag user pointers in radeon_gem_userptr_ioctlAndrey Konovalov1-0/+2
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. In radeon_gem_userptr_ioctl() an MMU notifier is set up with a (tagged) userspace pointer. The untagged address should be used so that MMU notifiers for the untagged address get correctly matched up with the right BO. This funcation also calls radeon_ttm_tt_pin_userptr(), which uses provided user pointers for vma lookups, which can only by done with untagged pointers. This patch untags user pointers in radeon_gem_userptr_ioctl(). Link: http://lkml.kernel.org/r/c856babeb67195b35603b8d5ba386a2819cec5ff.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Reviewed-by: Kees Cook <[email protected]> Suggested-by: Felix Kuehling <[email protected]> Acked-by: Felix Kuehling <[email protected]> Cc: Al Viro <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Auger <[email protected]> Cc: Jens Wiklander <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25drm/amdgpu: untag user pointersAndrey Konovalov2-1/+3
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. In amdgpu_gem_userptr_ioctl() and amdgpu_amdkfd_gpuvm.c/init_user_pages() an MMU notifier is set up with a (tagged) userspace pointer. The untagged address should be used so that MMU notifiers for the untagged address get correctly matched up with the right BO. This patch untag user pointers in amdgpu_gem_userptr_ioctl() for the GEM case and in amdgpu_amdkfd_gpuvm_ alloc_memory_of_gpu() for the KFD case. This also makes sure that an untagged pointer is passed to amdgpu_ttm_tt_get_user_pages(), which uses it for vma lookups. Link: http://lkml.kernel.org/r/d684e1df08f2ecb6bc292e222b64fa9efbc26e69.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Kees Cook <[email protected]> Suggested-by: Felix Kuehling <[email protected]> Acked-by: Felix Kuehling <[email protected]> Cc: Al Viro <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Auger <[email protected]> Cc: Jens Wiklander <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Vincenzo Frascino <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25userfaultfd: untag user pointersAndrey Konovalov1-10/+12
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. userfaultfd code use provided user pointers for vma lookups, which can only by done with untagged pointers. Untag user pointers in validate_range(). Link: http://lkml.kernel.org/r/cdc59ddd7011012ca2e689bc88c3b65b1ea7e413.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Auger <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Jens Wiklander <[email protected]> Cc: Khalid Aziz <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25fs/namespace: untag user pointers in copy_mount_optionsAndrey Konovalov1-1/+1
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. In copy_mount_options a user address is being subtracted from TASK_SIZE. If the address is lower than TASK_SIZE, the size is calculated to not allow the exact_copy_from_user() call to cross TASK_SIZE boundary. However if the address is tagged, then the size will be calculated incorrectly. Untag the address before subtracting. Link: http://lkml.kernel.org/r/1de225e4a54204bfd7f25dac2635e31aa4aa1d90.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Al Viro <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Auger <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Jens Wiklander <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25mm: untag user pointers in get_vaddr_framesAndrey Konovalov1-0/+2
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. get_vaddr_frames uses provided user pointers for vma lookups, which can only by done with untagged pointers. Instead of locating and changing all callers of this function, perform untagging in it. Link: http://lkml.kernel.org/r/28f05e49c92b2a69c4703323d6c12208f3d881fe.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Acked-by: Catalin Marinas <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Auger <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Jens Wiklander <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25mm: untag user pointers in mm/gup.cAndrey Konovalov1-0/+4
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. mm/gup.c provides a kernel interface that accepts user addresses and manipulates user pages directly (for example get_user_pages, that is used by the futex syscall). Since a user can provided tagged addresses, we need to handle this case. Add untagging to gup.c functions that use user addresses for vma lookups. Link: http://lkml.kernel.org/r/4731bddba3c938658c10ff4ed55cc01c60f4c8f8.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Al Viro <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Auger <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Jens Wiklander <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25mm: untag user pointers passed to memory syscallsAndrey Konovalov8-1/+23
This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. This patch allows tagged pointers to be passed to the following memory syscalls: get_mempolicy, madvise, mbind, mincore, mlock, mlock2, mprotect, mremap, msync, munlock, move_pages. The mmap and mremap syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma. Link: http://lkml.kernel.org/r/aaf0c0969d46b2feb9017f3e1b3ef3970b633d91.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Reviewed-by: Kees Cook <[email protected]> Cc: Al Viro <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Auger <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Jens Wiklander <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25lib: untag user pointers in strn*_userAndrey Konovalov3-4/+7
Patch series "arm64: untag user pointers passed to the kernel", v19. === Overview arm64 has a feature called Top Byte Ignore, which allows to embed pointer tags into the top byte of each pointer. Userspace programs (such as HWASan, a memory debugging tool [1]) might use this feature and pass tagged user pointers to the kernel through syscalls or other interfaces. Right now the kernel is already able to handle user faults with tagged pointers, due to these patches: 1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a tagged pointer") 2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged pointers") 3. 276e9327 ("arm64: entry: improve data abort handling of tagged pointers") This patchset extends tagged pointer support to syscall arguments. As per the proposed ABI change [3], tagged pointers are only allowed to be passed to syscalls when they point to memory ranges obtained by anonymous mmap() or sbrk() (see the patchset [3] for more details). For non-memory syscalls this is done by untaging user pointers when the kernel performs pointer checking to find out whether the pointer comes from userspace (most notably in access_ok). The untagging is done only when the pointer is being checked, the tag is preserved as the pointer makes its way through the kernel and stays tagged when the kernel dereferences the pointer when perfoming user memory accesses. The mmap and mremap (only new_addr) syscalls do not currently accept tagged addresses. Architectures may interpret the tag as a background colour for the corresponding vma. Other memory syscalls (mprotect, etc.) don't do user memory accesses but rather deal with memory ranges, and untagged pointers are better suited to describe memory ranges internally. Thus for memory syscalls we untag pointers completely when they enter the kernel. === Other approaches One of the alternative approaches to untagging that was considered is to completely strip the pointer tag as the pointer enters the kernel with some kind of a syscall wrapper, but that won't work with the countless number of different ioctl calls. With this approach we would need a custom wrapper for each ioctl variation, which doesn't seem practical. An alternative approach to untagging pointers in memory syscalls prologues is to inspead allow tagged pointers to be passed to find_vma() (and other vma related functions) and untag them there. Unfortunately, a lot of find_vma() callers then compare or subtract the returned vma start and end fields against the pointer that was being searched. Thus this approach would still require changing all find_vma() callers. === Testing The following testing approaches has been taken to find potential issues with user pointer untagging: 1. Static testing (with sparse [2] and separately with a custom static analyzer based on Clang) to track casts of __user pointers to integer types to find places where untagging needs to be done. 2. Static testing with grep to find parts of the kernel that call find_vma() (and other similar functions) or directly compare against vm_start/vm_end fields of vma. 3. Static testing with grep to find parts of the kernel that compare user pointers with TASK_SIZE or other similar consts and macros. 4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running a modified syzkaller version that passes tagged pointers to the kernel. Based on the results of the testing the requried patches have been added to the patchset. === Notes This patchset is meant to be merged together with "arm64 relaxed ABI" [3]. This patchset is a prerequisite for ARM's memory tagging hardware feature support [4]. This patchset has been merged into the Pixel 2 & 3 kernel trees and is now being used to enable testing of Pixel phones with HWASan. Thanks! [1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html [2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060e0145f292 [3] https://lkml.org/lkml/2019/6/12/745 [4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architecture-2018-developments-armv85a This patch (of 11) This patch is a part of a series that extends kernel ABI to allow to pass tagged user pointers (with the top byte set to something else other than 0x00) as syscall arguments. strncpy_from_user and strnlen_user accept user addresses as arguments, and do not go through the same path as copy_from_user and others, so here we need to handle the case of tagged user addresses separately. Untag user pointers passed to these functions. Note, that this patch only temporarily untags the pointers to perform validity checks, but then uses them as is to perform user memory accesses. [[email protected]: fix sparc4 build] Link: http://lkml.kernel.org/r/CAAeHK+yx4a-P0sDrXTUxMvO2H0CJZUFPffBrg_cU7oJOZyC7ew@mail.gmail.com Link: http://lkml.kernel.org/r/c5a78bcad3e94d6cda71fcaa60a423231ae71e4c.1563904656.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <[email protected]> Reviewed-by: Vincenzo Frascino <[email protected]> Reviewed-by: Khalid Aziz <[email protected]> Acked-by: Kees Cook <[email protected]> Reviewed-by: Catalin Marinas <[email protected]> Cc: Al Viro <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Auger <[email protected]> Cc: Felix Kuehling <[email protected]> Cc: Jens Wiklander <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25lib/lzo/lzo1x_compress.c: fix alignment bug in lzo-rleDave Rodgman1-6/+8
Fix an unaligned access which breaks on platforms where this is not permitted (e.g., Sparc). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dave Rodgman <[email protected]> Cc: Dave Rodgman <[email protected]> Cc: Markus F.X.J. Oberhumer <[email protected]> Cc: Minchan Kim <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25ipc/sem.c: convert to use built-in RCU list checkingJoel Fernandes (Google)1-1/+2
CONFIG_PROVE_RCU_LIST requires list_for_each_entry_rcu() to pass a lockdep expression if using srcu or locking for protection. It can only check regular RCU protection, all other protection needs to be passed as lockdep expression. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joel Fernandes (Google) <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Bjorn Helgaas <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: "Gustavo A. R. Silva" <[email protected]> Cc: Jonathan Derrick <[email protected]> Cc: Keith Busch <[email protected]> Cc: Lorenzo Pieralisi <[email protected]> Cc: "Paul E. McKenney" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25ipc/mqueue: improve exception handling in do_mq_notify()Markus Elfring1-12/+8
Null pointers were assigned to local variables in a few cases as exception handling. The jump target “out” was used where no meaningful data processing actions should eventually be performed by branches of an if statement then. Use an additional jump target for calling dev_kfree_skb() directly. Return also directly after error conditions were detected when no extra clean-up is needed by this function implementation. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Markus Elfring <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2019-09-25ipc/mqueue.c: delete an unnecessary check before the macro call dev_kfree_skb()Markus Elfring1-1/+1
dev_kfree_skb() input parameter validation, thus the test around the call is not needed. This issue was detected by using the Coccinelle software. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Markus Elfring <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>