aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2018-04-23earlycon: Use a pointer table to fix __earlycon_table strideDaniel Kurtz1-7/+14
Commit 99492c39f39f ("earlycon: Fix __earlycon_table stride") tried to fix __earlycon_table stride by forcing the earlycon_id struct alignment to 32 and asking the linker to 32-byte align the __earlycon_table symbol. This fix was based on commit 07fca0e57fca92 ("tracing: Properly align linker defined symbols") which tried a similar fix for the tracing subsystem. However, this fix doesn't quite work because there is no guarantee that gcc will place structures packed into an array format. In fact, gcc 4.9 chooses to 64-byte align these structs by inserting additional padding between the entries because it has no clue that they are supposed to be in an array. If we are unlucky, the linker will assign symbol "__earlycon_table" to a 32-byte aligned address which does not correspond to the 64-byte aligned contents of section "__earlycon_table". To address this same problem, the fix to the tracing system was subsequently re-implemented using a more robust table of pointers approach by commits: 3d56e331b653 ("tracing: Replace syscall_meta_data struct array with pointer array") 654986462939 ("tracepoints: Fix section alignment using pointer array") e4a9ea5ee7c8 ("tracing: Replace trace_event struct array with pointer array") Let's use this same "array of pointers to structs" approach for EARLYCON_TABLE. Fixes: 99492c39f39f ("earlycon: Fix __earlycon_table stride") Signed-off-by: Daniel Kurtz <[email protected]> Suggested-by: Aaron Durbin <[email protected]> Reviewed-by: Rob Herring <[email protected]> Tested-by: Guenter Roeck <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2018-04-15Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2-19/+74
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "A set of fixes and updates for x86: - Address a swiotlb regression which was caused by the recent DMA rework and made driver fail because dma_direct_supported() returned false - Fix a signedness bug in the APIC ID validation which caused invalid APIC IDs to be detected as valid thereby bloating the CPU possible space. - Fix inconsisten config dependcy/select magic for the MFD_CS5535 driver. - Fix a corruption of the physical address space bits when encryption has reduced the address space and late cpuinfo updates overwrite the reduced bit information with the original value. - Dominiks syscall rework which consolidates the architecture specific syscall functions so all syscalls can be wrapped with the same macros. This allows to switch x86/64 to struct pt_regs based syscalls. Extend the clearing of user space controlled registers in the entry patch to the lower registers" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/apic: Fix signedness bug in APIC ID validity checks x86/cpu: Prevent cpuinfo_x86::x86_phys_bits adjustment corruption x86/olpc: Fix inconsistent MFD_CS5535 configuration swiotlb: Use dma_direct_supported() for swiotlb_ops syscalls/x86: Adapt syscall_wrapper.h to the new syscall stub naming convention syscalls/core, syscalls/x86: Rename struct pt_regs-based sys_*() to __x64_sys_*() syscalls/core, syscalls/x86: Clean up compat syscall stub naming convention syscalls/core, syscalls/x86: Clean up syscall stub naming convention syscalls/x86: Extend register clearing on syscall entry to lower registers syscalls/x86: Unconditionally enable 'struct pt_regs' based syscalls on x86_64 syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32 syscalls/core: Prepare CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y for compat syscalls syscalls/x86: Use 'struct pt_regs' based syscall calling convention for 64-bit syscalls syscalls/core: Introduce CONFIG_ARCH_HAS_SYSCALL_WRAPPER=y x86/syscalls: Don't pointlessly reload the system call number x86/mm: Fix documentation of module mapping range with 4-level paging x86/cpuid: Switch to 'static const' specifier
2018-04-15Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "A few scheduler fixes: - Prevent a bogus warning vs. runqueue clock update flags in do_sched_rt_period_timer() - Simplify the helper functions which handle requests for skipping the runqueue clock updat. - Do not unlock the tunables mutex in the error path of the cpu frequency scheduler utils. Its not held. - Enforce proper alignement for 'struct util_est' in sched_avg to prevent a misalignment fault on IA64" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Force proper alignment of 'struct util_est' sched/core: Simplify helpers for rq clock update skip requests sched/rt: Fix rq->clock_update_flags < RQCF_ACT_SKIP warning sched/cpufreq/schedutil: Fix error path mutex unlock
2018-04-14Merge branch 'akpm' (patches from Andrew)Linus Torvalds2-28/+83
Merge yet more updates from Andrew Morton: - various hotfixes - kexec_file updates and feature work * emailed patches from Andrew Morton <[email protected]>: (27 commits) kernel/kexec_file.c: move purgatories sha256 to common code kernel/kexec_file.c: allow archs to set purgatory load address kernel/kexec_file.c: remove mis-use of sh_offset field during purgatory load kernel/kexec_file.c: remove unneeded variables in kexec_purgatory_setup_sechdrs kernel/kexec_file.c: remove unneeded for-loop in kexec_purgatory_setup_sechdrs kernel/kexec_file.c: split up __kexec_load_puragory kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations* kernel/kexec_file.c: search symbols in read-only kexec_purgatory kernel/kexec_file.c: make purgatory_info->ehdr const kernel/kexec_file.c: remove checks in kexec_purgatory_load include/linux/kexec.h: silence compile warnings kexec_file, x86: move re-factored code to generic side x86: kexec_file: clean up prepare_elf64_headers() x86: kexec_file: lift CRASH_MAX_RANGES limit on crash_mem buffer x86: kexec_file: remove X86_64 dependency from prepare_elf64_headers() x86: kexec_file: purge system-ram walking from prepare_elf64_headers() kexec_file,x86,powerpc: factor out kexec_file_ops functions kexec_file: make use of purgatory optional proc: revalidate misc dentries mm, slab: reschedule cache_reap() on the same CPU ...
2018-04-13kernel/kexec_file.c: move purgatories sha256 to common codePhilipp Rudo1-0/+30
The code to verify the new kernels sha digest is applicable for all architectures. Move it to common code. One problem is the string.c implementation on x86. Currently sha256 includes x86/boot/string.h which defines memcpy and memset to be gcc builtins. By moving the sha256 implementation to common code and changing the include to linux/string.h both functions are no longer defined. Thus definitions have to be provided in x86/purgatory/string.c Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Philipp Rudo <[email protected]> Acked-by: Dave Young <[email protected]> Cc: AKASHI Takahiro <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-13kernel/kexec_file.c: allow archs to set purgatory load addressPhilipp Rudo1-11/+6
For s390 new kernels are loaded to fixed addresses in memory before they are booted. With the current code this is a problem as it assumes the kernel will be loaded to an 'arbitrary' address. In particular, kexec_locate_mem_hole searches for a large enough memory region and sets the load address (kexec_bufer->mem) to it. Luckily there is a simple workaround for this problem. By returning 1 in arch_kexec_walk_mem, kexec_locate_mem_hole is turned off. This allows the architecture to set kbuf->mem by hand. While the trick works fine for the kernel it does not for the purgatory as here the architectures don't have access to its kexec_buffer. Give architectures access to the purgatories kexec_buffer by changing kexec_load_purgatory to take a pointer to it. With this change architectures have access to the buffer and can edit it as they need. A nice side effect of this change is that we can get rid of the purgatory_info->purgatory_load_address field. As now the information stored there can directly be accessed from kbuf->mem. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Philipp Rudo <[email protected]> Reviewed-by: Martin Schwidefsky <[email protected]> Acked-by: Dave Young <[email protected]> Cc: AKASHI Takahiro <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-13kernel/kexec_file.c: use read-only sections in arch_kexec_apply_relocations*Philipp Rudo1-4/+9
When the relocations are applied to the purgatory only the section the relocations are applied to is writable. The other sections, i.e. the symtab and .rel/.rela, are in read-only kexec_purgatory. Highlight this by marking the corresponding variables as 'const'. While at it also change the signatures of arch_kexec_apply_relocations* to take section pointers instead of just the index of the relocation section. This removes the second lookup and sanity check of the sections in arch code. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Philipp Rudo <[email protected]> Acked-by: Dave Young <[email protected]> Cc: AKASHI Takahiro <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-13kernel/kexec_file.c: make purgatory_info->ehdr constPhilipp Rudo1-6/+11
The kexec_purgatory buffer is read-only. Thus all pointers into kexec_purgatory are read-only, too. Point this out by explicitly marking purgatory_info->ehdr as 'const' and update the comments in purgatory_info. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Philipp Rudo <[email protected]> Acked-by: Dave Young <[email protected]> Cc: AKASHI Takahiro <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Vivek Goyal <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-13include/linux/kexec.h: silence compile warningsPhilipp Rudo1-0/+2
Patch series "kexec_file: Clean up purgatory load", v2. Following the discussion with Dave and AKASHI, here are the common code patches extracted from my recent patch set (Add kexec_file_load support to s390) [1]. The patches were extracted to allow upstream integration together with AKASHI's common code patches before the arch code gets adjusted to the new base. The reason for this series is to prepare common code for adding kexec_file_load to s390 as well as cleaning up the mis-use of the sh_offset field during purgatory load. In detail this series contains: Patch #1&2: Minor cleanups/fixes. Patch #3-9: Clean up the purgatory load/relocation code. Especially remove the mis-use of the purgatory_info->sechdrs->sh_offset field, currently holding a pointer into either kexec_purgatory (ro) or purgatory_buf (rw) depending on the section. With these patches the section address will be calculated verbosely and sh_offset will contain the offset of the section in the stripped purgatory binary (purgatory_buf). Patch #10: Allows architectures to set the purgatory load address. This patch is important for s390 as the kernel and purgatory have to be loaded to fixed addresses. In current code this is impossible as the purgatory load is opaque to the architecture. Patch #11: Moves x86 purgatories sha implementation to common lib/ directory to allow reuse in other architectures. This patch (of 11) When building the kernel with CONFIG_KEXEC_FILE enabled gcc prints a compile warning multiple times. In file included from <path>/linux/init/initramfs.c:526:0: <path>/include/linux/kexec.h:120:9: warning: `struct kimage' declared inside parameter list [enabled by default] unsigned long cmdline_len); ^ This is because the typedefs for kexec_file_load uses struct kimage before it is declared. Fix this by simply forward declaring struct kimage. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Philipp Rudo <[email protected]> Acked-by: Dave Young <[email protected]> Cc: Eric Biederman <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: AKASHI Takahiro <[email protected]> Cc: Ingo Molnar <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-13kexec_file, x86: move re-factored code to generic sideAKASHI Takahiro1-0/+19
In the previous patches, commonly-used routines, exclude_mem_range() and prepare_elf64_headers(), were carved out. Now place them in kexec common code. A prefix "crash_" is given to each of their names to avoid possible name collisions. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: AKASHI Takahiro <[email protected]> Acked-by: Dave Young <[email protected]> Tested-by: Dave Young <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Baoquan He <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-13kexec_file,x86,powerpc: factor out kexec_file_ops functionsAKASHI Takahiro1-7/+6
As arch_kexec_kernel_image_{probe,load}(), arch_kimage_file_post_load_cleanup() and arch_kexec_kernel_verify_sig() are almost duplicated among architectures, they can be commonalized with an architecture-defined kexec_file_ops array. So let's factor them out. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: AKASHI Takahiro <[email protected]> Acked-by: Dave Young <[email protected]> Tested-by: Dave Young <[email protected]> Cc: Vivek Goyal <[email protected]> Cc: Baoquan He <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Thiago Jung Bauermann <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-13Merge branch 'next' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux Pull thermal management update from Zhang Rui: - Fix race condition in imx_thermal_probe() (Mikhail Lappo) - Add cooling device's statistics in sysfs (Viresh Kumar) * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: thermal: Add cooling device's statistics in sysfs thermal: imx: Fix race condition in imx_thermal_probe()
2018-04-13Merge branch 'dmi-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging Pull dmi updates from Jean Delvare. * 'dmi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging: firmware: dmi_scan: Use lowercase letters for UUID firmware: dmi_scan: Add DMI_OEM_STRING support to dmi_matches firmware: dmi_scan: Fix UUID length safety check
2018-04-13Merge tag 'chrome-platform-for-linus-4.17' of ↵Linus Torvalds3-31/+5
git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform Pull chrome platform updates from Benson Leung: - a series from Dmitry to remove platform data from chromeos_laptop.c, which was the only user of platform data for the atmel_mxt_ts driver. - a series to clean up sysfs and debugfs for cros_ec - other misc cleanups * tag 'chrome-platform-for-linus-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/bleung/chrome-platform: (22 commits) platform/chrome: mfd/cros_ec_dev: Add sysfs entry to set keyboard wake lid angle platform/chrome: cros_ec_debugfs: Add PD port info to debugfs platform/chrome: cros_ec_debugfs: Use octal permissions '0444' platform/chrome: cros_ec_sysfs: use permission-specific DEVICE_ATTR variants platform/chrome: cros_ec_sysfs: introduce to_cros_ec_dev define. platform/chrome: cros_ec_sysfs: Modify error handling platform/chrome: cros_ec_lpc: Add support for Google devices using custom coreboot firmware platform/chrome: cros_ec_lpc: wake up from s2idle on Chrome EC Input: atmel_mxt_ts - remove platform data support platform/chrome: chromeos_laptop - discard data for unneeded boards platform/chrome: chromeos_laptop - use device properties for Pixel platform/chrome: chromeos_laptop - rely on I2C to set up interrupt trigger platform/chrome: chromeos_laptop - use I2C notifier to create devices platform/chrome: chromeos_laptop - parse DMI IRQ data once platform/chrome: chromeos_laptop - rework i2c peripherals initialization platform/chrome: chromeos_laptop - factor out getting IRQ from DMI platform/chrome: chromeos_laptop - introduce pr_fmt() platform/chrome: chromeos_laptop - stop setting suspend mode for Atmel devices platform/chrome: chromeos_laptop - add SPDX identifier Input: atmel_mxt_ts - switch ChromeOS ACPI devices to generic props ...
2018-04-13Merge tag 'clk-for-linus' of ↵Linus Torvalds6-9/+75
git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux Pull clk updates from Stephen Boyd: "The large diff this time around is from the addition of a new clk driver for the TI Davinci family of SoCs. So far those clks have been supported with a custom implementation of the clk API in the arch port instead of in the CCF. With this driver merged we're one step closer to having a single clk API implementation. The other large diff is from the Amlogic clk driver that underwent some major surgery to use regmap. Beyond that, the biggest hitter is Samsung which needed some reworks to properly handle clk provider power domains and a bunch of PLL rate updates. The core framework was fairly quiet this round, just getting some cleanups and small fixes for some of the more esoteric features. And the usual set of driver non-critical fixes, cleanups, and minor additions are here as well. Core: - Rejig clk_ops::init() to be a little earlier for phase/accuracy ops - debugfs ops macroized to shave some lines of boilerplate code - Always calculate the phase instead of caching it in clk_get_phase() - More __must_check on bulk clk APIs New Drivers: - TI's Davinci family of SoCs - Intel's Stratix10 SoC - stm32mp157 SoC - Allwinner H6 CCU - Silicon Labs SI544 clock generator chip - Renesas R-Car M3-N and V3H SoCs - i.MX6SLL SoCs Removed Drivers: - ST-Ericsson AB8540/9540 Updates: - Mediatek MT2701 and MT7622 audsys support and MT2712 updates - STM32F469 DSI and STM32F769 sdmmc2 support - GPIO clks can sleep now - Spreadtrum SC9860 RTC clks - Nvidia Tegra MBIST workarounds and various minor fixes - Rockchip phase handling fixes and a memory leak plugged - Renesas drivers switch to readl/writel from clk_readl/clk_writel - Renesas gained CPU (Z/Z2) and watchdog support - Rockchip rk3328 display clks and rk3399 1.6GHz PLL support - Qualcomm PM8921 PMIC XO buffers - Amlogic migrates to regmap APIs - TI Keystone clk latching support - Allwinner H3 and H5 video clk fixes - Broadcom BCM2835 PLLs needed another bit to enable - i.MX6SX CKO mux fix and i.MX7D Video PLL divider fix - i.MX6UL/ULL epdc_podf support - Hi3798CV200 COMBPHY0 and USB2_OTG_UTMI and phase support for eMMC" * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (233 commits) clk: davinci: add a reset lookup table for psc0 clk: imx: add clock driver for imx6sll dt-bindings: imx: update clock doc for imx6sll clk: imx: add new gate/gate2 wrapper funtion clk: imx: Add CLK_IS_CRITICAL flag for busy divider and busy mux clk: cs2000: set pm_ops in hibernate-compatible way clk: bcm2835: De-assert/assert PLL reset signal when appropriate clk: imx7d: Move clks_init_on before any clock operations clk: imx7d: Correct ahb clk parent select clk: imx7d: Correct dram pll type clk: imx7d: Add USB clock information clk: socfpga: stratix10: add clock driver for Stratix10 platform dt-bindings: documentation: add clock bindings information for Stratix10 clk: ti: fix flag space conflict with clkctrl clocks clk: uniphier: add additional ethernet clock lines for Pro4 clk: uniphier: add SATA clock control support clk: uniphier: add PCIe clock control support clk: Add driver for the si544 clock generator chip clk: davinci: Remove redundant dev_err calls clk: uniphier: add ethernet clock control support for PXs3 ...
2018-04-13Merge tag 'for-linus-20180413' of git://git.kernel.dk/linux-blockLinus Torvalds2-2/+1
Pull block fixes from Jens Axboe: "Followup fixes for this merge window. This contains: - Series from Ming, fixing corner cases in our CPU <-> queue mapping. This triggered repeated warnings on especially s390, but I also hit it in cpu hot plug/unplug testing while doing IO on NVMe on x86-64. - Another fix from Ming, ensuring that we always order budget and driver tag identically, avoiding a deadlock on QD=1 devices. - Loop locking regression fix from this merge window, from Omar. - Another loop locking fix, this time missing an unlock, from Tetsuo Handa. - Fix for racing IO submission with device removal from Bart. - sr reference fix from me, fixing a case where disk change or getevents can race with device removal. - Set of nvme fixes by way of Keith, from various contributors" * tag 'for-linus-20180413' of git://git.kernel.dk/linux-block: (28 commits) nvme: expand nvmf_check_if_ready checks nvme: Use admin command effects for admin commands nvmet: fix space padding in serial number nvme: check return value of init_srcu_struct function nvmet: Fix nvmet_execute_write_zeroes sector count nvme-pci: Separate IO and admin queue IRQ vectors nvme-pci: Remove unused queue parameter nvme-pci: Skip queue deletion if there are no queues nvme: target: fix buffer overflow nvme: don't send keep-alives to the discovery controller nvme: unexport nvme_start_keep_alive nvme-loop: fix kernel oops in case of unhandled command nvme: enforce 64bit offset for nvme_get_log_ext fn sr: get/drop reference to device in revalidate and check_events blk-mq: Revert "blk-mq: reimplement blk_mq_hw_queue_mapped" blk-mq: Avoid that submitting a bio concurrently with device removal triggers a crash backing: silence compiler warning using __printf blk-mq: remove code for dealing with remapping queue blk-mq: reimplement blk_mq_hw_queue_mapped blk-mq: don't check queue mapped in __blk_mq_delay_run_hw_queue() ...
2018-04-13firmware: dmi_scan: Add DMI_OEM_STRING support to dmi_matchesAlex Hung1-0/+1
OEM strings are defined by each OEM and they contain customized and useful OEM information. Supporting it provides more flexible uses of the dmi_matches function. Signed-off-by: Alex Hung <[email protected]> Signed-off-by: Jean Delvare <[email protected]>
2018-04-12Merge tag 'gfs2-4.17.fixes2' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull more gfs2 updates from Bob Peterson: "We decided to request the latest three patches to be merged into this merge window while it's still open. - The first patch adds a new function to lockref: lockref_put_not_zero - The second patch fixes GFS2's glock dump code so it uses the new lockref function. This fixes a problem whereby lock dumps could miss glocks. - I made a minor patch to update some comments and fix the lock ordering text in our gfs2-glocks.txt Documentation file" * tag 'gfs2-4.17.fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: GFS2: Minor improvements to comments and documentation gfs2: Stop using rhashtable_walk_peek lockref: Add lockref_put_not_zero
2018-04-12Merge tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds5-17/+131
Pull NFS client updates from Anna Schumaker: "Stable bugfixes: - xprtrdma: Fix corner cases when handling device removal # v4.12+ - xprtrdma: Fix latency regression on NUMA NFS/RDMA clients # v4.15+ Features: - New sunrpc tracepoint for RPC pings - Finer grained NFSv4 attribute checking - Don't unnecessarily return NFS v4 delegations Other bugfixes and cleanups: - Several other small NFSoRDMA cleanups - Improvements to the sunrpc RTT measurements - A few sunrpc tracepoint cleanups - Various fixes for NFS v4 lock notifications - Various sunrpc and NFS v4 XDR encoding cleanups - Switch to the ida_simple API - Fix NFSv4.1 exclusive create - Forget acl cache after setattr operation - Don't advance the nfs_entry readdir cookie if xdr decoding fails" * tag 'nfs-for-4.17-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (47 commits) NFS: advance nfs_entry cookie only after decoding completes successfully NFSv3/acl: forget acl cache after setattr NFSv4.1: Fix exclusive create NFSv4: Declare the size up to date after it was set. nfs: Use ida_simple API NFSv4: Fix the nfs_inode_set_delegation() arguments NFSv4: Clean up CB_GETATTR encoding NFSv4: Don't ask for attributes when ACCESS is protected by a delegation NFSv4: Add a helper to encode/decode struct timespec NFSv4: Clean up encode_attrs NFSv4; Clean up XDR encoding of type bitmap4 NFSv4: Allow GFP_NOIO sleeps in decode_attr_owner/decode_attr_group SUNRPC: Add a helper for encoding opaque data inline SUNRPC: Add helpers for decoding opaque and string types NFSv4: Ignore change attribute invalidations if we hold a delegation NFS: More fine grained attribute tracking NFS: Don't force unnecessary cache invalidation in nfs_update_inode() NFS: Don't redirty the attribute cache in nfs_wcc_update_inode() NFS: Don't force a revalidation of all attributes if change is missing NFS: Convert NFS_INO_INVALID flags to unsigned long ...
2018-04-12Merge branch 'work.thaw' of ↵Linus Torvalds1-0/+6
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs thaw updates from Al Viro: "An ancient series that has fallen through the cracks in the previous cycle" * 'work.thaw' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: buffer.c: call thaw_super during emergency thaw vfs: factor sb iteration out of do_emergency_remount
2018-04-12Merge branch 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+1
Pull AFS updates from Al Viro: "The AFS series posted by dhowells depended upon lookup_one_len() rework; now that prereq is in the mainline, that series had been rebased on top of it and got some exposure and testing..." * 'afs-dh' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: afs: Do better accretion of small writes on newly created content afs: Add stats for data transfer operations afs: Trace protocol errors afs: Locally edit directory data for mkdir/create/unlink/... afs: Adjust the directory XDR structures afs: Split the directory content defs into a header afs: Fix directory handling afs: Split the dynroot stuff out and give it its own ops tables afs: Keep track of invalid-before version for dentry coherency afs: Rearrange status mapping afs: Make it possible to get the data version in readpage afs: Init inode before accessing cache afs: Introduce a statistics proc file afs: Dump bad status record afs: Implement @cell substitution handling afs: Implement @sys substitution handling afs: Prospectively look up extra files when doing a single lookup afs: Don't over-increment the cell usage count when pinning it afs: Fix checker warnings vfs: Remove the const from dir_context::actor
2018-04-12Merge tag 'for_linus-4.16' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb Pull kdb updates from Jason Wessel: - fix 2032 time access issues and new compiler warnings - minor regression test cleanup - formatting fixes for end user use of kdb * tag 'for_linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/kgdb: kdb: use memmove instead of overlapping memcpy kdb: use ktime_get_mono_fast_ns() instead of ktime_get_ts() kdb: bl: don't use tab character in output kdb: drop newline in unknown command output kdb: make "mdr" command repeat kdb: use __ktime_get_real_seconds instead of __current_kernel_time misc: kgdbts: Display progress of asynchronous tests
2018-04-12lockref: Add lockref_put_not_zeroAndreas Gruenbacher1-0/+1
Put a lockref unless the lockref is dead or its count would become zero. This is the same as lockref_put_or_lock except that the lock is never left held. Signed-off-by: Andreas Gruenbacher <[email protected]> Signed-off-by: Bob Peterson <[email protected]>
2018-04-11Merge tag 'iommu-updates-v4.17' of ↵Linus Torvalds3-14/+19
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU updates from Joerg Roedel: - OF_IOMMU support for the Rockchip iommu driver so that it can use generic DT bindings - rework of locking in the AMD IOMMU interrupt remapping code to make it work better in RT kernels - support for improved iotlb flushing in the AMD IOMMU driver - support for 52-bit physical and virtual addressing in the ARM-SMMU - various other small fixes and cleanups * tag 'iommu-updates-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (53 commits) iommu/io-pgtable-arm: Avoid warning with 32-bit phys_addr_t iommu/rockchip: Support sharing IOMMU between masters iommu/rockchip: Add runtime PM support iommu/rockchip: Fix error handling in init iommu/rockchip: Use OF_IOMMU to attach devices automatically iommu/rockchip: Use IOMMU device for dma mapping operations dt-bindings: iommu/rockchip: Add clock property iommu/rockchip: Control clocks needed to access the IOMMU iommu/rockchip: Fix TLB flush of secondary IOMMUs iommu/rockchip: Use iopoll helpers to wait for hardware iommu/rockchip: Fix error handling in attach iommu/rockchip: Request irqs in rk_iommu_probe() iommu/rockchip: Fix error handling in probe iommu/rockchip: Prohibit unbind and remove iommu/amd: Return proper error code in irq_remapping_alloc() iommu/amd: Make amd_iommu_devtable_lock a spin_lock iommu/amd: Drop the lock while allocating new irq remap table iommu/amd: Factor out setting the remap table for a devid iommu/amd: Use `table' instead `irt' as variable name in amd_iommu_update_ga() iommu/amd: Remove the special case from alloc_irq_table() ...
2018-04-11Merge tag 'pm-4.17-rc1-2' of ↵Linus Torvalds5-10/+33
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull more power management updates from Rafael Wysocki: "These include one big-ticket item which is the rework of the idle loop in order to prevent CPUs from spending too much time in shallow idle states. It reduces idle power on some systems by 10% or more and may improve performance of workloads in which the idle loop overhead matters. This has been in the works for several weeks and it has been tested and reviewed quite thoroughly. Also included are changes that finalize the cpufreq cleanup moving frequency table validation from drivers to the core, a few fixes and cleanups of cpufreq drivers, a cpuidle documentation update and a PM QoS core update to mark the expected switch fall-throughs in it. Specifics: - Rework the idle loop in order to prevent CPUs from spending too much time in shallow idle states by making it stop the scheduler tick before putting the CPU into an idle state only if the idle duration predicted by the idle governor is long enough. That required the code to be reordered to invoke the idle governor before stopping the tick, among other things (Rafael Wysocki, Frederic Weisbecker, Arnd Bergmann). - Add the missing description of the residency sysfs attribute to the cpuidle documentation (Prashanth Prakash). - Finalize the cpufreq cleanup moving frequency table validation from drivers to the core (Viresh Kumar). - Fix a clock leak regression in the armada-37xx cpufreq driver (Gregory Clement). - Fix the initialization of the CPU performance data structures for shared policies in the CPPC cpufreq driver (Shunyong Yang). - Clean up the ti-cpufreq, intel_pstate and CPPC cpufreq drivers a bit (Viresh Kumar, Rafael Wysocki). - Mark the expected switch fall-throughs in the PM QoS core (Gustavo Silva)" * tag 'pm-4.17-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (23 commits) tick-sched: avoid a maybe-uninitialized warning cpufreq: Drop cpufreq_table_validate_and_show() cpufreq: SCMI: Don't validate the frequency table twice cpufreq: CPPC: Initialize shared perf capabilities of CPUs cpufreq: armada-37xx: Fix clock leak cpufreq: CPPC: Don't set transition_latency cpufreq: ti-cpufreq: Use builtin_platform_driver() cpufreq: intel_pstate: Do not include debugfs.h PM / QoS: mark expected switch fall-throughs cpuidle: Add definition of residency to sysfs documentation time: hrtimer: Use timerqueue_iterate_next() to get to the next timer nohz: Avoid duplication of code related to got_idle_tick nohz: Gather tick_sched booleans under a common flag field cpuidle: menu: Avoid selecting shallow states with stopped tick cpuidle: menu: Refine idle state selection for running tick sched: idle: Select idle state before stopping the tick time: hrtimer: Introduce hrtimer_next_event_without() time: tick-sched: Split tick_nohz_stop_sched_tick() cpuidle: Return nohz hint from cpuidle_select() jiffies: Introduce USER_TICK_USEC and redefine TICK_USEC ...
2018-04-11Merge branch 'akpm' (patches from Andrew)Linus Torvalds23-164/+275
Merge more updates from Andrew Morton: - almost all of the rest of MM - kasan updates - lots of procfs work - misc things - lib/ updates - checkpatch - rapidio - ipc/shm updates - the start of willy's XArray conversion * emailed patches from Andrew Morton <[email protected]>: (140 commits) page cache: use xa_lock xarray: add the xa_lock to the radix_tree_root fscache: use appropriate radix tree accessors export __set_page_dirty unicore32: turn flush_dcache_mmap_lock into a no-op arm64: turn flush_dcache_mmap_lock into a no-op mac80211_hwsim: use DEFINE_IDA radix tree: use GFP_ZONEMASK bits of gfp_t for flags linux/const.h: refactor _BITUL and _BITULL a bit linux/const.h: move UL() macro to include/linux/const.h linux/const.h: prefix include guard of uapi/linux/const.h with _UAPI xen, mm: allow deferred page initialization for xen pv domains elf: enforce MAP_FIXED on overlaying elf segments fs, elf: drop MAP_FIXED usage from elf_map mm: introduce MAP_FIXED_NOREPLACE MAINTAINERS: update bouncing [email protected] addresses fs/dcache.c: add cond_resched() in shrink_dentry_list() include/linux/kfifo.h: fix comment ipc/shm.c: shm_split(): remove unneeded test for NULL shm_file_data.vm_ops kernel/sysctl.c: add kdoc comments to do_proc_do{u}intvec_minmax_conv_param ...
2018-04-11page cache: use xa_lockMatthew Wilcox4-14/+14
Remove the address_space ->tree_lock and use the xa_lock newly added to the radix_tree_root. Rename the address_space ->page_tree to ->i_pages, since we don't really care that it's a tree. [[email protected]: fix nds32, fs/dax.c] Link: http://lkml.kernel.org/r/[email protected]: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Jeff Layton <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Ryusuke Konishi <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11xarray: add the xa_lock to the radix_tree_rootMatthew Wilcox3-11/+39
This results in no change in structure size on 64-bit machines as it fits in the padding between the gfp_t and the void *. 32-bit machines will grow the structure from 8 to 12 bytes. Almost all radix trees are protected with (at least) a spinlock, so as they are converted from radix trees to xarrays, the data structures will shrink again. Initialising the spinlock requires a name for the benefit of lockdep, so RADIX_TREE_INIT() now needs to know the name of the radix tree it's initialising, and so do IDR_INIT() and IDA_INIT(). Also add the xa_lock() and xa_unlock() family of wrappers to make it easier to use the lock. If we could rely on -fplan9-extensions in the compiler, we could avoid all of this syntactic sugar, but that wasn't added until gcc 4.6. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Ryusuke Konishi <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11export __set_page_dirtyMatthew Wilcox1-0/+1
XFS currently contains a copy-and-paste of __set_page_dirty(). Export it from buffer.c instead. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Jeff Layton <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Cc: Ryusuke Konishi <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11radix tree: use GFP_ZONEMASK bits of gfp_t for flagsMatthew Wilcox2-4/+6
Patch series "XArray", v9. (First part thereof). This patchset is, I believe, appropriate for merging for 4.17. It contains the XArray implementation, to eventually replace the radix tree, and converts the page cache to use it. This conversion keeps the radix tree and XArray data structures in sync at all times. That allows us to convert the page cache one function at a time and should allow for easier bisection. Other than renaming some elements of the structures, the data structures are fundamentally unchanged; a radix tree walk and an XArray walk will touch the same number of cachelines. I have changes planned to the XArray data structure, but those will happen in future patches. Improvements the XArray has over the radix tree: - The radix tree provides operations like other trees do; 'insert' and 'delete'. But what most users really want is an automatically resizing array, and so it makes more sense to give users an API that is like an array -- 'load' and 'store'. We still have an 'insert' operation for users that really want that semantic. - The XArray considers locking as part of its API. This simplifies a lot of users who formerly had to manage their own locking just for the radix tree. It also improves code generation as we can now tell RCU that we're holding a lock and it doesn't need to generate as much fencing code. The other advantage is that tree nodes can be moved (not yet implemented). - GFP flags are now parameters to calls which may need to allocate memory. The radix tree forced users to decide what the allocation flags would be at creation time. It's much clearer to specify them at allocation time. - Memory is not preloaded; we don't tie up dozens of pages on the off chance that the slab allocator fails. Instead, we drop the lock, allocate a new node and retry the operation. We have to convert all the radix tree, IDA and IDR preload users before we can realise this benefit, but I have not yet found a user which cannot be converted. - The XArray provides a cmpxchg operation. The radix tree forces users to roll their own (and at least four have). - Iterators take a 'max' parameter. That simplifies many users and will reduce the amount of iteration done. - Iteration can proceed backwards. We only have one user for this, but since it's called as part of the pagefault readahead algorithm, that seemed worth mentioning. - RCU-protected pointers are not exposed as part of the API. There are some fun bugs where the page cache forgets to use rcu_dereference() in the current codebase. - Value entries gain an extra bit compared to radix tree exceptional entries. That gives us the extra bit we need to put huge page swap entries in the page cache. - Some iterators now take a 'filter' argument instead of having separate iterators for tagged/untagged iterations. The page cache is improved by this: - Shorter, easier to read code - More efficient iterations - Reduction in size of struct address_space - Fewer walks from the top of the data structure; the XArray API encourages staying at the leaf node and conducting operations there. This patch (of 8): None of these bits may be used for slab allocations, so we can use them as radix tree flags as long as we mask them off before passing them to the slab allocator. Move the IDR flag from the high bits to the GFP_ZONEMASK bits. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Jeff Layton <[email protected]> Cc: Darrick J. Wong <[email protected]> Cc: Dave Chinner <[email protected]> Cc: Ryusuke Konishi <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11linux/const.h: move UL() macro to include/linux/const.hMasahiro Yamada1-0/+9
ARM, ARM64 and UniCore32 duplicate the definition of UL(): #define UL(x) _AC(x, UL) This is not actually arch-specific, so it will be useful to move it to a common header. Currently, we only have the uapi variant for linux/const.h, so I am creating include/linux/const.h. I also added _UL(), _ULL() and ULL() because _AC() is mostly used in the form either _AC(..., UL) or _AC(..., ULL). I expect they will be replaced in follow-up cleanups. The underscore-prefixed ones should be used for exported headers. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Masahiro Yamada <[email protected]> Acked-by: Guan Xuetao <[email protected]> Acked-by: Catalin Marinas <[email protected]> Acked-by: Russell King <[email protected]> Cc: David Howells <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11include/linux/kfifo.h: fix commentValentin Vidic1-4/+4
Clean up unusual formatting in the note about locking. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Valentin Vidic <[email protected]> Cc: Stefani Seibold <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: Christophe JAILLET <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Sean Young <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11exec: pin stack limit during execKees Cook1-0/+2
Since the stack rlimit is used in multiple places during exec and it can be changed via other threads (via setrlimit()) or processes (via prlimit()), the assumption that the value doesn't change cannot be made. This leads to races with mm layout selection and argument size calculations. This changes the exec path to use the rlimit stored in bprm instead of in current. Before starting the thread, the bprm stack rlimit is stored back to current. Link: http://lkml.kernel.org/r/[email protected] Fixes: 64701dee4178e ("exec: Use sane stack rlimit under secureexec") Signed-off-by: Kees Cook <[email protected]> Reported-by: Ben Hutchings <[email protected]> Reported-by: Andy Lutomirski <[email protected]> Reported-by: Brad Spengler <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Greg KH <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: "Jason A. Donenfeld" <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Willy Tarreau <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11exec: introduce finalize_exec() before start_thread()Kees Cook1-0/+1
Provide a final callback into fs/exec.c before start_thread() takes over, to handle any last-minute changes, like the coming restoration of the stack limit. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Brad Spengler <[email protected]> Cc: Greg KH <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: "Jason A. Donenfeld" <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Willy Tarreau <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11exec: pass stack rlimit into mm layout functionsKees Cook1-2/+4
Patch series "exec: Pin stack limit during exec". Attempts to solve problems with the stack limit changing during exec continue to be frustrated[1][2]. In addition to the specific issues around the Stack Clash family of flaws, Andy Lutomirski pointed out[3] other places during exec where the stack limit is used and is assumed to be unchanging. Given the many places it gets used and the fact that it can be manipulated/raced via setrlimit() and prlimit(), I think the only way to handle this is to move away from the "current" view of the stack limit and instead attach it to the bprm, and plumb this down into the functions that need to know the stack limits. This series implements the approach. [1] 04e35f4495dd ("exec: avoid RLIMIT_STACK races with prlimit()") [2] 779f4e1c6c7c ("Revert "exec: avoid RLIMIT_STACK races with prlimit()"") [3] to [email protected], "Subject: existing rlimit races?" This patch (of 3): Since it is possible that the stack rlimit can change externally during exec (either via another thread calling setrlimit() or another process calling prlimit()), provide a way to pass the rlimit down into the per-architecture mm layout functions so that the rlimit can stay in the bprm structure instead of sitting in the signal structure until exec is finalized. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Willy Tarreau <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: "Jason A. Donenfeld" <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Greg KH <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Brad Spengler <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11seq_file: allocate seq_file from kmem_cacheAlexey Dobriyan1-0/+1
For fine-grained debugging and usercopy protection. Link: http://lkml.kernel.org/r/20180310085027.GA17121@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Al Viro <[email protected]> Cc: Glauber Costa <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11task_struct: only use anon struct under randstruct pluginKees Cook2-12/+3
The original intent for always adding the anonymous struct in task_struct was to make sure we had compiler coverage. However, this caused pathological padding of 40 bytes at the start of task_struct. Instead, move the anonymous struct to being only used when struct layout randomization is enabled. Link: http://lkml.kernel.org/r/20180327213609.GA2964@beast Fixes: 29e48ce87f1e ("task_struct: Allow randomized") Signed-off-by: Kees Cook <[email protected]> Reported-by: Peter Zijlstra <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11uts: create "struct uts_namespace" from kmem_cacheAlexey Dobriyan1-0/+6
So "struct uts_namespace" can enjoy fine-grained SLAB debugging and usercopy protection. I'd prefer shorter name "utsns" but there is "user_namespace" already. Link: http://lkml.kernel.org/r/20180228215158.GA23146@avx2 Signed-off-by: Alexey Dobriyan <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: "Eric W. Biederman" <[email protected]> Cc: Serge Hallyn <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11taint: add taint for randstructKees Cook1-1/+2
Since the randstruct plugin can intentionally produce extremely unusual kernel structure layouts (even performance pathological ones), some maintainers want to be able to trivially determine if an Oops is coming from a randstruct-built kernel, so as to keep their sanity when debugging. This adds the new flag and initializes taint_mask immediately when built with randstruct. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11taint: convert to indexed initializationKees Cook1-0/+1
This converts to using indexed initializers instead of comments, adds a comment on why the taint flags can't be an enum, and make sure that no one forgets to update the taint_flags when adding new bits. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Al Viro <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11proc: add seq_put_decimal_ull_width to speed up /proc/pid/smapsAndrei Vagin2-1/+4
seq_put_decimal_ull_w(m, str, val, width) prints a decimal number with a specified minimal field width. It is equivalent of seq_printf(m, "%s%*d", str, width, val), but it works much faster. == test_smaps.py num = 0 with open("/proc/1/smaps") as f: for x in xrange(10000): data = f.read() f.seek(0, 0) == == Before patch == $ time python test_smaps.py real 0m4.593s user 0m0.398s sys 0m4.158s == After patch == $ time python test_smaps.py real 0m3.828s user 0m0.413s sys 0m3.408s $ perf -g record python test_smaps.py == Before patch == - 79.01% 3.36% python [kernel.kallsyms] [k] show_smap.isra.33 - 75.65% show_smap.isra.33 + 48.85% seq_printf + 15.75% __walk_page_range + 9.70% show_map_vma.isra.23 0.61% seq_puts == After patch == - 75.51% 4.62% python [kernel.kallsyms] [k] show_smap.isra.33 - 70.88% show_smap.isra.33 + 24.82% seq_put_decimal_ull_w + 19.78% __walk_page_range + 12.74% seq_printf + 11.08% show_map_vma.isra.23 + 1.68% seq_puts [[email protected]: fix drivers/of/unittest.c build] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andrei Vagin <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11procfs: add seq_put_hex_ll to speed up /proc/pid/mapsAndrei Vagin1-0/+3
seq_put_hex_ll() prints a number in hexadecimal notation and works faster than seq_printf(). == test.py num = 0 with open("/proc/1/maps") as f: while num < 10000 : data = f.read() f.seek(0, 0) num = num + 1 == == Before patch == $ time python test.py real 0m1.561s user 0m0.257s sys 0m1.302s == After patch == $ time python test.py real 0m0.986s user 0m0.279s sys 0m0.707s $ perf -g record python test.py: == Before patch == - 67.42% 2.82% python [kernel.kallsyms] [k] show_map_vma.isra.22 - 64.60% show_map_vma.isra.22 - 44.98% seq_printf - seq_vprintf - vsnprintf + 14.85% number + 12.22% format_decode 5.56% memcpy_erms + 15.06% seq_path + 4.42% seq_pad + 2.45% __GI___libc_read == After patch == - 47.35% 3.38% python [kernel.kallsyms] [k] show_map_vma.isra.23 - 43.97% show_map_vma.isra.23 + 20.84% seq_path - 15.73% show_vma_header_prefix 10.55% seq_put_hex_ll + 2.65% seq_put_decimal_ull 0.95% seq_putc + 6.96% seq_pad + 2.94% __GI___libc_read [[email protected]: use unsigned int instead of int where it is suitable] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: v2] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andrei Vagin <[email protected]> Cc: Alexey Dobriyan <[email protected]> Cc: KAMEZAWA Hiroyuki <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLEJoonsoo Kim2-3/+1
Patch series "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE", v2. 0. History This patchset is the follow-up of the discussion about the "Introduce ZONE_CMA (v7)" [1]. Please reference it if more information is needed. 1. What does this patch do? This patch changes the management way for the memory of the CMA area in the MM subsystem. Currently the memory of the CMA area is managed by the zone where their pfn is belong to. However, this approach has some problems since MM subsystem doesn't have enough logic to handle the situation that different characteristic memories are in a single zone. To solve this issue, this patch try to manage all the memory of the CMA area by using the MOVABLE zone. In MM subsystem's point of view, characteristic of the memory on the MOVABLE zone and the memory of the CMA area are the same. So, managing the memory of the CMA area by using the MOVABLE zone will not have any problem. 2. Motivation There are some problems with current approach. See following. Although these problem would not be inherent and it could be fixed without this conception change, it requires many hooks addition in various code path and it would be intrusive to core MM and would be really error-prone. Therefore, I try to solve them with this new approach. Anyway, following is the problems of the current implementation. o CMA memory utilization First, following is the freepage calculation logic in MM. - For movable allocation: freepage = total freepage - For unmovable allocation: freepage = total freepage - CMA freepage Freepages on the CMA area is used after the normal freepages in the zone where the memory of the CMA area is belong to are exhausted. At that moment that the number of the normal freepages is zero, so - For movable allocation: freepage = total freepage = CMA freepage - For unmovable allocation: freepage = 0 If unmovable allocation comes at this moment, allocation request would fail to pass the watermark check and reclaim is started. After reclaim, there would exist the normal freepages so freepages on the CMA areas would not be used. FYI, there is another attempt [2] trying to solve this problem in lkml. And, as far as I know, Qualcomm also has out-of-tree solution for this problem. Useless reclaim: There is no logic to distinguish CMA pages in the reclaim path. Hence, CMA page is reclaimed even if the system just needs the page that can be usable for the kernel allocation. Atomic allocation failure: This is also related to the fallback allocation policy for the memory of the CMA area. Consider the situation that the number of the normal freepages is *zero* since the bunch of the movable allocation requests come. Kswapd would not be woken up due to following freepage calculation logic. - For movable allocation: freepage = total freepage = CMA freepage If atomic unmovable allocation request comes at this moment, it would fails due to following logic. - For unmovable allocation: freepage = total freepage - CMA freepage = 0 It was reported by Aneesh [3]. Useless compaction: Usual high-order allocation request is unmovable allocation request and it cannot be served from the memory of the CMA area. In compaction, migration scanner try to migrate the page in the CMA area and make high-order page there. As mentioned above, it cannot be usable for the unmovable allocation request so it's just waste. 3. Current approach and new approach Current approach is that the memory of the CMA area is managed by the zone where their pfn is belong to. However, these memory should be distinguishable since they have a strong limitation. So, they are marked as MIGRATE_CMA in pageblock flag and handled specially. However, as mentioned in section 2, the MM subsystem doesn't have enough logic to deal with this special pageblock so many problems raised. New approach is that the memory of the CMA area is managed by the MOVABLE zone. MM already have enough logic to deal with special zone like as HIGHMEM and MOVABLE zone. So, managing the memory of the CMA area by the MOVABLE zone just naturally work well because constraints for the memory of the CMA area that the memory should always be migratable is the same with the constraint for the MOVABLE zone. There is one side-effect for the usability of the memory of the CMA area. The use of MOVABLE zone is only allowed for a request with GFP_HIGHMEM && GFP_MOVABLE so now the memory of the CMA area is also only allowed for this gfp flag. Before this patchset, a request with GFP_MOVABLE can use them. IMO, It would not be a big issue since most of GFP_MOVABLE request also has GFP_HIGHMEM flag. For example, file cache page and anonymous page. However, file cache page for blockdev file is an exception. Request for it has no GFP_HIGHMEM flag. There is pros and cons on this exception. In my experience, blockdev file cache pages are one of the top reason that causes cma_alloc() to fail temporarily. So, we can get more guarantee of cma_alloc() success by discarding this case. Note that there is no change in admin POV since this patchset is just for internal implementation change in MM subsystem. Just one minor difference for admin is that the memory stat for CMA area will be printed in the MOVABLE zone. That's all. 4. Result Following is the experimental result related to utilization problem. 8 CPUs, 1024 MB, VIRTUAL MACHINE make -j16 <Before> CMA area: 0 MB 512 MB Elapsed-time: 92.4 186.5 pswpin: 82 18647 pswpout: 160 69839 <After> CMA : 0 MB 512 MB Elapsed-time: 93.1 93.4 pswpin: 84 46 pswpout: 183 92 akpm: "kernel test robot" reported a 26% improvement in vm-scalability.throughput: http://lkml.kernel.org/r/20180330012721.GA3845@yexl-desktop [1]: lkml.kernel.org/r/[email protected] [2]: https://lkml.org/lkml/2014/10/15/623 [3]: http://www.spinics.net/lists/linux-mm/msg100562.html Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joonsoo Kim <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Tested-by: Tony Lindgren <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Russell King <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE requestJoonsoo Kim1-1/+1
Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that important to reserve. When ZONE_MOVABLE is used, this problem would theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE allocation request which is mainly used for page cache and anon page allocation. So, fix it by setting 0 to sysctl_lowmem_reserve_ratio[ZONE_HIGHMEM]. And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size makes code complex. For example, if there is highmem system, following reserve ratio is activated for *NORMAL ZONE* which would be easyily misleading people. #ifdef CONFIG_HIGHMEM 32 #endif This patch also fixes this situation by defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES and place "#ifdef" to right place. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joonsoo Kim <[email protected]> Reviewed-by: Aneesh Kumar K.V <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Tested-by: Tony Lindgren <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: "Aneesh Kumar K . V" <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Laura Abbott <[email protected]> Cc: Marek Szyprowski <[email protected]> Cc: Michal Nazarewicz <[email protected]> Cc: Russell King <[email protected]> Cc: Will Deacon <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm: unclutter THP migrationMichal Hocko1-2/+2
THP migration is hacked into the generic migration with rather surprising semantic. The migration allocation callback is supposed to check whether the THP can be migrated at once and if that is not the case then it allocates a simple page to migrate. unmap_and_move then fixes that up by spliting the THP into small pages while moving the head page to the newly allocated order-0 page. Remaning pages are moved to the LRU list by split_huge_page. The same happens if the THP allocation fails. This is really ugly and error prone [1]. I also believe that split_huge_page to the LRU lists is inherently wrong because all tail pages are not migrated. Some callers will just work around that by retrying (e.g. memory hotplug). There are other pfn walkers which are simply broken though. e.g. madvise_inject_error will migrate head and then advances next pfn by the huge page size. do_move_page_to_node_array, queue_pages_range (migrate_pages, mbind), will simply split the THP before migration if the THP migration is not supported then falls back to single page migration but it doesn't handle tail pages if the THP migration path is not able to allocate a fresh THP so we end up with ENOMEM and fail the whole migration which is a questionable behavior. Page compaction doesn't try to migrate large pages so it should be immune. This patch tries to unclutter the situation by moving the special THP handling up to the migrate_pages layer where it actually belongs. We simply split the THP page into the existing list if unmap_and_move fails with ENOMEM and retry. So we will _always_ migrate all THP subpages and specific migrate_pages users do not have to deal with this case in a special way. [1] http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Zi Yan <[email protected]> Cc: Andrea Reale <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm, migrate: remove reason argument from new_page_tMichal Hocko2-4/+2
No allocation callback is using this argument anymore. new_page_node used to use this parameter to convey node_id resp. migration error up to move_pages code (do_move_page_to_node_array). The error status never made it into the final status field and we have a better way to communicate node id to the status field now. All other allocation callbacks simply ignored the argument so we can drop it finally. [[email protected]: fix migration callback] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: fix alloc_misplaced_dst_page()] [[email protected]: fix build] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reviewed-by: Zi Yan <[email protected]> Cc: Andrea Reale <[email protected]> Cc: Anshuman Khandual <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Naoya Horiguchi <[email protected]> Cc: Vlastimil Babka <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm: memcg: make sure memory.events is uptodate when waking pollersJohannes Weiner1-17/+18
Commit a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting") added per-cpu drift to all memory cgroup stats and events shown in memory.stat and memory.events. For memory.stat this is acceptable. But memory.events issues file notifications, and somebody polling the file for changes will be confused when the counters in it are unchanged after a wakeup. Luckily, the events in memory.events - MEMCG_LOW, MEMCG_HIGH, MEMCG_MAX, MEMCG_OOM - are sufficiently rare and high-level that we don't need per-cpu buffering for them: MEMCG_HIGH and MEMCG_MAX would be the most frequent, but they're counting invocations of reclaim, which is a complex operation that touches many shared cachelines. This splits memory.events from the generic VM events and tracks them in their own, unbuffered atomic counters. That's also cleaner, as it eliminates the ugly enum nesting of VM and cgroup events. [[email protected]: "array subscript is above array bounds"] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Fixes: a983b5ebee57 ("mm: memcontrol: fix excessive complexity in memory.stat reporting") Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Tejun Heo <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Roman Gushchin <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm/hmm: fix header file if/else/endif maze, againArnd Bergmann1-9/+12
The last fix was still wrong, as we need the inline dummy functions also for the case that CONFIG_HMM is enabled but CONFIG_HMM_MIRROR is not: kernel/fork.o: In function `__mmdrop': fork.c:(.text+0x14f6): undefined reference to `hmm_mm_destroy' This adds back the second copy of the dummy functions, hopefully this time in the right place. Link: http://lkml.kernel.org/r/[email protected] Fixes: 8900d06a277a ("mm/hmm: fix header file if/else/endif maze") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Jérôme Glisse <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm/hmm: use device driver encoding for HMM pfnJérôme Glisse1-36/+94
Users of hmm_vma_fault() and hmm_vma_get_pfns() provide a flags array and pfn shift value allowing them to define their own encoding for HMM pfn that are fill inside the pfns array of the hmm_range struct. With this device driver can get pfn that match their own private encoding out of HMM without having to do any conversion. [[email protected]: don't ignore specific pte fault flag in hmm_vma_fault()] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: clarify fault logic for device private memory] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jérôme Glisse <[email protected]> Signed-off-by: Ralph Campbell <[email protected]> Cc: Evgeny Baskakov <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Mark Hairgrove <[email protected]> Cc: John Hubbard <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-04-11mm/hmm: change hmm_vma_fault() to allow write fault on page basisJérôme Glisse1-1/+1
This changes hmm_vma_fault() to not take a global write fault flag for a range but instead rely on caller to populate HMM pfns array with proper fault flag ie HMM_PFN_VALID if driver want read fault for that address or HMM_PFN_VALID and HMM_PFN_WRITE for write. Moreover by setting HMM_PFN_DEVICE_PRIVATE the device driver can ask for device private memory to be migrated back to system memory through page fault. This is more flexible API and it better reflects how device handles and reports fault. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jérôme Glisse <[email protected]> Cc: Evgeny Baskakov <[email protected]> Cc: Ralph Campbell <[email protected]> Cc: Mark Hairgrove <[email protected]> Cc: John Hubbard <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>