aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-04-11x86/Hyper-V: Report crash data in die() when panic_on_oops is setTianyu Lan3-4/+9
When oops happens with panic_on_oops unset, the oops thread is killed by die() and system continues to run. In such case, guest should not report crash register data to host since system still runs. Check panic_on_oops and return directly in hyperv_report_panic() when the function is called in the die() and panic_on_oops is unset. Fix it. Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful") Signed-off-by: Tianyu Lan <[email protected]> Reviewed-by: Michael Kelley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]>
2020-04-11x86/Hyper-V: Report crash register data when sysctl_record_panic_msg is not setTianyu Lan1-9/+14
When sysctl_record_panic_msg is not set, the panic will not be reported to Hyper-V via hyperv_report_panic_msg(). So the crash should be reported via hyperv_report_panic(). Fixes: 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic") Reviewed-by: Michael Kelley <[email protected]> Signed-off-by: Tianyu Lan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]>
2020-04-11x86/Hyper-V: Report crash register data or kmsg before running crash kernelTianyu Lan1-0/+10
We want to notify Hyper-V when a Linux guest VM crash occurs, so there is a record of the crash even when kdump is enabled. But crash_kexec_post_notifiers defaults to "false", so the kdump kernel runs before the notifiers and Hyper-V never gets notified. Fix this by always setting crash_kexec_post_notifiers to be true for Hyper-V VMs. Fixes: 81b18bce48af ("Drivers: HV: Send one page worth of kmsg dump over Hyper-V during panic") Reviewed-by: Michael Kelley <[email protected]> Signed-off-by: Tianyu Lan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]>
2020-04-11x86/Hyper-V: Trigger crash enlightenment only once during system crash.Tianyu Lan1-2/+14
When a guest VM panics, Hyper-V should be notified only once via the crash synthetic MSRs. Current Linux code might write these crash MSRs twice during a system panic: 1) hyperv_panic/die_event() calling hyperv_report_panic() 2) hv_kmsg_dump() calling hyperv_report_panic_msg() Fix this by not calling hyperv_report_panic() if a kmsg dump has been successfully registered. The notification will happen later via hyperv_report_panic_msg(). Fixes: 7ed4325a44ea ("Drivers: hv: vmbus: Make panic reporting to be more useful") Reviewed-by: Michael Kelley <[email protected]> Signed-off-by: Tianyu Lan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wei Liu <[email protected]>
2020-04-11pNFS: Fix RCU lock leakageTrond Myklebust1-0/+1
Another brown paper bag moment. pnfs_alloc_ds_commits_list() is leaking the RCU lock. Fixes: a9901899b649 ("pNFS: Add infrastructure for cleaning up per-layout commit structures") Signed-off-by: Trond Myklebust <[email protected]>
2020-04-11KVM: VMX: Extend VMXs #AC interceptor to handle split lock #AC in guestXiaoyao Li1-3/+34
Two types of #AC can be generated in Intel CPUs: 1. legacy alignment check #AC 2. split lock #AC Reflect #AC back into the guest if the guest has legacy alignment checks enabled or if split lock detection is disabled. If the #AC is not a legacy one and split lock detection is enabled, then invoke handle_guest_split_lock() which will either warn and disable split lock detection for this task or force SIGBUS on it. [ tglx: Switch it to handle_guest_split_lock() and rename the misnamed helper function. ] Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Xiaoyao Li <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-04-11KVM: x86: Emulate split-lock access as a write in emulatorXiaoyao Li1-1/+11
Emulate split-lock accesses as writes if split lock detection is on to avoid #AC during emulation, which will result in a panic(). This should never occur for a well-behaved guest, but a malicious guest can manipulate the TLB to trigger emulation of a locked instruction[1]. More discussion can be found at [2][3]. [1] https://lkml.kernel.org/r/[email protected] [2] https://lkml.kernel.org/r/[email protected] [3] https://lkml.kernel.org/r/[email protected] Suggested-by: Sean Christopherson <[email protected]> Signed-off-by: Xiaoyao Li <[email protected]> Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-04-11x86/split_lock: Provide handle_guest_split_lock()Thomas Gleixner2-5/+34
Without at least minimal handling for split lock detection induced #AC, VMX will just run into the same problem as the VMWare hypervisor, which was reported by Kenneth. It will inject the #AC blindly into the guest whether the guest is prepared or not. Provide a function for guest mode which acts depending on the host SLD mode. If mode == sld_warn, treat it like user space, i.e. emit a warning, disable SLD and mark the task accordingly. Otherwise force SIGBUS. [ bp: Add a !CPU_SUP_INTEL stub for handle_guest_split_lock(). ] Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Acked-by: Paolo Bonzini <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lkml.kernel.org/r/[email protected]
2020-04-11ALSA: hda/realtek - Enable the headset mic on Asus FX505DTAdam Barber1-0/+1
On Asus FX505DT with Realtek ALC233, the headset mic is connected to pin 0x19, with default 0x411111f0. Enable headset mic by reconfiguring the pin to an external mic associated with the headphone on 0x21. Mic jack detection was also found to be working. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=207131 Signed-off-by: Adam Barber <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2020-04-11kbuild: fix comment about missing include guard detectionMasahiro Yamada1-1/+1
The keyword here is 'twice' to explain the trick. Signed-off-by: Masahiro Yamada <[email protected]>
2020-04-10docs: networking: add full DIM APIRandy Dunlap1-0/+6
Add the full net DIM API to the net_dim.rst file. Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-04-10docs: networking: convert DIM to RSTJakub Kicinski3-47/+45
Convert the Dynamic Interrupt Moderation doc to RST and use the RST features like syntax highlight, function and structure documentation, enumerations, table of contents. Reviewed-by: Randy Dunlap <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2020-04-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds130-366/+706
Merge yet more updates from Andrew Morton: - Almost all of the rest of MM (memcg, slab-generic, slab, pagealloc, gup, hugetlb, pagemap, memremap) - Various other things (hfs, ocfs2, kmod, misc, seqfile) * akpm: (34 commits) ipc/util.c: sysvipc_find_ipc() should increase position index kernel/gcov/fs.c: gcov_seq_next() should increase position index fs/seq_file.c: seq_read(): add info message about buggy .next functions drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warnings change email address for Pali Rohár selftests: kmod: test disabling module autoloading selftests: kmod: fix handling test numbers above 9 docs: admin-guide: document the kernel.modprobe sysctl fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once() kmod: make request_module() return an error when autoloading is disabled mm/memremap: set caching mode for PCI P2PDMA memory to WC mm/memory_hotplug: add pgprot_t to mhp_params powerpc/mm: thread pgprot_t through create_section_mapping() x86/mm: introduce __set_memory_prot() x86/mm: thread pgprot_t through init_memory_mapping() mm/memory_hotplug: rename mhp_restrictions to mhp_params mm/memory_hotplug: drop the flags field from struct mhp_restrictions mm/special: create generic fallbacks for pte_special() and pte_mkspecial() mm/vma: introduce VM_ACCESS_FLAGS mm/vma: define a default value for VM_DATA_DEFAULT_FLAGS ...
2020-04-10Merge tag 'docs-5.7-2' of git://git.lwn.net/linuxLinus Torvalds6-24/+22
Pull Documentation fixes from Jonathan Corbet: "A handful of late-arriving fixes for the documentation tree" * tag 'docs-5.7-2' of git://git.lwn.net/linux: Documentation: android: binderfs: add 'stats' mount option Documentation: driver-api/usb/writing_usb_driver.rst Updates documentation links docs: driver-api: address duplicate label warning Documentation: sysrq: fix RST formatting docs: kernel-parameters.txt: Fix broken references docs: kernel-parameters.txt: Remove nompx docs: filesystems: fix typo in qnx6.rst
2020-04-10Merge tag 'for-linus-5.7-ofs1' of ↵Linus Torvalds4-85/+26
git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "A fix and two cleanups. Fix: - Christoph Hellwig noticed that some logic I added to orangefs_file_read_iter introduced a race condition, so he sent a reversion patch. I had to modify his patch since reverting at this point broke Orangefs. Cleanups: - Christoph Hellwig noticed that we were doing some unnecessary work in orangefs_flush, so he sent in a patch that removed the un-needed code. - Al Viro told me he had trouble building Orangefs. Orangefs should be easy to build, even for Al :-). I looked back at the test server build notes in orangefs.txt, just in case that's where the trouble really is, and found a couple of typos and made a couple of clarifications" * tag 'for-linus-5.7-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: clarify build steps for test server in orangefs.txt orangefs: don't mess with I_DIRTY_TIMES in orangefs_flush orangefs: get rid of knob code...
2020-04-10Merge tag 'xtensa-20200410' of git://github.com/jcmvbkbc/linux-xtensaLinus Torvalds4-15/+8
Pull xtensa updates from Max Filippov: - replace setup_irq() by request_irq() - cosmetic fixes in xtensa Kconfig and boot/Makefile * tag 'xtensa-20200410' of git://github.com/jcmvbkbc/linux-xtensa: arch/xtensa: fix grammar in Kconfig help text xtensa: remove meaningless export ccflags-y xtensa: replace setup_irq() by request_irq()
2020-04-10Merge tag 'for-linus-5.7-rc1b-tag' of ↵Linus Torvalds18-123/+142
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull more xen updates from Juergen Gross: - two cleanups - fix a boot regression introduced in this merge window - fix wrong use of memory allocation flags * tag 'for-linus-5.7-rc1b-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: fix booting 32-bit pv guest x86/xen: make xen_pvmmu_arch_setup() static xen/blkfront: fix memory allocation flags in blkfront_setup_indirect() xen: Use evtchn_type_t as a type for event channels
2020-04-10ipc/util.c: sysvipc_find_ipc() should increase position indexVasily Averin1-1/+1
If seq_file .next function does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Waiman Long <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Al Viro <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: NeilBrown <[email protected]> Cc: Peter Oberparleiter <[email protected]> Cc: Steven Rostedt <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10kernel/gcov/fs.c: gcov_seq_next() should increase position indexVasily Averin1-1/+1
If seq_file .next function does not change position index, read after some lseek can generate unexpected output. https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Peter Oberparleiter <[email protected]> Cc: Al Viro <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: NeilBrown <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Waiman Long <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10fs/seq_file.c: seq_read(): add info message about buggy .next functionsVasily Averin1-2/+5
Patch series "seq_file .next functions should increase position index". In Aug 2018 NeilBrown noticed commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface") "Some ->next functions do not increment *pos when they return NULL... Note that such ->next functions are buggy and should be fixed. A simple demonstration is dd if=/proc/swaps bs=1000 skip=1 Choose any block size larger than the size of /proc/swaps. This will always show the whole last line of /proc/swaps" Described problem is still actual. If you make lseek into middle of last output line following read will output end of last line and whole last line once again. $ dd if=/proc/swaps bs=1 # usual output Filename Type Size Used Priority /dev/dm-0 partition 4194812 97536 -2 104+0 records in 104+0 records out 104 bytes copied $ dd if=/proc/swaps bs=40 skip=1 # last line was generated twice dd: /proc/swaps: cannot skip to specified offset v/dm-0 partition 4194812 97536 -2 /dev/dm-0 partition 4194812 97536 -2 3+1 records in 3+1 records out 131 bytes copied There are lot of other affected files, I've found 30+ including /proc/net/ip_tables_matches and /proc/sysvipc/* I've sent patches into maillists of affected subsystems already, this patch-set fixes the problem in files related to pstore, tracing, gcov, sysvipc and other subsystems processed via linux-kernel@ mailing list directly https://bugzilla.kernel.org/show_bug.cgi?id=206283 This patch (of 4): Add debug code to seq_read() to detect missed or out-of-tree incorrect .next seq_file functions. [[email protected]: s/pr_info/pr_info_ratelimited/, per Qian Cai] https://bugzilla.kernel.org/show_bug.cgi?id=206283 Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: NeilBrown <[email protected]> Cc: Al Viro <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Manfred Spraul <[email protected]> Cc: Peter Oberparleiter <[email protected]> Cc: Waiman Long <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10drivers/dma/tegra20-apb-dma.c: fix platform_get_irq.cocci warningskbuild test robot1-1/+0
Remove dev_err() messages after platform_get_irq*() failures. platform_get_irq() already prints an error. Generated by: scripts/coccinelle/api/platform_get_irq.cocci Fixes: 6c41ac96ad92 ("dmaengine: tegra-apb: Support COMPILE_TEST") Signed-off-by: kbuild test robot <[email protected]> Signed-off-by: Julia Lawall <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Dmitry Osipenko <[email protected]> Acked-by: Thierry Reding <[email protected]> Cc: Laxman Dewangan <[email protected]> Cc: Vinod Koul <[email protected]> Cc: Stephen Warren <[email protected]> Cc: Jon Hunter <[email protected]> Link: http://lkml.kernel.org/r/alpine.DEB.2.21.2002271133450.2973@hadrien Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10change email address for Pali RohárPali Rohár24-41/+42
For security reasons I stopped using gmail account and kernel address is now up-to-date alias to my personal address. People periodically send me emails to address which they found in source code of drivers, so this change reflects state where people can contact me. [ Added .mailmap entry as per Joe Perches - Linus ] Signed-off-by: Pali Rohár <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Joe Perches <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10selftests: kmod: test disabling module autoloadingEric Biggers1-0/+30
Test that request_module() fails with -ENOENT when /proc/sys/kernel/modprobe contains (a) a nonexistent path, and (b) an empty path. Case (b) is a regression test for the patch "kmod: make request_module() return an error when autoloading is disabled". Tested with 'kmod.sh -t 0010 && kmod.sh -t 0011', and also simply with 'kmod.sh' to run all kmod tests. Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Luis Chamberlain <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jeff Vander Stoep <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Kees Cook <[email protected]> Cc: NeilBrown <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10selftests: kmod: fix handling test numbers above 9Eric Biggers1-4/+9
get_test_count() and get_test_enabled() were broken for test numbers above 9 due to awk interpreting a field specification like '$0010' as octal rather than decimal. Fix it by stripping the leading zeroes. Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Luis Chamberlain <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jeff Vander Stoep <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Kees Cook <[email protected]> Cc: NeilBrown <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10docs: admin-guide: document the kernel.modprobe sysctlEric Biggers1-0/+21
Document the kernel.modprobe sysctl in the same place that all the other kernel.* sysctls are documented. Make sure to mention how to use this sysctl to completely disable module autoloading, and how this sysctl relates to CONFIG_STATIC_USERMODEHELPER. [[email protected]: v5] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jeff Vander Stoep <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Kees Cook <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: NeilBrown <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10fs/filesystems.c: downgrade user-reachable WARN_ONCE() to pr_warn_once()Eric Biggers1-1/+3
After request_module(), nothing is stopping the module from being unloaded until someone takes a reference to it via try_get_module(). The WARN_ONCE() in get_fs_type() is thus user-reachable, via userspace running 'rmmod' concurrently. Since WARN_ONCE() is for kernel bugs only, not for user-reachable situations, downgrade this warning to pr_warn_once(). Keep it printed once only, since the intent of this warning is to detect a bug in modprobe at boot time. Printing the warning more than once wouldn't really provide any useful extra information. Fixes: 41124db869b7 ("fs: warn in case userspace lied about modprobe return") Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Jessica Yu <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jeff Vander Stoep <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Kees Cook <[email protected]> Cc: Luis Chamberlain <[email protected]> Cc: NeilBrown <[email protected]> Cc: <[email protected]> [4.13+] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10kmod: make request_module() return an error when autoloading is disabledEric Biggers1-2/+2
Patch series "module autoloading fixes and cleanups", v5. This series fixes a bug where request_module() was reporting success to kernel code when module autoloading had been completely disabled via 'echo > /proc/sys/kernel/modprobe'. It also addresses the issues raised on the original thread (https://lkml.kernel.org/lkml/[email protected]/T/#u) bydocumenting the modprobe sysctl, adding a self-test for the empty path case, and downgrading a user-reachable WARN_ONCE(). This patch (of 4): It's long been possible to disable kernel module autoloading completely (while still allowing manual module insertion) by setting /proc/sys/kernel/modprobe to the empty string. This can be preferable to setting it to a nonexistent file since it avoids the overhead of an attempted execve(), avoids potential deadlocks, and avoids the call to security_kernel_module_request() and thus on SELinux-based systems eliminates the need to write SELinux rules to dontaudit module_request. However, when module autoloading is disabled in this way, request_module() returns 0. This is broken because callers expect 0 to mean that the module was successfully loaded. Apparently this was never noticed because this method of disabling module autoloading isn't used much, and also most callers don't use the return value of request_module() since it's always necessary to check whether the module registered its functionality or not anyway. But improperly returning 0 can indeed confuse a few callers, for example get_fs_type() in fs/filesystems.c where it causes a WARNING to be hit: if (!fs && (request_module("fs-%.*s", len, name) == 0)) { fs = __get_fs_type(name, len); WARN_ONCE(!fs, "request_module fs-%.*s succeeded, but still no fs?\n", len, name); } This is easily reproduced with: echo > /proc/sys/kernel/modprobe mount -t NONEXISTENT none / It causes: request_module fs-NONEXISTENT succeeded, but still no fs? WARNING: CPU: 1 PID: 1106 at fs/filesystems.c:275 get_fs_type+0xd6/0xf0 [...] This should actually use pr_warn_once() rather than WARN_ONCE(), since it's also user-reachable if userspace immediately unloads the module. Regardless, request_module() should correctly return an error when it fails. So let's make it return -ENOENT, which matches the error when the modprobe binary doesn't exist. I've also sent patches to document and test this case. Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Kees Cook <[email protected]> Reviewed-by: Jessica Yu <[email protected]> Acked-by: Luis Chamberlain <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jeff Vander Stoep <[email protected]> Cc: Ben Hutchings <[email protected]> Cc: Josh Triplett <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/memremap: set caching mode for PCI P2PDMA memory to WCLogan Gunthorpe1-0/+3
PCI BAR IO memory should never be mapped as WB, however prior to this the PAT bits were set WB and it was typically overridden by MTRR registers set by the firmware. Set PCI P2PDMA memory to be UC as this is what it currently, typically, ends up being mapped as on x86 after the MTRR registers override the cache setting. Future use-cases may need to generalize this by adding flags to select the caching type, as some P2PDMA cases may not want UC. However, those use-cases are not upstream yet and this can be changed when they arrive. Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Dan Williams <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Eric Badger <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/memory_hotplug: add pgprot_t to mhp_paramsLogan Gunthorpe10-7/+36
devm_memremap_pages() is currently used by the PCI P2PDMA code to create struct page mappings for IO memory. At present, these mappings are created with PAGE_KERNEL which implies setting the PAT bits to be WB. However, on x86, an mtrr register will typically override this and force the cache type to be UC-. In the case firmware doesn't set this register it is effectively WB and will typically result in a machine check exception when it's accessed. Other arches are not currently likely to function correctly seeing they don't have any MTRR registers to fall back on. To solve this, provide a way to specify the pgprot value explicitly to arch_add_memory(). Of the arches that support MEMORY_HOTPLUG: x86_64, and arm64 need a simple change to pass the pgprot_t down to their respective functions which set up the page tables. For x86_32, set the page tables explicitly using _set_memory_prot() (seeing they are already mapped). For ia64, s390 and sh, reject anything but PAGE_KERNEL settings -- this should be fine, for now, seeing these architectures don't support ZONE_DEVICE. A check in __add_pages() is also added to ensure the pgprot parameter was set for all arches. Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: David Hildenbrand <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Dan Williams <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Badger <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10powerpc/mm: thread pgprot_t through create_section_mapping()Logan Gunthorpe7-17/+27
In prepartion to support a pgprot_t argument for arch_add_memory(). Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dan Williams <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Eric Badger <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10x86/mm: introduce __set_memory_prot()Logan Gunthorpe2-0/+14
For use in the 32bit arch_add_memory() to set the pgprot type of the memory to add. Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Dan Williams <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Eric Badger <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10x86/mm: thread pgprot_t through init_memory_mapping()Logan Gunthorpe8-25/+34
In preparation to support a pgprot_t argument for arch_add_memory(). It's required to move the prototype of init_memory_mapping() seeing the original location came before the definition of pgprot_t. Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Dan Williams <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: David Hildenbrand <[email protected]> Cc: Eric Badger <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/memory_hotplug: rename mhp_restrictions to mhp_paramsLogan Gunthorpe10-33/+33
The mhp_restrictions struct really doesn't specify anything resembling a restriction anymore so rename it to be mhp_params as it is a list of extended parameters. Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Dan Williams <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Eric Badger <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/memory_hotplug: drop the flags field from struct mhp_restrictionsLogan Gunthorpe1-2/+0
Patch series "Allow setting caching mode in arch_add_memory() for P2PDMA", v4. Currently, the page tables created using memremap_pages() are always created with the PAGE_KERNEL cacheing mode. However, the P2PDMA code is creating pages for PCI BAR memory which should never be accessed through the cache and instead use either WC or UC. This still works in most cases, on x86, because the MTRR registers typically override the caching settings in the page tables for all of the IO memory to be UC-. However, this tends not to work so well on other arches or some rare x86 machines that have firmware which does not setup the MTRR registers in this way. Instead of this, this series proposes a change to arch_add_memory() to take the pgprot required by the mapping which allows us to explicitly set pagetable entries for P2PDMA memory to UC. This changes is pretty routine for most of the arches: x86_64, arm64 and powerpc simply need to thread the pgprot through to where the page tables are setup. x86_32 unfortunately sets up the page tables at boot so must use _set_memory_prot() to change their caching mode. ia64, s390 and sh don't appear to have an easy way to change the page tables so, for now at least, we just return -EINVAL on such mappings and thus they will not support P2PDMA memory until the work for this is done. This should be fine as they don't yet support ZONE_DEVICE. This patch (of 7): This variable is not used anywhere and should therefore be removed from the structure. Signed-off-by: Logan Gunthorpe <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Reviewed-by: Dan Williams <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Christoph Hellwig <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Will Deacon <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Eric Badger <[email protected]> Cc: "H. Peter Anvin" <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Mackerras <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/special: create generic fallbacks for pte_special() and pte_mkspecial()Anshuman Khandual21-95/+58
Currently there are many platforms that dont enable ARCH_HAS_PTE_SPECIAL but required to define quite similar fallback stubs for special page table entry helpers such as pte_special() and pte_mkspecial(), as they get build in generic MM without a config check. This creates two generic fallback stub definitions for these helpers, eliminating much code duplication. mips platform has a special case where pte_special() and pte_mkspecial() visibility is wider than what ARCH_HAS_PTE_SPECIAL enablement requires. This restricts those symbol visibility in order to avoid redefinitions which is now exposed through this new generic stubs and subsequent build failure. arm platform set_pte_at() definition needs to be moved into a C file just to prevent a build failure. [[email protected]: use defined(CONFIG_ARCH_HAS_PTE_SPECIAL) in mips per Thomas] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Guo Ren <[email protected]> [csky] Acked-by: Geert Uytterhoeven <[email protected]> [m68k] Acked-by: Stafford Horne <[email protected]> [openrisc] Acked-by: Helge Deller <[email protected]> [parisc] Cc: Richard Henderson <[email protected]> Cc: Ivan Kokshaysky <[email protected]> Cc: Matt Turner <[email protected]> Cc: Russell King <[email protected]> Cc: Brian Cain <[email protected]> Cc: Tony Luck <[email protected]> Cc: Fenghua Yu <[email protected]> Cc: Sam Creasey <[email protected]> Cc: Michal Simek <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paul Burton <[email protected]> Cc: Nick Hu <[email protected]> Cc: Greentime Hu <[email protected]> Cc: Vincent Chen <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Richard Weinberger <[email protected]> Cc: Anton Ivanov <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Chris Zankel <[email protected]> Cc: Max Filippov <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/vma: introduce VM_ACCESS_FLAGSAnshuman Khandual11-12/+16
There are many places where all basic VMA access flags (read, write, exec) are initialized or checked against as a group. One such example is during page fault. Existing vma_is_accessible() wrapper already creates the notion of VMA accessibility as a group access permissions. Hence lets just create VM_ACCESS_FLAGS (VM_READ|VM_WRITE|VM_EXEC) which will not only reduce code duplication but also extend the VMA accessibility concept in general. Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Cc: Russell King <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Mark Salter <[email protected]> Cc: Nick Hu <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Rob Springer <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/vma: define a default value for VM_DATA_DEFAULT_FLAGSAnshuman Khandual28-89/+31
There are many platforms with exact same value for VM_DATA_DEFAULT_FLAGS This creates a default value for VM_DATA_DEFAULT_FLAGS in line with the existing VM_STACK_DEFAULT_FLAGS. While here, also define some more macros with standard VMA access flag combinations that are used frequently across many platforms. Apart from simplification, this reduces code duplication as well. Signed-off-by: Anshuman Khandual <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Vlastimil Babka <[email protected]> Acked-by: Geert Uytterhoeven <[email protected]> Cc: Richard Henderson <[email protected]> Cc: Vineet Gupta <[email protected]> Cc: Russell King <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Mark Salter <[email protected]> Cc: Guo Ren <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Brian Cain <[email protected]> Cc: Tony Luck <[email protected]> Cc: Michal Simek <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paul Burton <[email protected]> Cc: Nick Hu <[email protected]> Cc: Ley Foon Tan <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: "James E.J. Bottomley" <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Paul Walmsley <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Rich Felker <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Guan Xuetao <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Chris Zankel <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/memory.c: add vm_insert_pages()Arjun Roy2-2/+129
Add the ability to insert multiple pages at once to a user VM with lower PTE spinlock operations. The intention of this patch-set is to reduce atomic ops for tcp zerocopy receives, which normally hits the same spinlock multiple times consecutively. [[email protected]: pte_alloc() no longer takes the `addr' argument] [[email protected]: add missing page_count() check to vm_insert_pages()] Link: http://lkml.kernel.org/r/[email protected] [[email protected]: vm_insert_pages() checks if pte_index defined] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arjun Roy <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: David Miller <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Stephen Rothwell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm: define pte_index as macro for x86Arjun Roy1-0/+3
pte_index() is either defined as a macro (e.g. sparc64) or as an inlined function (e.g. x86). vm_insert_pages() depends on pte_index but it is not defined on all platforms (e.g. m68k). To fix compilation of vm_insert_pages() on architectures not providing pte_index(), we perform the following fix: 0. For platforms where it is meaningful, and defined as a macro, no change is needed. 1. For platforms where it is meaningful and defined as an inlined function, and we want to use it with vm_insert_pages(), we define a degenerate macro of the form: #define pte_index pte_index 2. vm_insert_pages() checks for the existence of a pte_index macro definition. If found, it implements a batched insert. If not found, it devolves to calling vm_insert_page() in a loop. This patch implements step 1 for x86. v3 of this patch fixes a compilation warning for an unused method. v2 of this patch moved a macro definition to a more readable location. Signed-off-by: Arjun Roy <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: David Miller <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: Stephen Rothwell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm: bring sparc pte_index() semantics inline with other platformsArjun Roy1-5/+5
pte_index() on platforms other than sparc return a numerical index. On sparc, it returns a pte_t*. This presents an issue for vm_insert_pages(), which relies on pte_index() to find the offset for a pte within a pmd, for batched inserts. This patch: 1. Modifies pte_index() for sparc to return a numerical index, like other platforms, 2. Defines pte_entry() for sparc which returns a pte_t* (as pte_index() used to), 3. Converts existing sparc callers for pte_index() to use pte_entry(). [[email protected]: remove pte_entry and just directly modified pte_offset_kernel instead] Signed-off-by: Arjun Roy <[email protected]> Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Cc: Eric Dumazet <[email protected]> Cc: Soheil Hassas Yeganeh <[email protected]> Cc: David Miller <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Arjun Roy <[email protected]> Cc: Jason Gunthorpe <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/memory.c: refactor insert_page to prepare for batched-lock insertArjun Roy1-15/+24
Add helper methods for vm_insert_page()/insert_page() to prepare for vm_insert_pages(), which batch-inserts pages to reduce spinlock operations when inserting multiple consecutive pages into the user page table. The intention of this patch-set is to reduce atomic ops for tcp zerocopy receives, which normally hits the same spinlock multiple times consecutively. Signed-off-by: Arjun Roy <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Soheil Hassas Yeganeh <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: David Miller <[email protected]> Cc: Matthew Wilcox <[email protected]> Cc: Jason Gunthorpe <[email protected]> Cc: Stephen Rothwell <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/mmap.c: initialize align_offset explicitly for vm_unmapped_areaJaewon Kim1-0/+2
On passing requirement to vm_unmapped_area, arch_get_unmapped_area and arch_get_unmapped_area_topdown did not set align_offset. Internally on both unmapped_area and unmapped_area_topdown, if info->align_mask is 0, then info->align_offset was meaningless. But commit df529cabb7a2 ("mm: mmap: add trace point of vm_unmapped_area") always prints info->align_offset even though it is uninitialized. Fix this uninitialized value issue by setting it to 0 explicitly. Before: vm_unmapped_area: addr=0x755b155000 err=0 total_vm=0x15aaf0 flags=0x1 len=0x109000 lo=0x8000 hi=0x75eed48000 mask=0x0 ofs=0x4022 After: vm_unmapped_area: addr=0x74a4ca1000 err=0 total_vm=0x168ab1 flags=0x1 len=0x9000 lo=0x8000 hi=0x753d94b000 mask=0x0 ofs=0x0 Signed-off-by: Jaewon Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Cc: Matthew Wilcox (Oracle) <[email protected]> Cc: Michel Lespinasse <[email protected]> Cc: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm: hugetlb: optionally allocate gigantic hugepages using cmaRoman Gushchin5-0/+139
Commit 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime") has added the run-time allocation of gigantic pages. However it actually works only at early stages of the system loading, when the majority of memory is free. After some time the memory gets fragmented by non-movable pages, so the chances to find a contiguous 1GB block are getting close to zero. Even dropping caches manually doesn't help a lot. At large scale rebooting servers in order to allocate gigantic hugepages is quite expensive and complex. At the same time keeping some constant percentage of memory in reserved hugepages even if the workload isn't using it is a big waste: not all workloads can benefit from using 1 GB pages. The following solution can solve the problem: 1) On boot time a dedicated cma area* is reserved. The size is passed as a kernel argument. 2) Run-time allocations of gigantic hugepages are performed using the cma allocator and the dedicated cma area In this case gigantic hugepages can be allocated successfully with a high probability, however the memory isn't completely wasted if nobody is using 1GB hugepages: it can be used for pagecache, anon memory, THPs, etc. * On a multi-node machine a per-node cma area is allocated on each node. Following gigantic hugetlb allocation are using the first available numa node if the mask isn't specified by a user. Usage: 1) configure the kernel to allocate a cma area for hugetlb allocations: pass hugetlb_cma=10G as a kernel argument 2) allocate hugetlb pages as usual, e.g. echo 10 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages If the option isn't enabled or the allocation of the cma area failed, the current behavior of the system is preserved. x86 and arm-64 are covered by this patch, other architectures can be trivially added later. The patch contains clean-ups and fixes proposed and implemented by Aslan Bakirov and Randy Dunlap. It also contains ideas and suggestions proposed by Rik van Riel, Michal Hocko and Mike Kravetz. Thanks! Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Tested-by: Andreas Schaufler <[email protected]> Acked-by: Mike Kravetz <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Aslan Bakirov <[email protected]> Cc: Randy Dunlap <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm: cma: NUMA node interfaceAslan Bakirov4-10/+25
I've noticed that there is no interface exposed by CMA which would let me to declare contigous memory on particular NUMA node. This patchset adds the ability to try to allocate contiguous memory on a specific node. It will fallback to other nodes if the specified one doesn't work. Implement a new method for declaring contigous memory on particular node and keep cma_declare_contiguous() as a wrapper. [[email protected]: build fix] Signed-off-by: Aslan Bakirov <[email protected]> Signed-off-by: Roman Gushchin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Andreas Schaufler <[email protected]> Cc: Mike Kravetz <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10ocfs2: no need try to truncate file beyond i_sizeChangwei Ge1-0/+4
Linux fallocate(2) with FALLOC_FL_PUNCH_HOLE mode set, its offset can exceed the inode size. Ocfs2 now doesn't allow that offset beyond inode size. This restriction is not necessary and violates fallocate(2) semantics. If fallocate(2) offset is beyond inode size, just return success and do nothing further. Otherwise, ocfs2 will crash the kernel. kernel BUG at fs/ocfs2//alloc.c:7264! ocfs2_truncate_inline+0x20f/0x360 [ocfs2] ocfs2_remove_inode_range+0x23c/0xcb0 [ocfs2] __ocfs2_change_file_space+0x4a5/0x650 [ocfs2] ocfs2_fallocate+0x83/0xa0 [ocfs2] vfs_fallocate+0x148/0x230 SyS_fallocate+0x48/0x80 do_syscall_64+0x79/0x170 Signed-off-by: Changwei Ge <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Joseph Qi <[email protected]> Cc: Mark Fasheh <[email protected]> Cc: Joel Becker <[email protected]> Cc: Junxiao Bi <[email protected]> Cc: Changwei Ge <[email protected]> Cc: Gang He <[email protected]> Cc: Jun Piao <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/page_alloc: make pcpu_drain_mutex and pcpu_drain staticJason Yan1-2/+2
Fix the following sparse warning: mm/page_alloc.c:106:1: warning: symbol 'pcpu_drain_mutex' was not declared. Should it be static? mm/page_alloc.c:107:1: warning: symbol '__pcpu_scope_pcpu_drain' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm/page_alloc.c: fix kernel-doc warningRandy Dunlap1-0/+1
Add description of function parameter 'mt' to fix kernel-doc warning: mm/page_alloc.c:3246: warning: Function parameter or member 'mt' not described in '__putback_isolated_page' Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Pankaj Gupta <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10docs: mm: slab.h: fix a broken cross-referenceMauro Carvalho Chehab1-1/+1
There is a typo at the cross-reference link, causing this warning: include/linux/slab.h:11: WARNING: undefined label: memory-allocation (if the link has no caption the label must precede a section header) Signed-off-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Link: http://lkml.kernel.org/r/0aeac24235d356ebd935d11e147dcc6edbb6465c.1586359676.git.mchehab+huawei@kernel.org Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm, slab_common: fix a typo in comment "eariler"->"earlier"Qiujun Huang1-1/+1
There is a typo in comment, fix it. s/eariler/earlier/ Signed-off-by: Qiujun Huang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Christoph Lameter <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-04-10mm, memcg: do not high throttle allocators based on wraparoundJakub Kicinski1-0/+3
If a cgroup violates its memory.high constraints, we may end up unduly penalising it. For example, for the following hierarchy: A: max high, 20 usage A/B: 9 high, 10 usage A/C: max high, 10 usage We would end up doing the following calculation below when calculating high delay for A/B: A/B: 10 - 9 = 1... A: 20 - PAGE_COUNTER_MAX = 21, so set max_overage to 21. This gets worse with higher disparities in usage in the parent. I have no idea how this disappeared from the final version of the patch, but it is certainly Not Good(tm). This wasn't obvious in testing because, for a simple cgroup hierarchy with only one child, the result is usually roughly the same. It's only in more complex hierarchies that things go really awry (although still, the effects are limited to a maximum of 2 seconds in schedule_timeout_killable at a maximum). [[email protected]: changelog] Fixes: e26733e0d0ec ("mm, memcg: throttle allocators based on ancestral memory.high") Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Chris Down <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: <[email protected]> [5.4.x] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>