aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-02-23KVM: nVMX: Don't emulate instructions in guest modePaolo Bonzini1-1/+1
vmx_check_intercept is not yet fully implemented. To avoid emulating instructions disallowed by the L1 hypervisor, refuse to emulate instructions by default. Cc: [email protected] [Made commit, added commit msg - Oliver] Signed-off-by: Oliver Upton <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-23KVM: nVMX: Emulate MTF when performing instruction emulationOliver Upton8-2/+83
Since commit 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG"), KVM has allowed an L1 guest to use the monitor trap flag processor-based execution control for its L2 guest. KVM simply forwards any MTF VM-exits to the L1 guest, which works for normal instruction execution. However, when KVM needs to emulate an instruction on the behalf of an L2 guest, the monitor trap flag is not emulated. Add the necessary logic to kvm_skip_emulated_instruction() to synthesize an MTF VM-exit to L1 upon instruction emulation for L2. Fixes: 5f3d45e7f282 ("kvm/x86: add support for MONITOR_TRAP_FLAG") Signed-off-by: Oliver Upton <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-23KVM: fix error handling in svm_hardware_setupLi RongQing1-21/+20
rename svm_hardware_unsetup as svm_hardware_teardown, move it before svm_hardware_setup, and call it to free all memory if fail to setup in svm_hardware_setup, otherwise memory will be leaked remove __exit attribute for it since it is called in __init function Signed-off-by: Li RongQing <[email protected]> Reviewed-by: Vitaly Kuznetsov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-22net: genetlink: return the error code when attribute parsing fails.Paolo Abeni1-2/+3
Currently if attribute parsing fails and the genl family does not support parallel operation, the error code returned by __nlmsg_parse() is discarded by genl_family_rcv_msg_attrs_parse(). Be sure to report the error for all genl families. Fixes: c10e6cf85e7d ("net: genetlink: push attrbuf allocation and parsing to a separate function") Fixes: ab5b526da048 ("net: genetlink: always allocate separate attrs for dumpit ops") Signed-off-by: Paolo Abeni <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-22ipv4: ensure rcu_read_lock() in cipso_v4_error()Matteo Croce1-1/+6
Similarly to commit c543cb4a5f07 ("ipv4: ensure rcu_read_lock() in ipv4_link_failure()"), __ip_options_compile() must be called under rcu protection. Fixes: 3da1ed7ac398 ("net: avoid use IPCB in cipso_v4_error") Suggested-by: Guillaume Nault <[email protected]> Signed-off-by: Matteo Croce <[email protected]> Acked-by: Paul Moore <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-22vhost: Check docket sk_family instead of call getnameEugenio Pérez1-9/+1
Doing so, we save one call to get data we already have in the struct. Also, since there is no guarantee that getname use sockaddr_ll parameter beyond its size, we add a little bit of security here. It should do not do beyond MAX_ADDR_LEN, but syzbot found that ax25_getname writes more (72 bytes, the size of full_sockaddr_ax25, versus 20 + 32 bytes of sockaddr_ll + MAX_ADDR_LEN in syzbot repro). Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server") Reported-by: [email protected] Signed-off-by: Eugenio Pérez <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-23csky: Replace <linux/clk-provider.h> by <linux/of_clk.h>Geert Uytterhoeven1-1/+1
The C-Sky platform code is not a clock provider, and just needs to call of_clk_init(). Hence it can include <linux/of_clk.h> instead of <linux/clk-provider.h>. Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Guo Ren <[email protected]>
2020-02-22Merge tag 'ras-urgent-2020-02-22' of ↵Linus Torvalds1-23/+27
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RAS fixes from Thomas Gleixner: "Two fixes for the AMD MCE driver: - Populate the per CPU MCA bank descriptor pointer only after it has been completely set up to prevent a use-after-free in case that one of the subsequent initialization step fails - Implement a proper release function for the sysfs entries of MCA threshold controls instead of freeing the memory right in the CPU teardown code, which leads to another use-after-free when the associated sysfs file is opened and accessed" * tag 'ras-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/mce/amd: Fix kobject lifetime x86/mce/amd: Publish the bank pointer only after setup has succeeded
2020-02-22audit: fix error handling in audit_data_to_entry()Paul Moore1-32/+39
Commit 219ca39427bf ("audit: use union for audit_field values since they are mutually exclusive") combined a number of separate fields in the audit_field struct into a single union. Generally this worked just fine because they are generally mutually exclusive. Unfortunately in audit_data_to_entry() the overlap can be a problem when a specific error case is triggered that causes the error path code to attempt to cleanup an audit_field struct and the cleanup involves attempting to free a stored LSM string (the lsm_str field). Currently the code always has a non-NULL value in the audit_field.lsm_str field as the top of the for-loop transfers a value into audit_field.val (both .lsm_str and .val are part of the same union); if audit_data_to_entry() fails and the audit_field struct is specified to contain a LSM string, but the audit_field.lsm_str has not yet been properly set, the error handling code will attempt to free the bogus audit_field.lsm_str value that was set with audit_field.val at the top of the for-loop. This patch corrects this by ensuring that the audit_field.val is only set when needed (it is cleared when the audit_field struct is allocated with kcalloc()). It also corrects a few other issues to ensure that in case of error the proper error code is returned. Cc: [email protected] Fixes: 219ca39427bf ("audit: use union for audit_field values since they are mutually exclusive") Reported-by: [email protected] Signed-off-by: Paul Moore <[email protected]>
2020-02-22Merge tag 'irq-urgent-2020-02-22' of ↵Linus Torvalds4-19/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fixes from Thomas Gleixner: "Two fixes for the irq core code which are follow ups to the recent MSI fixes: - The WARN_ON which was put into the MSI setaffinity callback for paranoia reasons actually triggered via a callchain which escaped when all the possible ways to reach that code were analyzed. The proc/irq/$N/*affinity interfaces have a quirk which came in when ALPHA moved to the generic interface: In case that the written affinity mask does not contain any online CPU it calls into ALPHAs magic auto affinity setting code. A few years later this mechanism was also made available to x86 for no good reasons and in a way which circumvents all sanity checks for interrupts which cannot have their affinity set from process context on X86 due to the way the X86 interrupt delivery works. It would be possible to make this work properly, but there is no point in doing so. If the interrupt is not yet started then the affinity setting has no effect and if it is started already then it is already assigned to an online CPU so there is no point to randomly move it to some other CPU. Just return EINVAL as the code has done before that change forever. - The new MSI quirk bit in the irq domain flags turned out to be already occupied, which escaped the author and the reviewers because the already in use bits were 0,6,2,3,4,5 listed in that order. That bit 6 was simply overlooked because the ordering was straight forward linear otherwise. So the new bit ended up being a duplicate. Fix it up by switching the oddball 6 to the obvious 1" * tag 'irq-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: genirq/irqdomain: Make sure all irq domain flags are distinct genirq/proc: Reject invalid affinity masks (again)
2020-02-22Merge tag 'x86-urgent-2020-02-22' of ↵Linus Torvalds3-3/+16
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Thomas Gleixner: "Two fixes for x86: - Remove the __force_oder definiton from the kaslr boot code as it is already defined in the page table code which makes GCC 10 builds fail because it changed the default to -fno-common. - Address the AMD erratum 1054 concerning the IRPERF capability and enable the Instructions Retired fixed counter on machines which are not affected by the erratum" * tag 'x86-urgent-2020-02-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/cpu/amd: Enable the fixed Instructions Retired counter IRPERF x86/boot/compressed: Don't declare __force_order in kaslr_64.c
2020-02-22Merge tag 'zonefs-5.6-rc3' of ↵Linus Torvalds1-10/+10
git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs Pull zonefs fix from Damien Le Moal: "A single patch fixing typos in the documentation file" * tag 'zonefs-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs: zonefs: fix documentation typos etc.
2020-02-22Merge tag 'io_uring-5.6-2020-02-22' of git://git.kernel.dk/linux-blockLinus Torvalds1-35/+30
Pull io_uring fixes from Jens Axboe: "Here's a small collection of fixes that were queued up: - Remove unnecessary NULL check (Dan) - Missing io_req_cancelled() call in fallocate (Pavel) - Put the cleanup check for aux data in the right spot (Pavel) - Two fixes for SQPOLL (Stefano, Xiaoguang)" * tag 'io_uring-5.6-2020-02-22' of git://git.kernel.dk/linux-block: io_uring: fix __io_iopoll_check deadlock in io_sq_thread io_uring: prevent sq_thread from spinning when it should stop io_uring: fix use-after-free by io_cleanup_req() io_uring: remove unnecessary NULL checks io_uring: add missing io_req_cancelled()
2020-02-22Merge tag 'block-5.6-2020-02-22' of git://git.kernel.dk/linux-blockLinus Torvalds3-2/+16
Pull block fixes from Jens Axboe: "Just a set of NVMe fixes via Keith" * tag 'block-5.6-2020-02-22' of git://git.kernel.dk/linux-block: nvme-multipath: Fix memory leak with ana_log_buf nvme: Fix uninitialized-variable warning nvme-pci: Use single IRQ vector for old Apple models nvme/pci: Add sleep quirk for Samsung and Toshiba drives
2020-02-22Merge tag 'scsi-fixes' of ↵Linus Torvalds5-17/+48
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Four non-core fixes. Two are reverts of target fixes which turned out to have unwanted side effects, one is a revert of an RDMA fix with the same problem and the final one fixes an incorrect warning about memory allocation failures in megaraid_sas (the driver actually reduces the allocation size until it succeeds)" Signed-off-by: James E.J. Bottomley <[email protected]> * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: Revert "target: iscsi: Wait for all commands to finish before freeing a session" scsi: Revert "RDMA/isert: Fix a recently introduced regression related to logout" scsi: megaraid_sas: silence a warning scsi: Revert "target/core: Inline transport_lun_remove_cmd()"
2020-02-22Merge tag 'hwmon-for-v5.6-rc3' of ↵Linus Torvalds3-9/+15
git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging Pull hwmon fixes from Guenter Roeck: - Fix crash in w83627ehf driver seen with W83627DHG-P - Fix lockdep splat in acpi_power_meter driver - Fix xdpe12284 documentation Sphinx warnings * tag 'hwmon-for-v5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (w83627ehf) Fix crash seen with W83627DHG-P hwmon: (acpi_power_meter) Fix lockdep splat Documentation/hwmon: fix xdpe12284 Sphinx warnings
2020-02-22Merge tag 'devicetree-fixes-for-5.6-2' of ↵Linus Torvalds4-22/+41
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree fixes deom Rob Herring: "A handful of fixes in DT bindings for MDIO bus, Allwinner CSI, OMAP HSMMC, and Tegra124 EMC" * tag 'devicetree-fixes-for-5.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: dt-bindings: media: csi: Fix clocks description dt-bindings: media: csi: Add interconnects properties dt-bindings: net: mdio: remove compatible string from example dt-bindings: memory-controller: Update example for Tegra124 EMC dt-bindings: mmc: omap-hsmmc: Fix SDIO interrupt
2020-02-22Merge tag 's390-5.6-4' of ↵Linus Torvalds17-75/+64
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Vasily Gorbik: - Remove ieee_emulation_warnings sysctl which is a dead code. - Avoid triggering rebuild of the kernel during make install. - Enable protected virtualization guest support in default configs. - Fix cio_ignore seq_file .next function to increase position index. And use kobj_to_dev instead of container_of in cio code. - Fix storage block address lists to contain absolute addresses in qdio code. - Few clang warnings and spelling fixes. * tag 's390-5.6-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/qdio: fill SBALEs with absolute addresses s390/qdio: fill SL with absolute addresses s390: remove obsolete ieee_emulation_warnings s390: make 'install' not depend on vmlinux s390/kaslr: Fix casts in get_random s390/mm: Explicitly compare PAGE_DEFAULT_KEY against zero in storage_key_init_range s390/pkey/zcrypt: spelling s/crytp/crypt/ s390/cio: use kobj_to_dev() API s390/defconfig: enable CONFIG_PROTECTED_VIRTUALIZATION_GUEST s390/cio: cio_ignore_proc_seq_next should increase position index
2020-02-22io_uring: fix __io_iopoll_check deadlock in io_sq_threadXiaoguang Wang1-18/+9
Since commit a3a0e43fd770 ("io_uring: don't enter poll loop if we have CQEs pending"), if we already events pending, we won't enter poll loop. In case SETUP_IOPOLL and SETUP_SQPOLL are both enabled, if app has been terminated and don't reap pending events which are already in cq ring, and there are some reqs in poll_list, io_sq_thread will enter __io_iopoll_check(), and find pending events, then return, this loop will never have a chance to exit. I have seen this issue in fio stress tests, to fix this issue, let io_sq_thread call io_iopoll_getevents() with argument 'min' being zero, and remove __io_iopoll_check(). Fixes: a3a0e43fd770 ("io_uring: don't enter poll loop if we have CQEs pending") Signed-off-by: Xiaoguang Wang <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-02-22netfilter: ipset: Fix forceadd evaluation pathJozsef Kadlecsik1-0/+2
When the forceadd option is enabled, the hash:* types should find and replace the first entry in the bucket with the new one if there are no reuseable (deleted or timed out) entries. However, the position index was just not set to zero and remained the invalid -1 if there were no reuseable entries. Reported-by: [email protected] Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <[email protected]>
2020-02-22arm64: Ask the compiler to __always_inline functions used by KVM at HYPJames Morse4-8/+8
KVM uses some of the static-inline helpers like icache_is_vipt() from its HYP code. This assumes the function is inlined so that the code is mapped to EL2. The compiler may decide not to inline these, and the out-of-line version may not be in the __hyp_text section. Add the additional __always_ hint to these static-inlines that are used by KVM. Signed-off-by: James Morse <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Will Deacon <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-02-22KVM: arm64: Define our own swab32() to avoid a uapi static inlineJames Morse2-2/+9
KVM uses swab32() when mediating GIC MMIO accesses if the GICV is badly aligned, and the host and guest differ in endianness. arm64 doesn't provide a __arch_swab32(), so __fswab32() is always backed by the macro implementation that the compiler reduces to a single instruction. But the static-inline causes problems for KVM if the compiler chooses not to inline this function, it may not be located in the __hyp_text where __vgic_v2_perform_cpuif_access() needs it. Create our own __kvm_swab32() macro that calls ___constant_swab32() directly. This way we know it will always be inlined. Signed-off-by: James Morse <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-02-22KVM: arm64: Ask the compiler to __always_inline functions used at HYPJames Morse5-28/+29
On non VHE CPUs, KVM's __hyp_text contains code run at EL2 while the rest of the kernel runs at EL1. This code lives in its own section with start and end markers so we can map it to EL2. The compiler may decide not to inline static-inline functions from the header file. It may also decide not to put these out-of-line functions in the same section, meaning they aren't mapped when called at EL2. Clang-9 does exactly this with __kern_hyp_va() and a few others when x18 is reserved for the shadow call stack. Add the additional __always_ hint to all the static-inlines that are called from a hyp file. Signed-off-by: James Morse <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] ---- kvm_get_hyp_vector() pulls in all the regular per-cpu accessors and this_cpu_has_cap(), fortunately its only called for VHE.
2020-02-22netfilter: ipset: Fix "INFO: rcu detected stall in hash_xxx" reportsJozsef Kadlecsik3-206/+472
In the case of huge hash:* types of sets, due to the single spinlock of a set the processing of the whole set under spinlock protection could take too long. There were four places where the whole hash table of the set was processed from bucket to bucket under holding the spinlock: - During resizing a set, the original set was locked to exclude kernel side add/del element operations (userspace add/del is excluded by the nfnetlink mutex). The original set is actually just read during the resize, so the spinlocking is replaced with rcu locking of regions. However, thus there can be parallel kernel side add/del of entries. In order not to loose those operations a backlog is added and replayed after the successful resize. - Garbage collection of timed out entries was also protected by the spinlock. In order not to lock too long, region locking is introduced and a single region is processed in one gc go. Also, the simple timer based gc running is replaced with a workqueue based solution. The internal book-keeping (number of elements, size of extensions) is moved to region level due to the region locking. - Adding elements: when the max number of the elements is reached, the gc was called to evict the timed out entries. The new approach is that the gc is called just for the matching region, assuming that if the region (proportionally) seems to be full, then the whole set does. We could scan the other regions to check every entry under rcu locking, but for huge sets it'd mean a slowdown at adding elements. - Listing the set header data: when the set was defined with timeout support, the garbage collector was called to clean up timed out entries to get the correct element numbers and set size values. Now the set is scanned to check non-timed out entries, without actually calling the gc for the whole set. Thanks to Florian Westphal for helping me to solve the SOFTIRQ-safe -> SOFTIRQ-unsafe lock order issues during working on the patch. Reported-by: [email protected] Reported-by: [email protected] Reported-by: [email protected] Fixes: 23c42a403a9c ("netfilter: ipset: Introduction of new commands and protocol version 7") Signed-off-by: Jozsef Kadlecsik <[email protected]>
2020-02-21ext4: fix mount failure with quota configured as moduleJan Kara1-1/+1
When CONFIG_QFMT_V2 is configured as a module, the test in ext4_feature_set_ok() fails and so mount of filesystems with quota or project features fails. Fix the test to use IS_ENABLED macro which works properly even for modules. Link: https://lore.kernel.org/r/[email protected] Fixes: d65d87a07476 ("ext4: improve explanation of a mount failure caused by a misconfigured kernel") Signed-off-by: Jan Kara <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Cc: [email protected]
2020-02-21jbd2: fix ocfs2 corrupt when clearing block group bitswangyan1-2/+6
I found a NULL pointer dereference in ocfs2_block_group_clear_bits(). The running environment: kernel version: 4.19 A cluster with two nodes, 5 luns mounted on two nodes, and do some file operations like dd/fallocate/truncate/rm on every lun with storage network disconnection. The fallocate operation on dm-23-45 caused an null pointer dereference. The information of NULL pointer dereference as follows: [577992.878282] JBD2: Error -5 detected when updating journal superblock for dm-23-45. [577992.878290] Aborting journal on device dm-23-45. ... [577992.890778] JBD2: Error -5 detected when updating journal superblock for dm-24-46. [577992.890908] __journal_remove_journal_head: freeing b_committed_data [577992.890916] (fallocate,88392,52):ocfs2_extend_trans:474 ERROR: status = -30 [577992.890918] __journal_remove_journal_head: freeing b_committed_data [577992.890920] (fallocate,88392,52):ocfs2_rotate_tree_right:2500 ERROR: status = -30 [577992.890922] __journal_remove_journal_head: freeing b_committed_data [577992.890924] (fallocate,88392,52):ocfs2_do_insert_extent:4382 ERROR: status = -30 [577992.890928] (fallocate,88392,52):ocfs2_insert_extent:4842 ERROR: status = -30 [577992.890928] __journal_remove_journal_head: freeing b_committed_data [577992.890930] (fallocate,88392,52):ocfs2_add_clusters_in_btree:4947 ERROR: status = -30 [577992.890933] __journal_remove_journal_head: freeing b_committed_data [577992.890939] __journal_remove_journal_head: freeing b_committed_data [577992.890949] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020 [577992.890950] Mem abort info: [577992.890951] ESR = 0x96000004 [577992.890952] Exception class = DABT (current EL), IL = 32 bits [577992.890952] SET = 0, FnV = 0 [577992.890953] EA = 0, S1PTW = 0 [577992.890954] Data abort info: [577992.890955] ISV = 0, ISS = 0x00000004 [577992.890956] CM = 0, WnR = 0 [577992.890958] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000f8da07a9 [577992.890960] [0000000000000020] pgd=0000000000000000 [577992.890964] Internal error: Oops: 96000004 [#1] SMP [577992.890965] Process fallocate (pid: 88392, stack limit = 0x00000000013db2fd) [577992.890968] CPU: 52 PID: 88392 Comm: fallocate Kdump: loaded Tainted: G W OE 4.19.36 #1 [577992.890969] Hardware name: Huawei TaiShan 2280 V2/BC82AMDD, BIOS 0.98 08/25/2019 [577992.890971] pstate: 60400009 (nZCv daif +PAN -UAO) [577992.891054] pc : _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2] [577992.891082] lr : _ocfs2_free_suballoc_bits+0x618/0x968 [ocfs2] [577992.891084] sp : ffff0000c8e2b810 [577992.891085] x29: ffff0000c8e2b820 x28: 0000000000000000 [577992.891087] x27: 00000000000006f3 x26: ffffa07957b02e70 [577992.891089] x25: ffff807c59d50000 x24: 00000000000006f2 [577992.891091] x23: 0000000000000001 x22: ffff807bd39abc30 [577992.891093] x21: ffff0000811d9000 x20: ffffa07535d6a000 [577992.891097] x19: ffff000001681638 x18: ffffffffffffffff [577992.891098] x17: 0000000000000000 x16: ffff000080a03df0 [577992.891100] x15: ffff0000811d9708 x14: 203d207375746174 [577992.891101] x13: 73203a524f525245 x12: 20373439343a6565 [577992.891103] x11: 0000000000000038 x10: 0101010101010101 [577992.891106] x9 : ffffa07c68a85d70 x8 : 7f7f7f7f7f7f7f7f [577992.891109] x7 : 0000000000000000 x6 : 0000000000000080 [577992.891110] x5 : 0000000000000000 x4 : 0000000000000002 [577992.891112] x3 : ffff000001713390 x2 : 2ff90f88b1c22f00 [577992.891114] x1 : ffff807bd39abc30 x0 : 0000000000000000 [577992.891116] Call trace: [577992.891139] _ocfs2_free_suballoc_bits+0x63c/0x968 [ocfs2] [577992.891162] _ocfs2_free_clusters+0x100/0x290 [ocfs2] [577992.891185] ocfs2_free_clusters+0x50/0x68 [ocfs2] [577992.891206] ocfs2_add_clusters_in_btree+0x198/0x5e0 [ocfs2] [577992.891227] ocfs2_add_inode_data+0x94/0xc8 [ocfs2] [577992.891248] ocfs2_extend_allocation+0x1bc/0x7a8 [ocfs2] [577992.891269] ocfs2_allocate_extents+0x14c/0x338 [ocfs2] [577992.891290] __ocfs2_change_file_space+0x3f8/0x610 [ocfs2] [577992.891309] ocfs2_fallocate+0xe4/0x128 [ocfs2] [577992.891316] vfs_fallocate+0x11c/0x250 [577992.891317] ksys_fallocate+0x54/0x88 [577992.891319] __arm64_sys_fallocate+0x28/0x38 [577992.891323] el0_svc_common+0x78/0x130 [577992.891325] el0_svc_handler+0x38/0x78 [577992.891327] el0_svc+0x8/0xc My analysis process as follows: ocfs2_fallocate __ocfs2_change_file_space ocfs2_allocate_extents ocfs2_extend_allocation ocfs2_add_inode_data ocfs2_add_clusters_in_btree ocfs2_insert_extent ocfs2_do_insert_extent ocfs2_rotate_tree_right ocfs2_extend_rotate_transaction ocfs2_extend_trans jbd2_journal_restart jbd2__journal_restart /* handle->h_transaction is NULL, * is_handle_aborted(handle) is true */ handle->h_transaction = NULL; start_this_handle return -EROFS; ocfs2_free_clusters _ocfs2_free_clusters _ocfs2_free_suballoc_bits ocfs2_block_group_clear_bits ocfs2_journal_access_gd __ocfs2_journal_access jbd2_journal_get_undo_access /* I think jbd2_write_access_granted() will * return true, because do_get_write_access() * will return -EROFS. */ if (jbd2_write_access_granted(...)) return 0; do_get_write_access /* handle->h_transaction is NULL, it will * return -EROFS here, so do_get_write_access() * was not called. */ if (is_handle_aborted(handle)) return -EROFS; /* bh2jh(group_bh) is NULL, caused NULL pointer dereference */ undo_bg = (struct ocfs2_group_desc *) bh2jh(group_bh)->b_committed_data; If handle->h_transaction == NULL, then jbd2_write_access_granted() does not really guarantee that journal_head will stay around, not even speaking of its b_committed_data. The bh2jh(group_bh) can be removed after ocfs2_journal_access_gd() and before call "bh2jh(group_bh)->b_committed_data". So, we should move is_handle_aborted() check from do_get_write_access() into jbd2_journal_get_undo_access() and jbd2_journal_get_write_access() before the call to jbd2_write_access_granted(). Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Yan Wang <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Jun Piao <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: [email protected]
2020-02-21ext4: fix race between writepages and enabling EXT4_EXTENTS_FLEric Biggers2-9/+23
If EXT4_EXTENTS_FL is set on an inode while ext4_writepages() is running on it, the following warning in ext4_add_complete_io() can be hit: WARNING: CPU: 1 PID: 0 at fs/ext4/page-io.c:234 ext4_put_io_end_defer+0xf0/0x120 Here's a minimal reproducer (not 100% reliable) (root isn't required): while true; do sync done & while true; do rm -f file touch file chattr -e file echo X >> file chattr +e file done The problem is that in ext4_writepages(), ext4_should_dioread_nolock() (which only returns true on extent-based files) is checked once to set the number of reserved journal credits, and also again later to select the flags for ext4_map_blocks() and copy the reserved journal handle to ext4_io_end::handle. But if EXT4_EXTENTS_FL is being concurrently set, the first check can see dioread_nolock disabled while the later one can see it enabled, causing the reserved handle to unexpectedly be NULL. Since changing EXT4_EXTENTS_FL is uncommon, and there may be other races related to doing so as well, fix this by synchronizing changing EXT4_EXTENTS_FL with ext4_writepages() via the existing s_writepages_rwsem (previously called s_journal_flag_rwsem). This was originally reported by syzbot without a reproducer at https://syzkaller.appspot.com/bug?extid=2202a584a00fffd19fbf, but now that dioread_nolock is the default I also started seeing this when running syzkaller locally. Link: https://lore.kernel.org/r/[email protected] Reported-by: [email protected] Fixes: 6b523df4fb5a ("ext4: use transaction reservation for extent conversion in ext4_end_io") Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: [email protected]
2020-02-21ext4: rename s_journal_flag_rwsem to s_writepages_rwsemEric Biggers3-11/+11
In preparation for making s_journal_flag_rwsem synchronize ext4_writepages() with changes to both the EXTENTS and JOURNAL_DATA flags (rather than just JOURNAL_DATA as it does currently), rename it to s_writepages_rwsem. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Eric Biggers <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Reviewed-by: Jan Kara <[email protected]> Cc: [email protected]
2020-02-21ext4: fix potential race between s_flex_groups online resizing and accessSuraj Jitindar Singh5-37/+76
During an online resize an array of s_flex_groups structures gets replaced so it can get enlarged. If there is a concurrent access to the array and this memory has been reused then this can lead to an invalid memory access. The s_flex_group array has been converted into an array of pointers rather than an array of structures. This is to ensure that the information contained in the structures cannot get out of sync during a resize due to an accessor updating the value in the old structure after it has been copied but before the array pointer is updated. Since the structures them- selves are no longer copied but only the pointers to them this case is mitigated. Link: https://bugzilla.kernel.org/show_bug.cgi?id=206443 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Suraj Jitindar Singh <[email protected]> Signed-off-by: Theodore Ts'o <[email protected]> Cc: [email protected]
2020-02-21Merge tag 'for-linus-5.6-rc3-tag' of ↵Linus Torvalds2-4/+7
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: "Two small fixes for Xen: - a fix to avoid warnings with new gcc - a fix for incorrectly disabled interrupts when calling _cond_resched()" * tag 'for-linus-5.6-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: Enable interrupts when calling _cond_resched() x86/xen: Distribute switch variables for initialization
2020-02-21Merge tag 'arm64-fixes' of ↵Linus Torvalds6-10/+12
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Will Deacon: "It's all straightforward apart from the changes to mmap()/mremap() in relation to their handling of address arguments from userspace with non-zero tag bits in the upper byte. The change to brk() is necessary to fix a nasty user-visible regression in malloc(), but we tightened up mmap() and mremap() at the same time because they also allow the user to create virtual aliases by accident. It's much less likely than brk() to matter in practice, but enforcing the principle of "don't permit the creation of mappings using tagged addresses" leads to a straightforward ABI without having to worry about the "but what if a crazy program did foo?" aspect of things. Summary: - Fix regression in malloc() caused by ignored address tags in brk() - Add missing brackets around argument to untagged_addr() macro - Fix clang build when using binutils assembler - Fix silly typo in virtual memory map documentation" * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: mm: Avoid creating virtual address aliases in brk()/mmap()/mremap() docs: arm64: fix trivial spelling enought to enough in memory.rst arm64: memory: Add missing brackets to untagged_addr() macro arm64: lse: Fix LSE atomics with LLVM
2020-02-21Merge tag 'powerpc-5.6-3' of ↵Linus Torvalds17-99/+308
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Some more powerpc fixes for 5.6. This is two weeks worth as I was out sick last week: - Three fixes for the recently added VMAP_STACK on 32-bit. - Three fixes related to hugepages on 8xx (32-bit). - A fix for a bug in our transactional memory handling that could lead to a kernel crash if we saw a page fault during signal delivery. - A fix for a deadlock in our PCI EEH (Enhanced Error Handling) code. - A couple of other minor fixes. Thanks to: Christophe Leroy, Erhard F, Frederic Barrat, Gustavo Luiz Duarte, Larry Finger, Leonardo Bras, Oliver O'Halloran, Sam Bobroff" * tag 'powerpc-5.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/entry: Fix an #if which should be an #ifdef in entry_32.S powerpc/xmon: Fix whitespace handling in getstring() powerpc/6xx: Fix power_save_ppc32_restore() with CONFIG_VMAP_STACK powerpc/chrp: Fix enter_rtas() with CONFIG_VMAP_STACK powerpc/32s: Fix DSI and ISI exceptions for CONFIG_VMAP_STACK powerpc/tm: Fix clearing MSR[TS] in current when reclaiming on signal delivery powerpc/8xx: Fix clearing of bits 20-23 in ITLB miss powerpc/hugetlb: Fix 8M hugepages on 8xx powerpc/hugetlb: Fix 512k hugepages on 8xx with 16k page size powerpc/eeh: Fix deadlock handling dead PHB
2020-02-21scsi: libfc: free response frame from GPN_IDIgor Druzhinin1-0/+2
fc_disc_gpn_id_resp() should be the last function using it so free it here to avoid memory leak. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Igor Druzhinin <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2020-02-21Merge tag 'linux-watchdog-5.6-rc3' of ↵Linus Torvalds2-7/+14
git://www.linux-watchdog.org/linux-watchdog Pull watchdog fixes from Wim Van Sebroeck: - mtk_wdt needs RESET_CONTROLLER to build - da9062 driver fixes: - fix power management ops - do not ping the hw during stop() - add dependency on I2C * tag 'linux-watchdog-5.6-rc3' of git://www.linux-watchdog.org/linux-watchdog: watchdog: da9062: Add dependency on I2C watchdog: da9062: fix power management ops watchdog: da9062: do not ping the hw during stop() watchdog: fix mtk_wdt.c RESET_CONTROLLER build error
2020-02-21Merge tag 'char-misc-5.6-rc3' of ↵Linus Torvalds7-19/+65
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are some small char/misc driver fixes for 5.6-rc3. Also included in here are some updates for some documentation files that I seem to be maintaining these days. The driver fixes are: - small fixes for the habanalabs driver - fsi driver bugfix All of these have been in linux-next for a while with no reported issues" * tag 'char-misc-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: Documentation/process: Swap out the ambassador for Canonical habanalabs: patched cb equals user cb in device memset habanalabs: do not halt CoreSight during hard reset habanalabs: halt the engines before hard-reset MAINTAINERS: remove unnecessary ':' characters fsi: aspeed: add unspecified HAS_IOMEM dependency COPYING: state that all contributions really are covered by this file Documentation/process: Change Microsoft contact for embargoed hardware issues embargoed-hardware-issues: drop Amazon contact as the email address now bounces Documentation/process: Add Arm contact for embargoed HW issues
2020-02-21Merge tag 'staging-5.6-rc3' of ↵Linus Torvalds11-1530/+56
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging driver fixes from Greg KH: "Here are some small staging driver fixes for 5.6-rc3, along with the removal of an unused/unneeded driver as well. The android vsoc driver is not needed anymore by anyone, so it was removed. The other driver fixes are: - ashmem bugfixes - greybus audio driver bugfix - wireless driver bugfixes and tiny cleanups to error paths All of these have been in linux-next for a while now with no reported issues" * tag 'staging-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: rtl8723bs: Remove unneeded goto statements staging: rtl8188eu: Remove some unneeded goto statements staging: rtl8723bs: Fix potential overuse of kernel memory staging: rtl8188eu: Fix potential overuse of kernel memory staging: rtl8723bs: Fix potential security hole staging: rtl8188eu: Fix potential security hole staging: greybus: use after free in gb_audio_manager_remove_all() staging: android: Delete the 'vsoc' driver staging: rtl8723bs: fix copy of overlapping memory staging: android: ashmem: Disallow ashmem memory from being remapped staging: vt6656: fix sign of rx_dbm to bb_pre_ed_rssi.
2020-02-21Merge tag 'tty-5.6-rc3' of ↵Linus Torvalds16-51/+104
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are a number of small tty and serial driver fixes for 5.6-rc3 that resolve a bunch of reported issues. They are: - vt selection and ioctl fixes - serdev bugfix - atmel serial driver fixes - qcom serial driver fixes - other minor serial driver fixes All of these have been in linux-next for a while with no reported issues" * tag 'tty-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: vt: selection, close sel_buffer race vt: selection, handle pending signals in paste_selection serial: cpm_uart: call cpm_muram_init before registering console tty: serial: qcom_geni_serial: Fix RX cancel command failure serial: 8250: Check UPF_IRQ_SHARED in advance tty: serial: imx: setup the correct sg entry for tx dma vt: vt_ioctl: fix race in VT_RESIZEX vt: fix scrollback flushing on background consoles tty: serial: tegra: Handle RX transfer in PIO mode if DMA wasn't started tty/serial: atmel: manage shutdown in case of RS485 or ISO7816 mode serdev: ttyport: restore client ops on deregistration serial: ar933x_uart: set UART_CS_{RX,TX}_READY_ORIDE
2020-02-21Merge tag 'usb-5.6-rc3' of ↵Linus Torvalds26-140/+327
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB/Thunderbolt fixes from Greg KH: "Here are a number of small USB driver fixes for 5.6-rc3. Included in here are: - MAINTAINER file updates - USB gadget driver fixes - usb core quirk additions and fixes for regressions - xhci driver fixes - usb serial driver id additions and fixes - thunderbolt bugfix Thunderbolt patches come in through here now that USB4 is really thunderbolt. All of these have been in linux-next for a while with no reported issues" * tag 'usb-5.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (34 commits) USB: misc: iowarrior: add support for the 100 device thunderbolt: Prevent crash if non-active NVMem file is read usb: gadget: udc-xilinx: Fix xudc_stop() kernel-doc format USB: misc: iowarrior: add support for the 28 and 28L devices USB: misc: iowarrior: add support for 2 OEMed devices USB: Fix novation SourceControl XL after suspend xhci: Fix memory leak when caching protocol extended capability PSI tables - take 2 Revert "xhci: Fix memory leak when caching protocol extended capability PSI tables" MAINTAINERS: Sort entries in database for THUNDERBOLT usb: dwc3: debug: fix string position formatting mixup with ret and len usb: gadget: serial: fix Tx stall after buffer overflow usb: gadget: ffs: ffs_aio_cancel(): Save/restore IRQ flags usb: dwc2: Fix SET/CLEAR_FEATURE and GET_STATUS flows usb: dwc2: Fix in ISOC request length checking usb: gadget: composite: Support more than 500mA MaxPower usb: gadget: composite: Fix bMaxPower for SuperSpeedPlus usb: gadget: u_audio: Fix high-speed max packet size usb: dwc3: gadget: Check for IOC/LST bit in TRB->ctrl fields USB: core: clean up endpoint-descriptor parsing USB: quirks: blacklist duplicate ep on Sound Devices USBPre2 ...
2020-02-21Merge tag 'drm-fixes-2020-02-21' of git://anongit.freedesktop.org/drm/drmLinus Torvalds50-243/+486
Pull drm fixes from Dave Airlie: "Varied fixes for rc3. i915 is the largest, they are seeing some ACPI problems with their CI which hopefully get solved soon [1]. msm has a bunch of fixes for new hw added in the merge, a bunch of amdgpu fixes, and nouveau adds support for some new firmwares for turing tu11x GPUs that were just released into linux-firmware by nvidia, they operate the same as the ones we already have for tu10x so should be fine to hook up. Otherwise it's just misc fixes for panfrost and sun4i. core: - Allow only one rotation argument, and allow zero rotation in video cmdline. i915: - Workaround missing Display Stream Compression (DSC) state readout by forcing modeset when its enabled at probe - Fix EHL port clock voltage level requirements - Fix queuing retire workers on the virtual engine - Fix use of partially initialized waiters - Stop using drm_pci_alloc/drm_pci/free - Fix rewind of RING_TAIL by forcing a context reload - Fix locking on resetting ring->head - Propagate our bug filing URL change to stable kernels panfrost: - Small compiler warning fix for panfrost. - Fix when using performance counters in panfrost when using per fd address space. sun4xi: - Fix dt binding nouveau: - tu11x modesetting fix - ACR/GR firmware support for tu11x (fw is public now) msm: - fix UBWC on GPU and display side for sc7180 - fix DSI suspend/resume issue encountered on sc7180 - fix some breakage on so called "linux-android" devices (fallout from sc7180/a618 support, not seen earlier due to bootloader/firmware differences) - couple other misc fixes amdgpu: - HDCP fixes - xclk fix for raven - GFXOFF fixes" [1] The Intel suspend testing should now be fixed by commit 63fb9623427f ("ACPI: PM: s2idle: Check fixed wakeup events in acpi_s2idle_wake()") * tag 'drm-fixes-2020-02-21' of git://anongit.freedesktop.org/drm/drm: (39 commits) drm/amdgpu/display: clean up hdcp workqueue handling drm/amdgpu: add is_raven_kicker judgement for raven1 drm/i915/gt: Avoid resetting ring->head outside of its timeline mutex drm/i915/execlists: Always force a context reload when rewinding RING_TAIL drm/i915: Wean off drm_pci_alloc/drm_pci_free drm/i915/gt: Protect defer_request() from new waiters drm/i915/gt: Prevent queuing retire workers on the virtual engine drm/i915/dsc: force full modeset whenever DSC is enabled at probe drm/i915/ehl: Update port clock voltage level requirements drm/i915: Update drm/i915 bug filing URL MAINTAINERS: Update drm/i915 bug filing URL drm/i915: Initialise basic fence before acquiring seqno drm/i915/gem: Require per-engine reset support for non-persistent contexts drm/nouveau/kms/gv100-: Re-set LUT after clearing for modesets drm/nouveau/gr/tu11x: initial support drm/nouveau/acr/tu11x: initial support drm/amdgpu/gfx10: disable gfxoff when reading rlc clock drm/amdgpu/gfx9: disable gfxoff when reading rlc clock drm/amdgpu/soc15: fix xclk for raven drm/amd/powerplay: always refetch the enabled features status on dpm enablement ...
2020-02-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds104-506/+1132
Pull networking fixes from David Miller: 1) Limit xt_hashlimit hash table size to avoid OOM or hung tasks, from Cong Wang. 2) Fix deadlock in xsk by publishing global consumer pointers when NAPI is finished, from Magnus Karlsson. 3) Set table field properly to RT_TABLE_COMPAT when necessary, from Jethro Beekman. 4) NLA_STRING attributes are not necessary NULL terminated, deal wiht that in IFLA_ALT_IFNAME. From Eric Dumazet. 5) Fix checksum handling in atlantic driver, from Dmitry Bezrukov. 6) Handle mtu==0 devices properly in wireguard, from Jason A. Donenfeld. 7) Fix several lockdep warnings in bonding, from Taehee Yoo. 8) Fix cls_flower port blocking, from Jason Baron. 9) Sanitize internal map names in libbpf, from Toke Høiland-Jørgensen. 10) Fix RDMA race in qede driver, from Michal Kalderon. 11) Fix several false lockdep warnings by adding conditions to list_for_each_entry_rcu(), from Madhuparna Bhowmik. 12) Fix sleep in atomic in mlx5 driver, from Huy Nguyen. 13) Fix potential deadlock in bpf_map_do_batch(), from Yonghong Song. 14) Hey, variables declared in switch statement before any case statements are not initialized. I learn something every day. Get rids of this stuff in several parts of the networking, from Kees Cook. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (99 commits) bnxt_en: Issue PCIe FLR in kdump kernel to cleanup pending DMAs. bnxt_en: Improve device shutdown method. net: netlink: cap max groups which will be considered in netlink_bind() net: thunderx: workaround BGX TX Underflow issue ionic: fix fw_status read net: disable BRIDGE_NETFILTER by default net: macb: Properly handle phylink on at91rm9200 s390/qeth: fix off-by-one in RX copybreak check s390/qeth: don't warn for napi with 0 budget s390/qeth: vnicc Fix EOPNOTSUPP precedence openvswitch: Distribute switch variables for initialization net: ip6_gre: Distribute switch variables for initialization net: core: Distribute switch variables for initialization udp: rehash on disconnect net/tls: Fix to avoid gettig invalid tls record bpf: Fix a potential deadlock with bpf_map_do_batch bpf: Do not grab the bucket spinlock by default on htab batch ops ice: Wait for VF to be reset/ready before configuration ice: Don't tell the OS that link is going down ice: Don't reject odd values of usecs set by user ...
2020-02-21Merge branch 'akpm' (patches from Andrew)Linus Torvalds20-419/+93
Merge misc fixes from Andrew Morton: - A few y2038 fixes which missed the merge window while dependencies in NFS were being sorted out. - A bunch of fixes. Some minor, some not. * emailed patches from Andrew Morton <[email protected]>: MAINTAINERS: use tabs for SAFESETID lib/stackdepot.c: fix global out-of-bounds in stack_slabs mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEM mm/vmscan.c: don't round up scan size for online memory cgroup lib/string.c: update match_string() doc-strings with correct behavior mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps() mm/swapfile.c: fix a comment in sys_swapon() scripts/get_maintainer.pl: deprioritize old Fixes: addresses get_maintainer: remove uses of P: for maintainer name selftests/vm: add missed tests in run_vmtests include/uapi/linux/swab.h: fix userspace breakage, use __BITS_PER_LONG for swap Revert "ipc,sem: remove uneeded sem_undo_list lock usage in exit_sem()" y2038: hide timeval/timespec/itimerval/itimerspec types y2038: remove unused time32 interfaces y2038: remove ktime to/from timespec/timeval conversion
2020-02-21MAINTAINERS: use tabs for SAFESETIDRandy Dunlap1-4/+4
Use tabs for indentation instead of spaces for SAFESETID. All (!) other entries in MAINTAINERS use tabs (according to my simple grepping). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Cc: Micah Morton <[email protected]> Cc: James Morris <[email protected]> Cc: "Serge E. Hallyn" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-21lib/stackdepot.c: fix global out-of-bounds in stack_slabsAlexander Potapenko1-2/+6
Walter Wu has reported a potential case in which init_stack_slab() is called after stack_slabs[STACK_ALLOC_MAX_SLABS - 1] has already been initialized. In that case init_stack_slab() will overwrite stack_slabs[STACK_ALLOC_MAX_SLABS], which may result in a memory corruption. Link: http://lkml.kernel.org/r/[email protected] Fixes: cd11016e5f521 ("mm, kasan: stackdepot implementation. Enable stackdepot for SLAB") Signed-off-by: Alexander Potapenko <[email protected]> Reported-by: Walter Wu <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: Matthias Brugger <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kate Stewart <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-21mm/sparsemem: pfn_to_page is not valid yet on SPARSEMEMWei Yang1-1/+1
When we use SPARSEMEM instead of SPARSEMEM_VMEMMAP, pfn_to_page() doesn't work before sparse_init_one_section() is called. This leads to a crash when hotplug memory: BUG: unable to handle page fault for address: 0000000006400000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP PTI CPU: 3 PID: 221 Comm: kworker/u16:1 Tainted: G W 5.5.0-next-20200205+ #343 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 Workqueue: kacpi_hotplug acpi_hotplug_work_fn RIP: 0010:__memset+0x24/0x30 Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 <f3> 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 RSP: 0018:ffffb43ac0373c80 EFLAGS: 00010a87 RAX: ffffffffffffffff RBX: ffff8a1518800000 RCX: 0000000000050000 RDX: 0000000000000000 RSI: 00000000000000ff RDI: 0000000006400000 RBP: 0000000000140000 R08: 0000000000100000 R09: 0000000006400000 R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000000000 R13: 0000000000000028 R14: 0000000000000000 R15: ffff8a153ffd9280 FS: 0000000000000000(0000) GS:ffff8a153ab00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000006400000 CR3: 0000000136fca000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: sparse_add_section+0x1c9/0x26a __add_pages+0xbf/0x150 add_pages+0x12/0x60 add_memory_resource+0xc8/0x210 __add_memory+0x62/0xb0 acpi_memory_device_add+0x13f/0x300 acpi_bus_attach+0xf6/0x200 acpi_bus_scan+0x43/0x90 acpi_device_hotplug+0x275/0x3d0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x1a7/0x370 worker_thread+0x30/0x380 kthread+0x112/0x130 ret_from_fork+0x35/0x40 We should use memmap as it did. On x86 the impact is limited to x86_32 builds, or x86_64 configurations that override the default setting for SPARSEMEM_VMEMMAP. Other memory hotplug archs (arm64, ia64, and ppc) also default to SPARSEMEM_VMEMMAP=y. [[email protected]: changelog update] {[email protected]: changelog update] Link: http://lkml.kernel.org/r/[email protected] Fixes: ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug") Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Baoquan He <[email protected]> Acked-by: David Hildenbrand <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Dan Williams <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-21mm/vmscan.c: don't round up scan size for online memory cgroupGavin Shan1-3/+6
Commit 68600f623d69 ("mm: don't miss the last page because of round-off error") makes the scan size round up to @denominator regardless of the memory cgroup's state, online or offline. This affects the overall reclaiming behavior: the corresponding LRU list is eligible for reclaiming only when its size logically right shifted by @sc->priority is bigger than zero in the former formula. For example, the inactive anonymous LRU list should have at least 0x4000 pages to be eligible for reclaiming when we have 60/12 for swappiness/priority and without taking scan/rotation ratio into account. After the roundup is applied, the inactive anonymous LRU list becomes eligible for reclaiming when its size is bigger than or equal to 0x1000 in the same condition. (0x4000 >> 12) * 60 / (60 + 140 + 1) = 1 ((0x1000 >> 12) * 60) + 200) / (60 + 140 + 1) = 1 aarch64 has 512MB huge page size when the base page size is 64KB. The memory cgroup that has a huge page is always eligible for reclaiming in that case. The reclaiming is likely to stop after the huge page is reclaimed, meaing the further iteration on @sc->priority and the silbing and child memory cgroups will be skipped. The overall behaviour has been changed. This fixes the issue by applying the roundup to offlined memory cgroups only, to give more preference to reclaim memory from offlined memory cgroup. It sounds reasonable as those memory is unlikedly to be used by anyone. The issue was found by starting up 8 VMs on a Ampere Mustang machine, which has 8 CPUs and 16 GB memory. Each VM is given with 2 vCPUs and 2GB memory. It took 264 seconds for all VMs to be completely up and 784MB swap is consumed after that. With this patch applied, it took 236 seconds and 60MB swap to do same thing. So there is 10% performance improvement for my case. Note that KSM is disable while THP is enabled in the testing. total used free shared buff/cache available Mem: 16196 10065 2049 16 4081 3749 Swap: 8175 784 7391 total used free shared buff/cache available Mem: 16196 11324 3656 24 1215 2936 Swap: 8175 60 8115 Link: http://lkml.kernel.org/r/[email protected] Fixes: 68600f623d69 ("mm: don't miss the last page because of round-off error") Signed-off-by: Gavin Shan <[email protected]> Acked-by: Roman Gushchin <[email protected]> Cc: <[email protected]> [4.20+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-21lib/string.c: update match_string() doc-strings with correct behaviorAlexandru Ardelean1-0/+16
There were a few attempts at changing behavior of the match_string() helpers (i.e. 'match_string()' & 'sysfs_match_string()'), to change & extend the behavior according to the doc-string. But the simplest approach is to just fix the doc-strings. The current behavior is fine as-is, and some bugs were introduced trying to fix it. As for extending the behavior, new helpers can always be introduced if needed. The match_string() helpers behave more like 'strncmp()' in the sense that they go up to n elements or until the first NULL element in the array of strings. This change updates the doc-strings with this info. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Alexandru Ardelean <[email protected]> Acked-by: Andy Shevchenko <[email protected]> Cc: Kees Cook <[email protected]> Cc: "Tobin C . Harding" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-21mm/memcontrol.c: lost css_put in memcg_expand_shrinker_maps()Vasily Averin1-1/+3
for_each_mem_cgroup() increases css reference counter for memory cgroup and requires to use mem_cgroup_iter_break() if the walk is cancelled. Link: http://lkml.kernel.org/r/[email protected] Fixes: 0a4465d34028 ("mm, memcg: assign memcg-aware shrinkers bitmap to memcg") Signed-off-by: Vasily Averin <[email protected]> Acked-by: Kirill Tkhai <[email protected]> Acked-by: Michal Hocko <[email protected]> Reviewed-by: Roman Gushchin <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-21mm/swapfile.c: fix a comment in sys_swapon()Christoph Hellwig1-1/+1
claim_swapfile now always takes i_rwsem. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-21scripts/get_maintainer.pl: deprioritize old Fixes: addressesDouglas Anderson1-4/+4
Recently, I found that get_maintainer was causing me to send emails to the old addresses for maintainers. Since I usually just trust the output of get_maintainer to know the right email address, I didn't even look carefully and fired off two patch series that went to the wrong place. Oops. The problem was introduced recently when trying to add signatures from Fixes. The problem was that these email addresses were added too early in the process of compiling our list of places to send. Things added to the list earlier are considered more canonical and when we later added maintainer entries we ended up deduplicating to the old address. Here are two examples using mainline commits (to make it easier to replicate) for the two maintainers that I messed up recently: $ git format-patch d8549bcd0529~..d8549bcd0529 $ ./scripts/get_maintainer.pl 0001-clk-Add-clk_hw*.patch | grep Boyd Stephen Boyd <[email protected]>... $ git format-patch 6d1238aa3395~..6d1238aa3395 $ ./scripts/get_maintainer.pl 0001-arm64-dts-qcom-qcs404*.patch | grep Andy Andy Gross <[email protected]> Let's move the adding of addresses from Fixes: to the end since the email addresses from these are much more likely to be older. After this patch the above examples get the right addresses for the two examples. Link: http://lkml.kernel.org/r/20200127095001.1.I41fba9f33590bfd92cd01960161d8384268c6569@changeid Fixes: 2f5bd343694e ("scripts/get_maintainer.pl: add signatures from Fixes: <badcommit> lines in commit message") Signed-off-by: Douglas Anderson <[email protected]> Acked-by: Joe Perches <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: Bjorn Andersson <[email protected]> Cc: Andy Gross <[email protected]> Cc: Kees Cook <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-02-21get_maintainer: remove uses of P: for maintainer nameJoe Perches1-24/+0
Commit 1ca84ed6425f ("MAINTAINERS: Reclaim the P: tag for Maintainer Entry Profile") changed the use of the "P:" tag from "Person" to "Profile (ie: special subsystem coding styles and characteristics)" Change how get_maintainer.pl parses the "P:" tag to match. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Joe Perches <[email protected]> Acked-by: Dan Williams <[email protected]> Cc: Jonathan Corbet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>