aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-08-11bpf: fix two missing target_size settings in bpf_convert_ctx_accessDaniel Borkmann1-0/+2
When CONFIG_NET_SCHED or CONFIG_NET_RX_BUSY_POLL is /not/ set and we try a narrow __sk_buff load of tc_index or napi_id, respectively, then verifier rightfully complains that it's misconfigured, because we need to set target_size in each of the two cases. The rewrite for the ctx access is just a dummy op, but needs to pass, so fix this up. Fixes: f96da09473b5 ("bpf: simplify narrower ctx access") Reported-by: Shubham Bansal <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-08-11net: fix compilation when busy poll is not enabledDaniel Borkmann1-6/+6
MIN_NAPI_ID is used in various places outside of CONFIG_NET_RX_BUSY_POLL wrapping, so when it's not set we run into build errors such as: net/core/dev.c: In function 'dev_get_by_napi_id': net/core/dev.c:886:16: error: ‘MIN_NAPI_ID’ undeclared (first use in this function) if (napi_id < MIN_NAPI_ID) ^~~~~~~~~~~ Thus, have MIN_NAPI_ID always defined to fix these errors. Signed-off-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-08-11mISDN: Fix null pointer dereference at mISDN_FsmNewAnton Vasilyev5-9/+36
If mISDN_FsmNew() fails to allocate memory for jumpmatrix then null pointer dereference will occur on any write to jumpmatrix. The patch adds check on successful allocation and corresponding error handling. Found by Linux Driver Verification project (linuxtesting.org). Signed-off-by: Anton Vasilyev <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-08-11nfp: do not update MTU from BH in flower appSimon Horman1-6/+2
The Flower app may receive a request to update the MTU of a representor netdev upon receipt of a control message from the firmware. This requires the RTNL lock which needs to be taken outside of the packet processing path. As a handling of this correctly seems a little to invasive for a fix simply skip setting the MTU for now. Relevant backtrace: [ 1496.288489] BUG: scheduling while atomic: kworker/0:3/373/0x00000100 [ 1496.294911] dca syscopyarea sysfillrect sysimgblt fb_sys_fops ptp drm mxm_wmi ahci pps_core libahci i2c_algo_bit wmi [last unloaded: nfp] [ 1496.294918] CPU: 0 PID: 373 Comm: kworker/0:3 Tainted: G OE 4.13.0-rc3+ #3 [ 1496.294919] Hardware name: Supermicro X10DRi/X10DRi, BIOS 2.0 12/28/2015 [ 1496.294923] Workqueue: events work_for_cpu_fn [ 1496.294924] Call Trace: [ 1496.294927] <IRQ> [ 1496.294931] dump_stack+0x63/0x82 [ 1496.294935] __schedule_bug+0x54/0x70 [ 1496.294937] __schedule+0x62f/0x890 [ 1496.294941] ? intel_unmap_sg+0x90/0x90 [ 1496.294942] schedule+0x36/0x80 [ 1496.294943] schedule_preempt_disabled+0xe/0x10 [ 1496.294945] __mutex_lock.isra.2+0x445/0x4a0 [ 1496.294947] ? device_is_rmrr_locked+0x12/0x50 [ 1496.294950] ? kfree+0x162/0x170 [ 1496.294952] ? device_is_rmrr_locked+0x12/0x50 [ 1496.294953] ? iommu_should_identity_map+0x50/0xe0 [ 1496.294954] __mutex_lock_slowpath+0x13/0x20 [ 1496.294955] ? iommu_no_mapping+0x48/0xd0 [ 1496.294956] ? __mutex_lock_slowpath+0x13/0x20 [ 1496.294957] mutex_lock+0x2f/0x40 [ 1496.294960] rtnl_lock+0x15/0x20 [ 1496.294979] nfp_flower_cmsg_rx+0xc8/0x150 [nfp] [ 1496.294986] nfp_ctrl_poll+0x286/0x350 [nfp] [ 1496.294989] tasklet_action+0xf6/0x110 [ 1496.294992] __do_softirq+0xed/0x278 [ 1496.294993] irq_exit+0xb6/0xc0 [ 1496.294994] do_IRQ+0x4f/0xd0 [ 1496.294996] common_interrupt+0x89/0x89 Fixes: 948faa46c05b ("nfp: add support for control messages for flower app") Signed-off-by: Simon Horman <[email protected]> Reviewed-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-08-11net: stmmac: Use the right logging function in stmmac_mdio_registerRomain Perier1-5/+4
Currently, the function stmmac_mdio_register() is only used by stmmac_dvr_probe() from stmmac_main.c, in order to register the MDIO bus and probe information about the PHY. As this function is called before calling register_netdev(), all messages logged from stmmac_mdio_register are prefixed by "(unnamed net_device)". The goal of netdev_info or netdev_err is to dump useful infos about a net_device, when this data structure is partially initialized, there is no point for using these functions. This commit fixes the issue by replacing all netdev_*() by the corresponding dev_*() function for logging. The last netdev_info is replaced by phy_attached_info(), as a valid phydev can be used at this point. Signed-off-by: Romain Perier <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-08-11net/sched/hfsc: allocate tcf block for hfsc root classKonstantin Khlebnikov1-0/+8
Without this filters cannot be attached. Signed-off-by: Konstantin Khlebnikov <[email protected]> Fixes: 6529eaba33f0 ("net: sched: introduce tcf block infractructure") Acked-by: Cong Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-08-11bonding: require speed/duplex only for 802.3ad, alb and tlbAndreas Born2-2/+9
The patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") puts the link state to down if bond_update_speed_duplex() cannot retrieve speed and duplex settings. Assumably the patch was written with 802.3ad mode in mind which relies on link speed/duplex settings. For other modes like active-backup these settings are not required. Thus, only for these other modes, this patch reintroduces support for slaves that do not support reporting speed or duplex such as wireless devices. This fixes the regression reported in bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547). Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") Signed-off-by: Andreas Born <[email protected]> Acked-by: Mahesh Bandewar <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-08-11net: dsa: ksz: fix skb freeingVivien Didelot1-4/+9
The DSA layer frees the original skb when an xmit function returns NULL, meaning an error occurred. But if the tagging code copied the original skb, it is responsible of freeing the copy if an error occurs. The ksz tagging code currently has two issues: if skb_put_padto fails, the skb copy is not freed, and the original skb will be freed twice. To fix that, move skb_put_padto inside both branches of the skb_tailroom condition, before freeing the original skb, and free the copy on error. Signed-off-by: Vivien Didelot <[email protected]> Reviewed-by: Woojung Huh <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2017-08-11Merge tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfsLinus Torvalds3-2/+3
Pull NFS client fixes from Anna Schumaker: "A few more NFS client bugfixes from me for rc5. Dros has a stable fix for flexfiles to prevent leaking the nfs4_ff_ds_version arrays when freeing a layout, Trond fixed a potential recovery loop situation with the TEST_STATEID operation, and Christoph fixed up the pNFS blocklayout Kconfig options to prevent unsafe use with kernels that don't have large block device support. Summary: Stable fix: - fix leaking nfs4_ff_ds_version array Other fixes: - improve TEST_STATEID OLD_STATEID handling to prevent recovery loop - require 64-bit sector_t for pNFS blocklayout to prevent 32-bit compile errors" * tag 'nfs-for-4.13-5' of git://git.linux-nfs.org/projects/anna/linux-nfs: pnfs/blocklayout: require 64-bit sector_t NFSv4: Ignore NFS4ERR_OLD_STATEID in nfs41_check_open_stateid() nfs/flexfiles: fix leak of nfs4_ff_ds_version arrays
2017-08-11Merge branch 'for-linus' of git://git.kernel.dk/linux-blockLinus Torvalds10-55/+267
Pull block fixes from Jens Axboe: "A set of fixes that should go into this series. This contains: - Fix from Bart for blk-mq requeue queue running, preventing a continued loop of run/restart. - Fix for a bio/blk-integrity issue, in two parts. One from Christoph, fixing where verification happens, and one from Milan, for a NULL profile. - NVMe pull request, most of the changes being for nvme-fc, but also a few trivial core/pci fixes" * 'for-linus' of git://git.kernel.dk/linux-block: nvme: fix directive command numd calculation nvme: fix nvme reset command timeout handling nvme-pci: fix CMB sysfs file removal in reset path lpfc: support nvmet_fc defer_rcv callback nvmet_fc: add defer_req callback for deferment of cmd buffer return nvme: strip trailing 0-bytes in wwid_show block: Make blk_mq_delay_kick_requeue_list() rerun the queue at a quiet time bio-integrity: only verify integrity on the lowest stacked driver bio-integrity: Fix regression if profile verify_fn is NULL
2017-08-11Merge tag 'mmc-v4.13-rc4' of ↵Linus Torvalds3-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc Pull MMC fixes from Ulf Hansson: "MMC core: - fix lockdep splat when removing mmc_block module - fix the logic for setting eMMC HS400ES signal voltage MMC host: - omap_hsmmc: add CMD23 capability to fix -EIO errors" * tag 'mmc-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc: mmc: block: fix lockdep splat when removing mmc_block module mmc: mmc: correct the logic for setting HS400ES signal voltage mmc: host: omap_hsmmc: Add CMD23 capability to omap_hsmmc driver
2017-08-11Merge tag 'fbdev-v4.13-rc5' of git://github.com/bzolnier/linuxLinus Torvalds4-9/+16
Pull fbdev fixes from Bartlomiej Zolnierkiewicz: - allow user to disable write combined mapping in efifb driver (Dave Airlie) - fix use after free bugs on driver removal in imxfb driver (Dan Carpenter) - fix unused variable warning in omapfb driver (Arnd Bergmann) * tag 'fbdev-v4.13-rc5' of git://github.com/bzolnier/linux: efifb: allow user to disable write combined mapping. fbdev: omapfb: remove unused variable video: fbdev: imxfb: use after free in imxfb_remove()
2017-08-11Merge branch 'for-linus' of ↵Linus Torvalds2-4/+6
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse Pull fuse fixes from Miklos Szeredi: "Fix a few bugs in fuse" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: set mapping error in writepage_locked when it fails fuse: Dont call set_page_dirty_lock() for ITER_BVEC pages for async_dio fuse: initialize the flock flag in fuse_file on allocation
2017-08-11Merge tag 'iommu-fixes-v4.13-rc4' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull IOMMU fix from Joerg Roedel: "Fix a NULL-pointer dereference in arm_smmu_add_device" * tag 'iommu-fixes-v4.13-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device
2017-08-11pnfs/blocklayout: require 64-bit sector_tChristoph Hellwig1-0/+1
The blocklayout code does not compile cleanly for a 32-bit sector_t, and also has no reliable checks for devices sizes, which makes it unsafe to use with a kernel that doesn't support large block devices. Signed-off-by: Christoph Hellwig <[email protected]> Reported-by: Arnd Bergmann <[email protected]> Fixes: 5c83746a0cf2 ("pnfs/blocklayout: in-kernel GETDEVICEINFO XDR parsing") Signed-off-by: Anna Schumaker <[email protected]>
2017-08-11Merge tag 'powerpc-4.13-6' of ↵Linus Torvalds9-68/+111
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "All fixes for code that went in this cycle. - a revert of an optimisation to the syscall exit path, which could lead to an oops on either older machines or machines with > 1TB of memory - disable some deep idle states if the firmware configuration for them fails - re-enable HARD/SOFT lockup detectors in defconfigs after a Kconfig change - six fairly small patches fixing bugs in our new watchdog code Thanks to: Gautham R Shenoy, Nicholas Piggin" * tag 'powerpc-4.13-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/watchdog: add locking around init/exit functions powerpc/watchdog: Fix marking of stuck CPUs powerpc/watchdog: Fix final-check recovered case powerpc/watchdog: Moderate touch_nmi_watchdog overhead powerpc/watchdog: Improve watchdog lock primitive powerpc: NMI IPI improve lock primitive powerpc/configs: Re-enable HARD/SOFT lockup detectors powerpc/powernv/idle: Disable LOSE_FULL_CONTEXT states when stop-api fails Revert "powerpc/64: Avoid restore_math call if possible in syscall exit"
2017-08-11selftests: timers: freq-step: fix compile errorShuah Khan1-4/+3
Fix compile error due to ksft_exit_skip() update to take var_args. freq-step.c: In function ‘init_test’: freq-step.c:234:3: error: too few arguments to function ‘ksft_exit_skip’ ksft_exit_skip(); ^~~~~~~~~~~~~~ In file included from freq-step.c:26:0: ../kselftest.h:167:19: note: declared here static inline int ksft_exit_skip(const char *msg, ...) ^~~~~~~~~~~~~~ <builtin>: recipe for target 'freq-step' failed Signed-off-by: Shuah Khan <[email protected]>
2017-08-11iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_deviceArtem Savkov1-0/+7
Commit c54451a "iommu/arm-smmu: Fix the error path in arm_smmu_add_device" removed fwspec assignment in legacy_binding path as redundant which is wrong. It needs to be updated after fwspec initialisation in arm_smmu_register_legacy_master() as it is dereferenced later. Without this there is a NULL-pointer dereference panic during boot on some hosts. Signed-off-by: Artem Savkov <[email protected]> Reviewed-by: Robin Murphy <[email protected]> Acked-by: Will Deacon <[email protected]> Signed-off-by: Joerg Roedel <[email protected]>
2017-08-11xen/events: Fix interrupt lost during irq_disable and irq_enableLiu Shuo1-1/+1
Here is a device has xen-pirq-MSI interrupt. Dom0 might lost interrupt during driver irq_disable/irq_enable. Here is the scenario, 1. irq_disable -> disable_dynirq -> mask_evtchn(irq channel) 2. dev interrupt raised by HW and Xen mark its evtchn as pending 3. irq_enable -> startup_pirq -> eoi_pirq -> clear_evtchn(channel of irq) -> clear pending status 4. consume_one_event process the irq event without pending bit assert which result in interrupt lost once 5. No HW interrupt raising anymore. Now use enable_dynirq for enable_pirq of xen_pirq_chip to remove eoi_pirq when irq_enable. Signed-off-by: Liu Shuo <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2017-08-11xen: avoid deadlock in xenbusJuergen Gross1-1/+2
When starting the xenwatch thread a theoretical deadlock situation is possible: xs_init() contains: task = kthread_run(xenwatch_thread, NULL, "xenwatch"); if (IS_ERR(task)) return PTR_ERR(task); xenwatch_pid = task->pid; And xenwatch_thread() does: mutex_lock(&xenwatch_mutex); ... event->handle->callback(); ... mutex_unlock(&xenwatch_mutex); The callback could call unregister_xenbus_watch() which does: ... if (current->pid != xenwatch_pid) mutex_lock(&xenwatch_mutex); ... In case a watch is firing before xenwatch_pid could be set and the callback of that watch unregisters a watch, then a self-deadlock would occur. Avoid this by setting xenwatch_pid in xenwatch_thread(). Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2017-08-11Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-linusJens Axboe8-51/+261
Pull NVMe fixes from Christoph: "A few more small fixes - the fc/lpfc update is the biggest by far."
2017-08-11xen: fix hvm guest with kaslr enabledJuergen Gross1-2/+14
A Xen HVM guest running with KASLR enabled will die rather soon today because the shared info page mapping is using va() too early. This was introduced by commit a5d5f328b0e2baa5ee7c119fd66324eb79eeeb66 ("xen: allocate page for shared info page from low memory"). In order to fix this use early_memremap() to get a temporary virtual address for shared info until va() can be used safely. Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2017-08-11xen: split up xen_hvm_init_shared_info()Juergen Gross1-21/+24
Instead of calling xen_hvm_init_shared_info() on boot and resume split it up into a boot time function searching for the pfn to use and a mapping function doing the hypervisor mapping call. Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2017-08-11x86: provide an init_mem_mapping hypervisor hookJuergen Gross2-0/+13
Provide a hook in hypervisor_x86 called after setting up initial memory mapping. This is needed e.g. by Xen HVM guests to map the hypervisor shared info page. Signed-off-by: Juergen Gross <[email protected]> Reviewed-by: Boris Ostrovsky <[email protected]> Acked-by: Ingo Molnar <[email protected]> Signed-off-by: Juergen Gross <[email protected]>
2017-08-11fuse: set mapping error in writepage_locked when it failsJeff Layton1-0/+1
This ensures that we see errors on fsync when writeback fails. Signed-off-by: Jeff Layton <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Miklos Szeredi <[email protected]>
2017-08-10Merge tag 'drm-fixes-for-v4.13-rc5' of ↵Linus Torvalds32-214/+247
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Nothing too earth shattering here, it just seems like lots of little things all over the place. msm has probably the larger amount of changes, but they all seem fine, otherwise, some rockchip, i915, etnaviv and exynos fixes, along with one nouveau regression fix for some older GPUs" * tag 'drm-fixes-for-v4.13-rc5' of git://people.freedesktop.org/~airlied/linux: (35 commits) drm/nouveau/disp/nv04: avoid creation of output paths drm: make DRM_STM default n drm/exynos: forbid creating framebuffers from too small GEM buffers drm/etnaviv: Fix off-by-one error in reloc checking drm/i915: fix backlight invert for non-zero minimum brightness drm/i915/shrinker: Wrap need_resched() inside preempt-disable drm/i915/perf: fix flex eu registers programming drm/i915: Fix out-of-bounds array access in bdw_load_gamma_lut drm/i915/gvt: Change the max length of mmio_reg_rw from 4 to 8 drm/i915/gvt: Initialize MMIO Block with HW state drm/rockchip: vop: report error when check resource error drm/rockchip: vop: round_up pitches to word align drm/rockchip: vop: fix NV12 video display error drm/rockchip: vop: fix iommu page fault when resume drm/i915/gvt: clean workload queue if error happened drm/i915/gvt: change resetting to resetting_eng drm/msm: gpu: don't abuse dma_alloc for non-DMA allocations drm/msm: gpu: call qcom_mdt interfaces only for ARCH_QCOM drm/msm/adreno: Prevent unclocked access when retrieving timestamps drm/msm: Remove __user from __u64 data types ...
2017-08-10Merge branch 'akpm' (patches from Andrew)Linus Torvalds28-121/+208
Merge misc fixes from Andrew Morton: "21 fixes" * emailed patches from Andrew Morton <[email protected]>: (21 commits) userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropage zram: rework copy of compressor name in comp_algorithm_store() rmap: do not call mmu_notifier_invalidate_page() under ptl mm: fix list corruptions on shmem shrinklist mm/balloon_compaction.c: don't zero ballooned pages MAINTAINERS: copy virtio on balloon_compaction.c mm: fix KSM data corruption mm: fix MADV_[FREE|DONTNEED] TLB flush miss problem mm: make tlb_flush_pending global mm: refactor TLB gathering API Revert "mm: numa: defer TLB flush for THP migration as long as possible" mm: migrate: fix barriers around tlb_flush_pending mm: migrate: prevent racy access to tlb_flush_pending fault-inject: fix wrong should_fail() decision in task context test_kmod: fix small memory leak on filesystem tests test_kmod: fix the lock in register_test_dev_kmod() test_kmod: fix bug which allows negative values on two config options test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY" userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED case mm: ratelimit PFNs busy info message ...
2017-08-10userfaultfd: replace ENOSPC with ESRCH in case mm has gone during copy/zeropageMike Rapoport1-2/+2
When the process exit races with outstanding mcopy_atomic, it would be better to return ESRCH error. When such race occurs the process and it's mm are going away and returning "no such process" to the uffd monitor seems better fit than ENOSPC. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport <[email protected]> Suggested-by: Michal Hocko <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Pavel Emelyanov <[email protected]> Cc: Mike Kravetz <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10zram: rework copy of compressor name in comp_algorithm_store()Matthias Kaehlcke1-2/+2
comp_algorithm_store() passes the size of the source buffer to strlcpy() instead of the destination buffer size. Make it explicit that the two buffers have the same size and use strcpy() instead of strlcpy(). The latter can be done safely since the function ensures that the string in the source buffer is terminated. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthias Kaehlcke <[email protected]> Reviewed-by: Douglas Anderson <[email protected]> Reviewed-by: Sergey Senozhatsky <[email protected]> Acked-by: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10rmap: do not call mmu_notifier_invalidate_page() under ptlKirill A. Shutemov1-22/+30
MMU notifiers can sleep, but in page_mkclean_one() we call mmu_notifier_invalidate_page() under page table lock. Let's instead use mmu_notifier_invalidate_range() outside page_vma_mapped_walk() loop. [[email protected]: try_to_unmap_one() do not call mmu_notifier under ptl] Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Fixes: c7ab0d2fdc84 ("mm: convert try_to_unmap_one() to use page_vma_mapped_walk()") Signed-off-by: Kirill A. Shutemov <[email protected]> Signed-off-by: Jérôme Glisse <[email protected]> Reported-by: axie <[email protected]> Cc: Alex Deucher <[email protected]> Cc: "Writer, Tim" <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10mm: fix list corruptions on shmem shrinklistCong Wang1-2/+10
We saw many list corruption warnings on shmem shrinklist: WARNING: CPU: 18 PID: 177 at lib/list_debug.c:59 __list_del_entry+0x9e/0xc0 list_del corruption. prev->next should be ffff9ae5694b82d8, but was ffff9ae5699ba960 Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt CPU: 18 PID: 177 Comm: kswapd1 Not tainted 4.9.34-t3.el7.twitter.x86_64 #1 Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013 Call Trace: dump_stack+0x4d/0x66 __warn+0xcb/0xf0 warn_slowpath_fmt+0x4f/0x60 __list_del_entry+0x9e/0xc0 shmem_unused_huge_shrink+0xfa/0x2e0 shmem_unused_huge_scan+0x20/0x30 super_cache_scan+0x193/0x1a0 shrink_slab.part.41+0x1e3/0x3f0 shrink_slab+0x29/0x30 shrink_node+0xf9/0x2f0 kswapd+0x2d8/0x6c0 kthread+0xd7/0xf0 ret_from_fork+0x22/0x30 WARNING: CPU: 23 PID: 639 at lib/list_debug.c:33 __list_add+0x89/0xb0 list_add corruption. prev->next should be next (ffff9ae5699ba960), but was ffff9ae5694b82d8. (prev=ffff9ae5694b82d8). Modules linked in: intel_rapl sb_edac edac_core x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul ghash_clmulni_intel raid0 dcdbas shpchp wmi hed i2c_i801 ioatdma lpc_ich i2c_smbus acpi_cpufreq tcp_diag inet_diag sch_fq_codel ipmi_si ipmi_devintf ipmi_msghandler igb ptp crc32c_intel pps_core i2c_algo_bit i2c_core dca ipv6 crc_ccitt CPU: 23 PID: 639 Comm: systemd-udevd Tainted: G W 4.9.34-t3.el7.twitter.x86_64 #1 Hardware name: Dell Inc. PowerEdge C6220/0W6W6G, BIOS 2.2.3 11/07/2013 Call Trace: dump_stack+0x4d/0x66 __warn+0xcb/0xf0 warn_slowpath_fmt+0x4f/0x60 __list_add+0x89/0xb0 shmem_setattr+0x204/0x230 notify_change+0x2ef/0x440 do_truncate+0x5d/0x90 path_openat+0x331/0x1190 do_filp_open+0x7e/0xe0 do_sys_open+0x123/0x200 SyS_open+0x1e/0x20 do_syscall_64+0x61/0x170 entry_SYSCALL64_slow_path+0x25/0x25 The problem is that shmem_unused_huge_shrink() moves entries from the global sbinfo->shrinklist to its local lists and then releases the spinlock. However, a parallel shmem_setattr() could access one of these entries directly and add it back to the global shrinklist if it is removed, with the spinlock held. The logic itself looks solid since an entry could be either in a local list or the global list, otherwise it is removed from one of them by list_del_init(). So probably the race condition is that, one CPU is in the middle of INIT_LIST_HEAD() but the other CPU calls list_empty() which returns true too early then the following list_add_tail() sees a corrupted entry. list_empty_careful() is designed to fix this situation. [[email protected]: add comments] Link: http://lkml.kernel.org/r/[email protected] Fixes: 779750d20b93 ("shmem: split huge pages beyond i_size under memory pressure") Signed-off-by: Cong Wang <[email protected]> Acked-by: Linus Torvalds <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10mm/balloon_compaction.c: don't zero ballooned pagesWei Wang1-1/+1
Revert commit bb01b64cfab7 ("mm/balloon_compaction.c: enqueue zero page to balloon device")' Zeroing ballon pages is rather time consuming, especially when a lot of pages are in flight. E.g. 7GB worth of ballooned memory takes 2.8s with __GFP_ZERO while it takes ~491ms without it. The original commit argued that zeroing will help ksmd to merge these pages on the host but this argument is assuming that the host actually marks balloon pages for ksm which is not universally true. So we pay performance penalty for something that even might not be used in the end which is wrong. The host can zero out pages on its own when there is a need. [[email protected]: new changelog text] Link: http://lkml.kernel.org/r/[email protected] Fixes: bb01b64cfab7 ("mm/balloon_compaction.c: enqueue zero page to balloon device") Signed-off-by: Wei Wang <[email protected]> Acked-by: Michael S. Tsirkin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: zhenwei.pi <[email protected]> Cc: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10MAINTAINERS: copy virtio on balloon_compaction.cMichael S. Tsirkin1-0/+1
Changes to mm/balloon_compaction.c can easily break virtio, and virtio is the only user of that interface. Add a line to MAINTAINERS so whoever changes that file remembers to copy us. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Rafael Aquini <[email protected]> Acked-by: Wei Wang <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10mm: fix KSM data corruptionMinchan Kim2-3/+7
Nadav reported KSM can corrupt the user data by the TLB batching race[1]. That means data user written can be lost. Quote from Nadav Amit: "For this race we need 4 CPUs: CPU0: Caches a writable and dirty PTE entry, and uses the stale value for write later. CPU1: Runs madvise_free on the range that includes the PTE. It would clear the dirty-bit. It batches TLB flushes. CPU2: Writes 4 to /proc/PID/clear_refs , clearing the PTEs soft-dirty. We care about the fact that it clears the PTE write-bit, and of course, batches TLB flushes. CPU3: Runs KSM. Our purpose is to pass the following test in write_protect_page(): if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) || (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte))) Since it will avoid TLB flush. And we want to do it while the PTE is stale. Later, and before replacing the page, we would be able to change the page. Note that all the operations the CPU1-3 perform canhappen in parallel since they only acquire mmap_sem for read. We start with two identical pages. Everything below regards the same page/PTE. CPU0 CPU1 CPU2 CPU3 ---- ---- ---- ---- Write the same value on page [cache PTE as dirty in TLB] MADV_FREE pte_mkclean() 4 > clear_refs pte_wrprotect() write_protect_page() [ success, no flush ] pages_indentical() [ ok ] Write to page different value [Ok, using stale PTE] replace_page() Later, CPU1, CPU2 and CPU3 would flush the TLB, but that is too late. CPU0 already wrote on the page, but KSM ignored this write, and it got lost" In above scenario, MADV_FREE is fixed by changing TLB batching API including [set|clear]_tlb_flush_pending. Remained thing is soft-dirty part. This patch changes soft-dirty uses TLB batching API instead of flush_tlb_mm and KSM checks pending TLB flush by using mm_tlb_flush_pending so that it will flush TLB to avoid data lost if there are other parallel threads pending TLB flush. [1] http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: Nadav Amit <[email protected]> Reported-by: Nadav Amit <[email protected]> Tested-by: Nadav Amit <[email protected]> Reviewed-by: Andrea Arcangeli <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Russell King <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Tony Luck <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10mm: fix MADV_[FREE|DONTNEED] TLB flush miss problemMinchan Kim8-9/+48
Nadav reported parallel MADV_DONTNEED on same range has a stale TLB problem and Mel fixed it[1] and found same problem on MADV_FREE[2]. Quote from Mel Gorman: "The race in question is CPU 0 running madv_free and updating some PTEs while CPU 1 is also running madv_free and looking at the same PTEs. CPU 1 may have writable TLB entries for a page but fail the pte_dirty check (because CPU 0 has updated it already) and potentially fail to flush. Hence, when madv_free on CPU 1 returns, there are still potentially writable TLB entries and the underlying PTE is still present so that a subsequent write does not necessarily propagate the dirty bit to the underlying PTE any more. Reclaim at some unknown time at the future may then see that the PTE is still clean and discard the page even though a write has happened in the meantime. I think this is possible but I could have missed some protection in madv_free that prevents it happening." This patch aims for solving both problems all at once and is ready for other problem with KSM, MADV_FREE and soft-dirty story[3]. TLB batch API(tlb_[gather|finish]_mmu] uses [inc|dec]_tlb_flush_pending and mmu_tlb_flush_pending so that when tlb_finish_mmu is called, we can catch there are parallel threads going on. In that case, forcefully, flush TLB to prevent for user to access memory via stale TLB entry although it fail to gather page table entry. I confirmed this patch works with [4] test program Nadav gave so this patch supersedes "mm: Always flush VMA ranges affected by zap_page_range v2" in current mmotm. NOTE: This patch modifies arch-specific TLB gathering interface(x86, ia64, s390, sh, um). It seems most of architecture are straightforward but s390 need to be careful because tlb_flush_mmu works only if mm->context.flush_mm is set to non-zero which happens only a pte entry really is cleared by ptep_get_and_clear and friends. However, this problem never changes the pte entries but need to flush to prevent memory access from stale tlb. [1] http://lkml.kernel.org/r/[email protected] [2] http://lkml.kernel.org/r/[email protected] [3] http://lkml.kernel.org/r/[email protected] [4] https://patchwork.kernel.org/patch/9861621/ [[email protected]: decrease tlb flush pending count in tlb_finish_mmu] Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: Nadav Amit <[email protected]> Reported-by: Nadav Amit <[email protected]> Reported-by: Mel Gorman <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Luck <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10mm: make tlb_flush_pending globalMinchan Kim2-25/+0
Currently, tlb_flush_pending is used only for CONFIG_[NUMA_BALANCING| COMPACTION] but upcoming patches to solve subtle TLB flush batching problem will use it regardless of compaction/NUMA so this patch doesn't remove the dependency. [[email protected]: remove more ifdefs from world's ugliest printk statement] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: Nadav Amit <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Russell King <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Tony Luck <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10mm: refactor TLB gathering APIMinchan Kim8-25/+54
This patch is a preparatory patch for solving race problems caused by TLB batch. For that, we will increase/decrease TLB flush pending count of mm_struct whenever tlb_[gather|finish]_mmu is called. Before making it simple, this patch separates architecture specific part and rename it to arch_tlb_[gather|finish]_mmu and generic part just calls it. It shouldn't change any behavior. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Minchan Kim <[email protected]> Signed-off-by: Nadav Amit <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Luck <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Yoshinori Sato <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Rik van Riel <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10Revert "mm: numa: defer TLB flush for THP migration as long as possible"Nadav Amit2-6/+7
While deferring TLB flushes is a good practice, the reverted patch caused pending TLB flushes to be checked while the page-table lock is not taken. As a result, in architectures with weak memory model (PPC), Linux may miss a memory-barrier, miss the fact TLB flushes are pending, and cause (in theory) a memory corruption. Since the alternative of using smp_mb__after_unlock_lock() was considered a bit open-coded, and the performance impact is expected to be small, the previous patch is reverted. This reverts b0943d61b8fa ("mm: numa: defer TLB flush for THP migration as long as possible"). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nadav Amit <[email protected]> Suggested-by: Mel Gorman <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Luck <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10mm: migrate: fix barriers around tlb_flush_pendingNadav Amit1-4/+10
Reading tlb_flush_pending while the page-table lock is taken does not require a barrier, since the lock/unlock already acts as a barrier. Removing the barrier in mm_tlb_flush_pending() to address this issue. However, migrate_misplaced_transhuge_page() calls mm_tlb_flush_pending() while the page-table lock is already released, which may present a problem on architectures with weak memory model (PPC). To deal with this case, a new parameter is added to mm_tlb_flush_pending() to indicate if it is read without the page-table lock taken, and calling smp_mb__after_unlock_lock() in this case. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Nadav Amit <[email protected]> Acked-by: Rik van Riel <[email protected]> Cc: Minchan Kim <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Mel Gorman <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Nadav Amit <[email protected]> Cc: Russell King <[email protected]> Cc: Tony Luck <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10mm: migrate: prevent racy access to tlb_flush_pendingNadav Amit4-13/+26
Patch series "fixes of TLB batching races", v6. It turns out that Linux TLB batching mechanism suffers from various races. Races that are caused due to batching during reclamation were recently handled by Mel and this patch-set deals with others. The more fundamental issue is that concurrent updates of the page-tables allow for TLB flushes to be batched on one core, while another core changes the page-tables. This other core may assume a PTE change does not require a flush based on the updated PTE value, while it is unaware that TLB flushes are still pending. This behavior affects KSM (which may result in memory corruption) and MADV_FREE and MADV_DONTNEED (which may result in incorrect behavior). A proof-of-concept can easily produce the wrong behavior of MADV_DONTNEED. Memory corruption in KSM is harder to produce in practice, but was observed by hacking the kernel and adding a delay before flushing and replacing the KSM page. Finally, there is also one memory barrier missing, which may affect architectures with weak memory model. This patch (of 7): Setting and clearing mm->tlb_flush_pending can be performed by multiple threads, since mmap_sem may only be acquired for read in task_numa_work(). If this happens, tlb_flush_pending might be cleared while one of the threads still changes PTEs and batches TLB flushes. This can lead to the same race between migration and change_protection_range() that led to the introduction of tlb_flush_pending. The result of this race was data corruption, which means that this patch also addresses a theoretically possible data corruption. An actual data corruption was not observed, yet the race was was confirmed by adding assertion to check tlb_flush_pending is not set by two threads, adding artificial latency in change_protection_range() and using sysctl to reduce kernel.numa_balancing_scan_delay_ms. Link: http://lkml.kernel.org/r/[email protected] Fixes: 20841405940e ("mm: fix TLB flush race between migration, and change_protection_range") Signed-off-by: Nadav Amit <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Rik van Riel <[email protected]> Acked-by: Minchan Kim <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: "David S. Miller" <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: Heiko Carstens <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Jeff Dike <[email protected]> Cc: Martin Schwidefsky <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Russell King <[email protected]> Cc: Sergey Senozhatsky <[email protected]> Cc: Tony Luck <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10fault-inject: fix wrong should_fail() decision in task contextAkinobu Mita1-3/+5
Commit 1203c8e6fb0a ("fault-inject: simplify access check for fail-nth") unintentionally broke a conditional statement in should_fail(). Any faults are not injected in the task context by the change when the systematic fault injection is not used. This change restores to the previous correct behaviour. Link: http://lkml.kernel.org/r/[email protected] Fixes: 1203c8e6fb0a ("fault-inject: simplify access check for fail-nth") Signed-off-by: Akinobu Mita <[email protected]> Reported-by: Lu Fengqi <[email protected]> Tested-by: Lu Fengqi <[email protected]> Cc: Dmitry Vyukov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10test_kmod: fix small memory leak on filesystem testsDan Carpenter1-1/+1
The break was in the wrong place so file system tests don't work as intended, leaking memory at each test switch. [[email protected]: massaged commit subject, noted memory leak issue without the fix] Link: http://lkml.kernel.org/r/[email protected] Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Luis R. Rodriguez <[email protected]> Reported-by: David Binderman <[email protected]> Cc: Colin Ian King <[email protected]> Cc: Dmitry Torokhov <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michal Marek <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10test_kmod: fix the lock in register_test_dev_kmod()Dan Carpenter1-1/+1
We accidentally just drop the lock twice instead of taking it and then releasing it. This isn't a big issue unless you are adding more than one device to test on, and the kmod.sh doesn't do that yet, however this obviously is the correct thing to do. [[email protected]: massaged subject, explain what happens] Link: http://lkml.kernel.org/r/[email protected] Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader") Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: Luis R. Rodriguez <[email protected]> Cc: Colin Ian King <[email protected]> Cc: David Binderman <[email protected]> Cc: Dmitry Torokhov <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michal Marek <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10test_kmod: fix bug which allows negative values on two config optionsLuis R. Rodriguez1-4/+4
Parsing with kstrtol() enables values to be negative, and we failed to check for negative values when parsing with test_dev_config_update_uint_sync() or test_dev_config_update_uint_range(). test_dev_config_update_uint_range() has a minimum check though so an issue is not present there. test_dev_config_update_uint_sync() is only used for the number of threads to use (config_num_threads_store()), and indeed this would fail with an attempt for a large allocation. Although the issue is only present in practice with the first fix both by using kstrtoul() instead of kstrtol(). Link: http://lkml.kernel.org/r/[email protected] Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader") Signed-off-by: Luis R. Rodriguez <[email protected]> Reported-by: Dan Carpenter <[email protected]> Cc: Colin Ian King <[email protected]> Cc: David Binderman <[email protected]> Cc: Dmitry Torokhov <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Kees Cook <[email protected]> Cc: Michal Marek <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Shuah Khan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10test_kmod: fix spelling mistake: "EMTPY" -> "EMPTY"Colin Ian King1-2/+2
Trivial fix to spelling mistake in snprintf text [[email protected]: massaged commit message] Link: http://lkml.kernel.org/r/[email protected] Fixes: 39258f448d71 ("kmod: add test driver to stress test the module loader") Signed-off-by: Colin Ian King <[email protected]> Signed-off-by: Luis R. Rodriguez <[email protected]> Cc: Dmitry Torokhov <[email protected]> Cc: Kees Cook <[email protected]> Cc: Jessica Yu <[email protected]> Cc: Rusty Russell <[email protected]> Cc: Michal Marek <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Miroslav Benes <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Eric W. Biederman <[email protected]> Cc: Shuah Khan <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: David Binderman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10userfaultfd: hugetlbfs: remove superfluous page unlock in VM_SHARED caseAndrea Arcangeli1-1/+1
huge_add_to_page_cache->add_to_page_cache implicitly unlocks the page before returning in case of errors. The error returned was -EEXIST by running UFFDIO_COPY on a non-hole offset of a VM_SHARED hugetlbfs mapping. It was an userland bug that triggered it and the kernel must cope with it returning -EEXIST from ioctl(UFFDIO_COPY) as expected. page dumped because: VM_BUG_ON_PAGE(!PageLocked(page)) kernel BUG at mm/filemap.c:964! invalid opcode: 0000 [#1] SMP CPU: 1 PID: 22582 Comm: qemu-system-x86 Not tainted 4.11.11-300.fc26.x86_64 #1 RIP: unlock_page+0x4a/0x50 Call Trace: hugetlb_mcopy_atomic_pte+0xc0/0x320 mcopy_atomic+0x96f/0xbe0 userfaultfd_ioctl+0x218/0xe90 do_vfs_ioctl+0xa5/0x600 SyS_ioctl+0x79/0x90 entry_SYSCALL_64_fastpath+0x1a/0xa9 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andrea Arcangeli <[email protected]> Tested-by: Maxime Coquelin <[email protected]> Reviewed-by: Mike Kravetz <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Mike Rapoport <[email protected]> Cc: Alexey Perevalov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10mm: ratelimit PFNs busy info messageJonathan Toppins1-1/+1
The RDMA subsystem can generate several thousand of these messages per second eventually leading to a kernel crash. Ratelimit these messages to prevent this crash. Doug said: "I've been carrying a version of this for several kernel versions. I don't remember when they started, but we have one (and only one) class of machines: Dell PE R730xd, that generate these errors. When it happens, without a rate limit, we get rcu timeouts and kernel oopses. With the rate limit, we just get a lot of annoying kernel messages but the machine continues on, recovers, and eventually the memory operations all succeed" And: "> Well... why are all these EBUSY's occurring? It sounds inefficient > (at least) but if it is expected, normal and unavoidable then > perhaps we should just remove that message altogether? I don't have an answer to that question. To be honest, I haven't looked real hard. We never had this at all, then it started out of the blue, but only on our Dell 730xd machines (and it hits all of them), but no other classes or brands of machines. And we have our 730xd machines loaded up with different brands and models of cards (for instance one dedicated to mlx4 hardware, one for qib, one for mlx5, an ocrdma/cxgb4 combo, etc), so the fact that it hit all of the machines meant it wasn't tied to any particular brand/model of RDMA hardware. To me, it always smelled of a hardware oddity specific to maybe the CPUs or mainboard chipsets in these machines, so given that I'm not an mm expert anyway, I never chased it down. A few other relevant details: it showed up somewhere around 4.8/4.9 or thereabouts. It never happened before, but the prinkt has been there since the 3.18 days, so possibly the test to trigger this message was changed, or something else in the allocator changed such that the situation started happening on these machines? And, like I said, it is specific to our 730xd machines (but they are all identical, so that could mean it's something like their specific ram configuration is causing the allocator to hit this on these machine but not on other machines in the cluster, I don't want to say it's necessarily the model of chipset or CPU, there are other bits of identicalness between these machines)" Link: http://lkml.kernel.org/r/499c0f6cc10d6eb829a67f2a4d75b4228a9b356e.1501695897.git.jtoppins@redhat.com Signed-off-by: Jonathan Toppins <[email protected]> Reviewed-by: Doug Ledford <[email protected]> Tested-by: Doug Ledford <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Hillf Danton <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10mm: fix global NR_SLAB_.*CLAIMABLE counter readsJohannes Weiner4-10/+11
As Tetsuo points out: "Commit 385386cff4c6 ("mm: vmstat: move slab statistics from zone to node counters") broke "Slab:" field of /proc/meminfo . It shows nearly 0kB" In addition to /proc/meminfo, this problem also affects the slab counters OOM/allocation failure info dumps, can cause early -ENOMEM from overcommit protection, and miscalculate image size requirements during suspend-to-disk. This is because the patch in question switched the slab counters from the zone level to the node level, but forgot to update the global accessor functions to read the aggregate node data instead of the aggregate zone data. Use global_node_page_state() to access the global slab counters. Fixes: 385386cff4c6 ("mm: vmstat: move slab statistics from zone to node counters") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Johannes Weiner <[email protected]> Reported-by: Tetsuo Handa <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Josef Bacik <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: Stefan Agner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-08-10Merge tag 'pci-v4.13-fixes-2' of ↵Linus Torvalds5-0/+64
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI fix from Bjorn Helgaas: "Work around Renesas uPD72020x 32-bit DMA issue" * tag 'pci-v4.13-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: xhci: Reset Renesas uPD72020x USB controller for 32-bit DMA issue PCI: Add pci_reset_function_locked()
2017-08-10thunderbolt: Do not enumerate more ports from DROM than the controller hasMika Westerberg1-0/+9
Some Alpine Ridge LP DROMs (there might be others) erroneusly list more ports than the controller actually has. Most probably because DROM of the full Dual/Single port Thunderbolt controller was reused for LP version. The current DROM parser does not check the upper bound thus it leads to crash when sw->ports[] is accessed over bounds: BUG: unable to handle kernel NULL pointer dereference at 00000000000002ec IP: tb_drom_read+0x383/0x890 [thunderbolt] PGD 0 P4D 0 Oops: 0000 [#1] SMP CPU: 3 PID: 12248 Comm: systemd-udevd Not tainted 4.13.0-rc1-next-20170719 #1 Hardware name: LENOVO 20HF000YGE/20HF000YGE, BIOS N1WET32W (1.11 ) 05/23/2017 task: ffff8a293e4bcd80 task.stack: ffffa698027a8000 RIP: 0010:tb_drom_read+0x383/0x890 [thunderbolt] RSP: 0018:ffffa698027ab990 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff8a2940af7800 RCX: 0000000000000000 RDX: ffff8a2940ebb400 RSI: 0000000000000000 RDI: ffffa698027ab9a0 RBP: ffffa698027ab9d0 R08: 0000000000000001 R09: 0000000000000002 R10: ffff8a2940ebb5b0 R11: 0000000000000000 R12: ffff8a293bfa968c R13: 000000000000002c R14: 0000000000000056 R15: 0000000000000056 FS: 00007f0a945a38c0(0000) GS:ffff8a2961580000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000002ec CR3: 000000043e785000 CR4: 00000000003606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: tb_switch_add+0x9d/0x730 [thunderbolt] ? tb_switch_alloc+0x3cd/0x4d0 [thunderbolt] icm_start+0x5a/0xa0 [thunderbolt] tb_domain_add+0xc3/0xf0 [thunderbolt] nhi_probe+0x19e/0x310 [thunderbolt] local_pci_probe+0x42/0xa0 pci_device_probe+0x18d/0x1a0 driver_probe_device+0x2ff/0x450 __driver_attach+0xa4/0xe0 ? driver_probe_device+0x450/0x450 bus_for_each_dev+0x6e/0xb0 driver_attach+0x1e/0x20 bus_add_driver+0x1d0/0x270 ? 0xffffffffc0bbb000 driver_register+0x60/0xe0 ? 0xffffffffc0bbb000 __pci_register_driver+0x4c/0x50 nhi_init+0x28/0x1000 [thunderbolt] do_one_initcall+0x50/0x190 ? __vunmap+0x81/0xb0 ? _cond_resched+0x1a/0x50 ? kmem_cache_alloc_trace+0x15f/0x1c0 ? do_init_module+0x27/0x1e9 do_init_module+0x5f/0x1e9 load_module+0x24e7/0x2a60 ? vfs_read+0x115/0x130 SYSC_finit_module+0xfc/0x120 ? SYSC_finit_module+0xfc/0x120 SyS_finit_module+0xe/0x10 do_syscall_64+0x67/0x170 entry_SYSCALL64_slow_path+0x25/0x25 Fix this by making sure we only enumerate DROM port entries the hardware actually has. Reported-by: Christian Kellner <[email protected]> Signed-off-by: Mika Westerberg <[email protected]> Reviewed-by: Lukas Wunner <[email protected]> Tested-by: Christian Kellner <[email protected]> Cc: stable <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>