aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2017-01-25xfs: remove racy hasattr check from attr opsBrian Foster1-6/+0
xfs_attr_[get|remove]() have unlocked attribute fork checks to optimize away a lock cycle in cases where the fork does not exist or is otherwise empty. This check is not safe, however, because an attribute fork short form to extent format conversion includes a transient state that causes the xfs_inode_hasattr() check to fail. Specifically, xfs_attr_shortform_to_leaf() creates an empty extent format attribute fork and then adds the existing shortform attributes to it. This means that lookup of an existing xattr can spuriously return -ENOATTR when racing against a setxattr that causes the associated format conversion. This was originally reproduced by an untar on a particularly configured glusterfs volume, but can also be reproduced on demand with properly crafted xattr requests. The format conversion occurs under the exclusive ilock. xfs_attr_get() and xfs_attr_remove() already have the proper locking and checks further down in the functions to handle this situation correctly. Drop the unlocked checks to avoid the spurious failure and rely on the existing logic. Signed-off-by: Brian Foster <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-01-25xfs: use per-AG reservations for the finobtChristoph Hellwig5-20/+144
Currently we try to rely on the global reserved block pool for block allocations for the free inode btree, but I have customer reports (fairly complex workload, need to find an easier reproducer) where that is not enough as the AG where we free an inode that requires a new finobt block is entirely full. This causes us to cancel a dirty transaction and thus a file system shutdown. I think the right way to guard against this is to treat the finot the same way as the refcount btree and have a per-AG reservations for the possible worst case size of it, and the patch below implements that. Note that this could increase mount times with large finobt trees. In an ideal world we would have added a field for the number of finobt fields to the AGI, similar to what we did for the refcount blocks. We should do add it next time we rev the AGI or AGF format by adding new fields. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-01-25xfs: only update mount/resv fields on success in __xfs_ag_resv_initChristoph Hellwig1-9/+14
Try to reserve the blocks first and only then update the fields in or hanging off the mount structure. This way we can call __xfs_ag_resv_init again after a previous failure. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Darrick J. Wong <[email protected]>
2017-01-25Revert "ACPI / video: Add force_native quirk for HP Pavilion dv6"Hans de Goede1-11/+0
Revert commit 6276e53fa8c0 (ACPI / video: Add force_native quirk for HP Pavilion dv6). In the commit message for the quirk this revert removes I wrote: "Note that there are quite a few HP Pavilion dv6 variants, some woth ATI and some with NVIDIA hybrid gfx, both seem to need this quirk to have working backlight control. There are also some versions with only Intel integrated gfx, these may not need this quirk, but it should not hurt there." Unfortunately that seems wrong, I've already received 2 reports of this commit causing regressions on some dv6 variants (at least one of which actually has a nvidia GPU). So it seems that HP has made a mess here by using the same model-name both in marketing and in the DMI data for many different variants. Some of which need acpi_backlight=native for functional backlight control (as the quirk this commit reverts was doing), where as others are broken by it. So lets get back to the old sitation so as to avoid regressing on models which used to work without any kernel cmdline arguments before. Fixes: 6276e53fa8c0 (ACPI / video: Add force_native quirk for HP Pavilion dv6) Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2017-01-25Merge tag 'gvt-fixes-2017-01-25' of https://github.com/01org/gvt-linux into ↵Jani Nikula5-57/+25
drm-intel-fixes gvt-fixes-2017-01-25 - re-enable shadow batch buffer for security that was falsely turned off. - kvmgt/mdev typo fix for correct ABI - gvt mail list change Signed-off-by: Jani Nikula <[email protected]>
2017-01-25drm/i915: reinstate call to trace_i915_vma_bindDaniele Ceraolo Spurio1-0/+1
The call went away in: commit 3b16525cc4c1a43e9053cfdc414356eea24bdfad Author: Chris Wilson <[email protected]> Date: Thu Aug 4 16:32:25 2016 +0100 drm/i915: Split insertion/binding of an object into the VM It is useful to have this trace as it pairs nicely with the vma_unbind one to track vma activity. Added inside the i915_vma_bind function (was outside before) to keep a similar placement as trace_i915_vma_unbind. v2: print bind_flags instead of flags (Chris) Fixes: 3b16525cc4c1 ("drm/i915: Split insertion/binding of an object into the VM") Cc: Chris Wilson <[email protected]> Signed-off-by: Daniele Ceraolo Spurio <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/1484949083-11430-1-git-send-email-daniele.ceraolospurio@intel.com Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> (cherry picked from commit 6146e6da5c961735dacf9b6c0c8b5f1382193ee2) Signed-off-by: Jani Nikula <[email protected]>
2017-01-25drm/i915: Move atomic state free from out of fence releaseChris Wilson3-2/+33
Fences are required to support being released from under an atomic context. The drm_atomic_state struct may take a mutex when being released and so we cannot drop a reference to the drm_atomic_state from the fence release path directly, and so we need to defer that unreference to a worker. [ 326.576697] WARNING: CPU: 2 PID: 366 at kernel/sched/core.c:7737 __might_sleep+0x5d/0x80 [ 326.576816] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffffc0359549>] intel_breadcrumbs_signaler+0x59/0x270 [i915] [ 326.576818] Modules linked in: rfcomm fuse snd_hda_codec_hdmi bnep snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_pcm snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer input_leds led_class snd punit_atom_debug btusb btrtl btbcm btintel intel_rapl bluetooth i915 drm_kms_helper syscopyarea sysfillrect iwlwifi sysimgblt soundcore fb_sys_fops mei_txe cfg80211 drm pwm_lpss_platform pwm_lpss pinctrl_cherryview fjes acpi_pad parport_pc ppdev parport autofs4 [ 326.576899] CPU: 2 PID: 366 Comm: i915/signal:0 Tainted: G U 4.10.0-rc3-patser+ #5030 [ 326.576902] Hardware name: /NUC5PPYB, BIOS PYBSWCEL.86A.0031.2015.0601.1712 06/01/2015 [ 326.576905] Call Trace: [ 326.576920] dump_stack+0x4d/0x6d [ 326.576926] __warn+0xc0/0xe0 [ 326.576931] warn_slowpath_fmt+0x5a/0x80 [ 326.577004] ? intel_breadcrumbs_signaler+0x59/0x270 [i915] [ 326.577075] ? intel_breadcrumbs_signaler+0x59/0x270 [i915] [ 326.577079] __might_sleep+0x5d/0x80 [ 326.577087] mutex_lock+0x1b/0x40 [ 326.577133] drm_property_free_blob+0x1e/0x80 [drm] [ 326.577167] ? drm_property_destroy+0xe0/0xe0 [drm] [ 326.577200] drm_mode_object_unreference+0x5c/0x70 [drm] [ 326.577233] drm_property_unreference_blob+0xe/0x10 [drm] [ 326.577260] __drm_atomic_helper_crtc_destroy_state+0x14/0x40 [drm_kms_helper] [ 326.577278] drm_atomic_helper_crtc_destroy_state+0x10/0x20 [drm_kms_helper] [ 326.577352] intel_crtc_destroy_state+0x9/0x10 [i915] [ 326.577388] drm_atomic_state_default_clear+0xea/0x1d0 [drm] [ 326.577462] intel_atomic_state_clear+0xd/0x20 [i915] [ 326.577497] drm_atomic_state_clear+0x1a/0x30 [drm] [ 326.577532] __drm_atomic_state_free+0x13/0x60 [drm] [ 326.577607] intel_atomic_commit_ready+0x6f/0x78 [i915] [ 326.577670] i915_sw_fence_release+0x3a/0x50 [i915] [ 326.577733] dma_i915_sw_fence_wake+0x39/0x80 [i915] [ 326.577741] dma_fence_signal+0xda/0x120 [ 326.577812] ? intel_breadcrumbs_signaler+0x59/0x270 [i915] [ 326.577884] intel_breadcrumbs_signaler+0xb1/0x270 [i915] [ 326.577889] kthread+0x127/0x130 [ 326.577961] ? intel_engine_remove_wait+0x1a0/0x1a0 [i915] [ 326.577964] ? kthread_stop+0x120/0x120 [ 326.577970] ret_from_fork+0x22/0x30 Fixes: c004a90b7263 ("drm/i915: Restore nonblocking awaits for modesetting") Reported-by: Maarten Lankhorst <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Joonas Lahtinen <[email protected]> Cc: Maarten Lankhorst <[email protected]> Cc: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Cc: <[email protected]> # v4.10-rc1+ Reviewed-by: Joonas Lahtinen <[email protected]> (cherry picked from commit eb955eee27d9dc176871540c43c9070ee4701642) Signed-off-by: Jani Nikula <[email protected]>
2017-01-25drm/i915: Check for NULL atomic state in intel_crtc_disable_noatomic()Ander Conselvan de Oliveira1-0/+6
In intel_crtc_disable_noatomic(), bail on a failure to allocate an atomic state to avoid a NULL pointer dereference. Fixes: 4a80655827af ("drm/i915: Pass atomic state to crtc enable/disable functions") Cc: Maarten Lankhorst <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Jani Nikula <[email protected]> Cc: [email protected] Cc: <[email protected]> # v4.9+ Signed-off-by: Ander Conselvan de Oliveira <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/1484922525-6131-4-git-send-email-ander.conselvan.de.oliveira@intel.com (cherry picked from commit 31bb2ef97ea9db343348f9b5ccaa9bb6f48fc655) Signed-off-by: Jani Nikula <[email protected]>
2017-01-25drm/i915: Fix calculation of rotated x and y offsets for planar formatsAnder Conselvan de Oliveira1-2/+3
Parameters tile_size, tile_width and tile_height were passed in the wrong order to _intel_adjust_tile_offset() when calculating the rotated offsets. This doesn't fix any user visible bug, since for packed formats new and old offset are the same and the rotated offsets are within a tile before they are fed to _intel_adjust_tile_offset(). In that case, the offsets are unchanged. That is not true for planar formats, but those are currently not supported. Fixes: 66a2d927cb0e ("drm/i915: Make intel_adjust_tile_offset() work for linear buffers") Cc: Ville Syrjälä <[email protected]> Cc: Sivakumar Thulasimani <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Jani Nikula <[email protected]> Cc: [email protected] Cc: <[email protected]> # v4.9+ Signed-off-by: Ander Conselvan de Oliveira <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/1484922525-6131-3-git-send-email-ander.conselvan.de.oliveira@intel.com (cherry picked from commit 46a1bd289507dfcc428fb9daf65421ed6be6af8b) Signed-off-by: Jani Nikula <[email protected]>
2017-01-25drm/i915: Don't init hpd polling for vlv and chv from runtime_suspend()Ander Conselvan de Oliveira1-1/+1
An error in the condition for avoiding the call to intel_hpd_poll_init() for valleyview and cherryview from intel_runtime_suspend() caused it to be called unconditionally. Fix it. Fixes: 19625e85c6ec ("drm/i915: Enable polling when we don't have hpd") Cc: [email protected] Cc: Ville Syrjälä <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Lyude <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Jani Nikula <[email protected]> Cc: [email protected] Cc: <[email protected]> # v4.9+ Signed-off-by: Ander Conselvan de Oliveira <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/1484922525-6131-2-git-send-email-ander.conselvan.de.oliveira@intel.com (cherry picked from commit 04313b00b79405f86d815100f85c47a2ee5b8ca0) Signed-off-by: Jani Nikula <[email protected]>
2017-01-25drm/i915: Don't leak edid in intel_crt_detect_ddc()Ander Conselvan de Oliveira1-4/+5
In the path where intel_crt_detect_ddc() detects a CRT, if would return true without freeing the edid. Fixes: a2bd1f541f19 ("drm/i915: check whether we actually received an edid in detect_ddc") Cc: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Jani Nikula <[email protected]> Cc: [email protected] Cc: <[email protected]> # v3.6+ Signed-off-by: Ander Conselvan de Oliveira <[email protected]> Reviewed-by: Ville Syrjälä <[email protected]> Reviewed-by: Jani Nikula <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/1484922525-6131-1-git-send-email-ander.conselvan.de.oliveira@intel.com (cherry picked from commit c96b63a6a7ac4bd670ec2e663793a9a31418b790) Signed-off-by: Jani Nikula <[email protected]>
2017-01-25drm/i915: Release temporary load-detect state upon switchingChris Wilson1-0/+1
After we call drm_atomic_commit() on the load-detect state, we can free our local reference. Upon restore, we only apply and free the previous state. Fixes: 0853695c3ba4 ("drm: Add reference counting to drm_atomic_state") Signed-off-by: Chris Wilson <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: <[email protected]> # v4.10-rc1+ Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Ville Syrjälä <[email protected]> (cherry picked from commit 7abbd11f344aa7abe29befb218774a1ea26018ac) Signed-off-by: Jani Nikula <[email protected]>
2017-01-25drm/i915: prevent crash with .disable_display parameterClint Taylor1-0/+3
The .disable_display parameter was causing a fatal crash when fbdev was dereferenced during driver init. V1: protection in i915_drv.c V2: Moved protection to intel_fbdev.c Fixes: 43cee314345a ("drm/i915/fbdev: Limit the global async-domain synchronization") Testcase: igt/drv_module_reload/basic-no-display Cc: Chris Wilson <[email protected]> Signed-off-by: Clint Taylor <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Chris Wilson <[email protected]> Cc: Lukas Wunner <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Jani Nikula <[email protected]> Cc: <[email protected]> # v4.8+ Signed-off-by: Chris Wilson <[email protected]> (cherry picked from commit 5b8cd0755f8a06a851c436a013e7be0823fb155a) Signed-off-by: Jani Nikula <[email protected]>
2017-01-25drm/i915: Avoid drm_atomic_state_put(NULL) in intel_display_resumeChris Wilson1-1/+2
intel_display_resume() may be called without an atomic state to restore, i.e. dev_priv->modeset_reset_restore state is NULL. One such case is following a lid open/close event and the forced modeset in intel_lid_notify(). Reported-by: Stefan Seyfried <[email protected]> Tested-by: Stefan Seyfried <[email protected]> Fixes: 0853695c3ba4 ("drm: Add reference counting to drm_atomic_state") Signed-off-by: Chris Wilson <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Jani Nikula <[email protected]> Cc: <[email protected]> # v4.10-rc1+ Link: http://patchwork.freedesktop.org/patch/msgid/[email protected] Reviewed-by: Ander Conselvan de Oliveira <[email protected]> (cherry picked from commit 3c5e37f169cb67cbd03c6116fbc93e0805815d29) Signed-off-by: Jani Nikula <[email protected]>
2017-01-25MAINTAINERS: update new mail list for intel gvt driverZhenyu Wang1-1/+1
We've moved to lists.freedesktop.org from lists.01.org. Update info in MAINTAINERS. Signed-off-by: Zhenyu Wang <[email protected]>
2017-01-25drm/i915/gvt: Fix kmem_cache_create() nameAlex Williamson1-1/+1
According to kmem_cache_sanity_check(), spaces are not allowed in the name of a cache and results in a kernel oops with CONFIG_DEBUG_VM. Convert to underscores. Signed-off-by: Alex Williamson <[email protected]> Signed-off-by: Zhenyu Wang <[email protected]>
2017-01-25drm/i915/gvt/kvmgt: mdev ABI is available_instances, not available_instanceAlex Williamson1-4/+4
Per the ABI specification[1], each mdev_supported_types entry should have an available_instances, with an "s", not available_instance. [1] Documentation/ABI/testing/sysfs-bus-vfio-mdev Signed-off-by: Alex Williamson <[email protected]> Signed-off-by: Zhenyu Wang <[email protected]>
2017-01-25Revert "thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()"Fabio Estevam1-3/+17
This reverts commit 7611fb68062f ("thermal: thermal_hwmon: Convert to hwmon_device_register_with_info()"). Pavel Machek reported breakage in the Nokia N900 due to this commit. We can revisit a proper fix for the warning later. Reported-by: Pavel Machek <[email protected]> Signed-off-by: Fabio Estevam <[email protected]> Acked-by: Guenter Roeck <[email protected]> Acked-by: Pavel Machek <[email protected]> Signed-off-by: Zhang Rui <[email protected]>
2017-01-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds28-78/+237
Merge fixes from Andrew Morton: "26 fixes" * emailed patches from Andrew Morton <[email protected]>: (26 commits) MAINTAINERS: add Dan Streetman to zbud maintainers MAINTAINERS: add Dan Streetman to zswap maintainers mm: do not export ioremap_page_range symbol for external module mn10300: fix build error of missing fpu_save() romfs: use different way to generate fsid for BLOCK or MTD frv: add missing atomic64 operations mm, page_alloc: fix premature OOM when racing with cpuset mems update mm, page_alloc: move cpuset seqcount checking to slowpath mm, page_alloc: fix fast-path race with cpuset update or removal mm, page_alloc: fix check for NULL preferred_zone kernel/panic.c: add missing \n fbdev: color map copying bounds checking frv: add atomic64_add_unless() mm/mempolicy.c: do not put mempolicy before using its nodemask radix-tree: fix private list warnings Documentation/filesystems/proc.txt: add VmPin mm, memcg: do not retry precharge charges proc: add a schedule point in proc_pid_readdir() mm: alloc_contig: re-allow CMA to compact FS pages mm/slub.c: trace free objects at KERN_INFO ...
2017-01-24MAINTAINERS: add Dan Streetman to zbud maintainersDan Streetman1-0/+1
Add myself as zbud maintainer. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dan Streetman <[email protected]> Cc: Seth Jennings <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24MAINTAINERS: add Dan Streetman to zswap maintainersDan Streetman1-0/+1
Add myself as zswap maintainer. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Dan Streetman <[email protected]> Acked-by: Seth Jennings <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mm: do not export ioremap_page_range symbol for external modulezhong jiang1-1/+0
Recently, I've found cases in which ioremap_page_range was used incorrectly, in external modules, leading to crashes. This can be partly attributed to the fact that ioremap_page_range is lower-level, with fewer protections, as compared to the other functions that an external module would typically call. Those include: ioremap_cache ioremap_nocache ioremap_prot ioremap_uc ioremap_wc ioremap_wt ...each of which wraps __ioremap_caller, which in turn provides a safer way to achieve the mapping. Therefore, stop EXPORT-ing ioremap_page_range. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: zhong jiang <[email protected]> Reviewed-by: John Hubbard <[email protected]> Suggested-by: John Hubbard <[email protected]> Acked-by: Michal Hocko <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mn10300: fix build error of missing fpu_save()Randy Dunlap1-1/+1
When CONFIG_FPU is not enabled on arch/mn10300, <asm/switch_to.h> causes a build error with a call to fpu_save(): kernel/built-in.o: In function `.L410': core.c:(.sched.text+0x28a): undefined reference to `fpu_save' Fix this by including <asm/fpu.h> in <asm/switch_to.h> so that an empty static inline fpu_save() is defined. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Randy Dunlap <[email protected]> Reported-by: kbuild test robot <[email protected]> Reviewed-by: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24romfs: use different way to generate fsid for BLOCK or MTDColy Li1-1/+22
Commit 8a59f5d25265 ("fs/romfs: return f_fsid for statfs(2)") generates a 64bit id from sb->s_bdev->bd_dev. This is only correct when romfs is defined with CONFIG_ROMFS_ON_BLOCK. If romfs is only defined with CONFIG_ROMFS_ON_MTD, sb->s_bdev is NULL, referencing sb->s_bdev->bd_dev will triger an oops. Richard Weinberger points out that when CONFIG_ROMFS_BACKED_BY_BOTH=y, both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD are defined. Therefore when calling huge_encode_dev() to generate a 64bit id, I use the follow order to choose parameter, - CONFIG_ROMFS_ON_BLOCK defined use sb->s_bdev->bd_dev - CONFIG_ROMFS_ON_BLOCK undefined and CONFIG_ROMFS_ON_MTD defined use sb->s_dev when, - both CONFIG_ROMFS_ON_BLOCK and CONFIG_ROMFS_ON_MTD undefined leave id as 0 When CONFIG_ROMFS_ON_MTD is defined and sb->s_mtd is not NULL, sb->s_dev is set to a device ID generated by MTD_BLOCK_MAJOR and mtd index, otherwise sb->s_dev is 0. This is a try-best effort to generate a uniq file system ID, if all the above conditions are not meet, f_fsid of this romfs instance will be 0. Generally only one romfs can be built on single MTD block device, this method is enough to identify multiple romfs instances in a computer. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Coly Li <[email protected]> Reported-by: Nong Li <[email protected]> Tested-by: Nong Li <[email protected]> Cc: Richard Weinberger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24frv: add missing atomic64 operationsSudip Mukherjee1-1/+18
Some more atomic64 operations were missing and as a result frv allmodconfig was failing. Add the missing operations. Link: http://lkml.kernel.org/r/1485193844-12850-1-git-send-email-sudip.mukherjee@codethink.co.uk Signed-off-by: Sudip Mukherjee <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mm, page_alloc: fix premature OOM when racing with cpuset mems updateVlastimil Babka1-11/+24
Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode triggers OOM killer in few seconds, despite lots of free memory. The test attempts to repeatedly fault in memory in one process in a cpuset, while changing allowed nodes of the cpuset between 0 and 1 in another process. The problem comes from insufficient protection against cpuset changes, which can cause get_page_from_freelist() to consider all zones as non-eligible due to nodemask and/or current->mems_allowed. This was masked in the past by sufficient retries, but since commit 682a3385e773 ("mm, page_alloc: inline the fast path of the zonelist iterator") we fix the preferred_zoneref once, and don't iterate over the whole zonelist in further attempts, thus the only eligible zones might be placed in the zonelist before our starting point and we always miss them. A previous patch fixed this problem for current->mems_allowed. However, cpuset changes also update the task's mempolicy nodemask. The fix has two parts. We have to repeat the preferred_zoneref search when we detect cpuset update by way of seqcount, and we have to check the seqcount before considering OOM. [[email protected]: fix typo in comment] Link: http://lkml.kernel.org/r/[email protected] Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Signed-off-by: Vlastimil Babka <[email protected]> Reported-by: Ganapatrao Kulkarni <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mm, page_alloc: move cpuset seqcount checking to slowpathVlastimil Babka1-21/+26
This is a preparation for the following patch to make review simpler. While the primary motivation is a bug fix, this also simplifies the fast path, although the moved code is only enabled when cpusets are in use. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mm, page_alloc: fix fast-path race with cpuset update or removalVlastimil Babka1-1/+9
Ganapatrao Kulkarni reported that the LTP test cpuset01 in stress mode triggers OOM killer in few seconds, despite lots of free memory. The test attempts to repeatedly fault in memory in one process in a cpuset, while changing allowed nodes of the cpuset between 0 and 1 in another process. One possible cause is that in the fast path we find the preferred zoneref according to current mems_allowed, so that it points to the middle of the zonelist, skipping e.g. zones of node 1 completely. If the mems_allowed is updated to contain only node 1, we never reach it in the zonelist, and trigger OOM before checking the cpuset_mems_cookie. This patch fixes the particular case by redoing the preferred zoneref search if we switch back to the original nodemask. The condition is also slightly changed so that when the last non-root cpuset is removed, we don't miss it. Note that this is not a full fix, and more patches will follow. Link: http://lkml.kernel.org/r/[email protected] Fixes: 682a3385e773 ("mm, page_alloc: inline the fast path of the zonelist iterator") Signed-off-by: Vlastimil Babka <[email protected]> Reported-by: Ganapatrao Kulkarni <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mm, page_alloc: fix check for NULL preferred_zoneVlastimil Babka2-2/+6
Patch series "fix premature OOM regression in 4.7+ due to cpuset races". This is v2 of my attempt to fix the recent report based on LTP cpuset stress test [1]. The intention is to go to stable 4.9 LTSS with this, as triggering repeated OOMs is not nice. That's why the patches try to be not too intrusive. Unfortunately why investigating I found that modifying the testcase to use per-VMA policies instead of per-task policies will bring the OOM's back, but that seems to be much older and harder to fix problem. I have posted a RFC [2] but I believe that fixing the recent regressions has a higher priority. Longer-term we might try to think how to fix the cpuset mess in a better and less error prone way. I was for example very surprised to learn, that cpuset updates change not only task->mems_allowed, but also nodemask of mempolicies. Until now I expected the parameter to alloc_pages_nodemask() to be stable. I wonder why do we then treat cpusets specially in get_page_from_freelist() and distinguish HARDWALL etc, when there's unconditional intersection between mempolicy and cpuset. I would expect the nodemask adjustment for saving overhead in g_p_f(), but that clearly doesn't happen in the current form. So we have both crazy complexity and overhead, AFAICS. [1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com [2] https://lkml.kernel.org/r/[email protected] This patch (of 4): Since commit c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") we have a wrong check for NULL preferred_zone, which can theoretically happen due to concurrent cpuset modification. We check the zoneref pointer which is never NULL and we should check the zone pointer. Also document this in first_zones_zonelist() comment per Michal Hocko. Fixes: c33d6c06f60f ("mm, page_alloc: avoid looking up the first zone in a zonelist twice") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Mel Gorman <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Ganapatrao Kulkarni <[email protected]> Cc: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24kernel/panic.c: add missing \nJiri Slaby1-1/+1
When a system panics, the "Rebooting in X seconds.." message is never printed because it lacks a new line. Fix it. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jiri Slaby <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24fbdev: color map copying bounds checkingKees Cook1-12/+14
Copying color maps to userspace doesn't check the value of to->start, which will cause kernel heap buffer OOB read due to signedness wraps. CVE-2016-8405 Link: http://lkml.kernel.org/r/20170105224249.GA50925@beast Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kees Cook <[email protected]> Reported-by: Peter Pi (@heisecode) of Trend Micro Cc: Min Chong <[email protected]> Cc: Dan Carpenter <[email protected]> Cc: Tomi Valkeinen <[email protected]> Cc: Bartlomiej Zolnierkiewicz <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24frv: add atomic64_add_unless()Sudip Mukherjee1-0/+16
The build of frv allmodconfig was failing with the error: lib/atomic64_test.c:209:9: error: implicit declaration of function 'atomic64_add_unless' All the atomic64 operations were defined in frv, but atomic64_add_unless() was not done. Implement atomic64_add_unless() as done in other arches. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Sudip Mukherjee <[email protected]> Cc: David Howells <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mm/mempolicy.c: do not put mempolicy before using its nodemaskVlastimil Babka1-1/+1
Since commit be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma") alloc_pages_vma() can potentially free a mempolicy by mpol_cond_put() before accessing the embedded nodemask by __alloc_pages_nodemask(). The commit log says it's so "we can use a single exit path within the function" but that's clearly wrong. We can still do that when doing mpol_cond_put() after the allocation attempt. Make sure the mempolicy is not freed prematurely, otherwise __alloc_pages_nodemask() can end up using a bogus nodemask, which could lead e.g. to premature OOM. Fixes: be97a41b291e ("mm/mempolicy.c: merge alloc_hugepage_vma to alloc_pages_vma") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Vlastimil Babka <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Aneesh Kumar K.V <[email protected]> Cc: Andrea Arcangeli <[email protected]> Cc: <[email protected]> [4.0+] Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24radix-tree: fix private list warningsMatthew Wilcox1-1/+1
The newly introduced warning in radix_tree_free_nodes() was testing the wrong variable; it should have been 'old' instead of 'node'. Fixes: ea07b862ac8e ("mm: workingset: fix use-after-free in shadow node shrinker") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Signed-off-by: Johannes Weiner <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24Documentation/filesystems/proc.txt: add VmPinFabian Frederick1-2/+3
Commit bc3e53f682d9 ("mm: distinguish between mlocked and pinned pages") added VmPin in /proc/<pid>/status. Report that in Documentation/filesystems/proc.txt Also move Umask after Name to keep correct order. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Fabian Frederick <[email protected]> Cc: Christoph Lameter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mm, memcg: do not retry precharge chargesDavid Rientjes1-2/+2
When memory.move_charge_at_immigrate is enabled and precharges are depleted during move, mem_cgroup_move_charge_pte_range() will attempt to increase the size of the precharge. Prevent precharges from ever looping by setting __GFP_NORETRY. This was probably the intention of the GFP_KERNEL & ~__GFP_NORETRY, which is pointless as written. Fixes: 0029e19ebf84 ("mm: memcontrol: remove explicit OOM parameter in charge path") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: David Rientjes <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24proc: add a schedule point in proc_pid_readdir()Eric Dumazet1-0/+2
We have seen proc_pid_readdir() invocations holding cpu for more than 50 ms. Add a cond_resched() to be gentle with other tasks. [[email protected]: coding style fix] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mm: alloc_contig: re-allow CMA to compact FS pagesLucas Stach1-0/+1
Commit 73e64c51afc5 ("mm, compaction: allow compaction for GFP_NOFS requests") changed compation to skip FS pages if not explicitly allowed to touch them, but missed to update the CMA compact_control. This leads to a very high isolation failure rate, crippling performance of CMA even on a lightly loaded system. Re-allow CMA to compact FS pages by setting the correct GFP flags, restoring CMA behavior and performance to the kernel 4.9 level. Fixes: 73e64c51afc5 (mm, compaction: allow compaction for GFP_NOFS requests) Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Lucas Stach <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Vlastimil Babka <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Mel Gorman <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mm/slub.c: trace free objects at KERN_INFODaniel Thompson1-10/+13
Currently when trace is enabled (e.g. slub_debug=T,kmalloc-128 ) the trace messages are mostly output at KERN_INFO. However the trace code also calls print_section() to hexdump the head of a free object. This is hard coded to use KERN_ERR, meaning the console is deluged with trace messages even if we've asked for quiet. Fix this the obvious way but adding a level parameter to print_section(), allowing calls from the trace code to use the same trace level as other trace messages. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Daniel Thompson <[email protected]> Acked-by: Christoph Lameter <[email protected]> Acked-by: David Rientjes <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: Joonsoo Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24userfaultfd: fix SIGBUS resulting from false rwsem wakeupsAndrea Arcangeli1-2/+35
With >=32 CPUs the userfaultfd selftest triggered a graceful but unexpected SIGBUS because VM_FAULT_RETRY was returned by handle_userfault() despite the UFFDIO_COPY wasn't completed. This seems caused by rwsem waking the thread blocked in handle_userfault() and we can't run up_read() before the wait_event sequence is complete. Keeping the wait_even sequence identical to the first one, would require running userfaultfd_must_wait() again to know if the loop should be repeated, and it would also require retaking the rwsem and revalidating the whole vma status. It seems simpler to wait the targeted wakeup so that if false wakeups materialize we still wait for our specific wakeup event, unless of course there are signals or the uffd was released. Debug code collecting the stack trace of the wakeup showed this: $ ./userfaultfd 100 99999 nr_pages: 25600, nr_pages_per_cpu: 800 bounces: 99998, mode: racing ver poll, userfaults: 32 35 90 232 30 138 69 82 34 30 139 40 40 31 20 19 43 13 15 28 27 38 21 43 56 22 1 17 31 8 4 2 bounces: 99997, mode: rnd ver poll, Bus error (core dumped) save_stack_trace+0x2b/0x50 try_to_wake_up+0x2a6/0x580 wake_up_q+0x32/0x70 rwsem_wake+0xe0/0x120 call_rwsem_wake+0x1b/0x30 up_write+0x3b/0x40 vm_mmap_pgoff+0x9c/0xc0 SyS_mmap_pgoff+0x1a9/0x240 SyS_mmap+0x22/0x30 entry_SYSCALL_64_fastpath+0x1f/0xbd 0xffffffffffffffff FAULT_FLAG_ALLOW_RETRY missing 70 CPU: 24 PID: 1054 Comm: userfaultfd Tainted: G W 4.8.0+ #30 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014 Call Trace: dump_stack+0xb8/0x112 handle_userfault+0x572/0x650 handle_mm_fault+0x12cb/0x1520 __do_page_fault+0x175/0x500 trace_do_page_fault+0x61/0x270 do_async_page_fault+0x19/0x90 async_page_fault+0x25/0x30 This always happens when the main userfault selftest thread is running clone() while glibc runs either mprotect or mmap (both taking mmap_sem down_write()) to allocate the thread stack of the background threads, while locking/userfault threads already run at full throttle and are susceptible to false wakeups that may cause handle_userfault() to return before than expected (which results in graceful SIGBUS at the next attempt). This was reproduced only with >=32 CPUs because the loop to start the thread where clone() is too quick with fewer CPUs, while with 32 CPUs there's already significant activity on ~32 locking and userfault threads when the last background threads are started with clone(). This >=32 CPUs SMP race condition is likely reproducible only with the selftest because of the much heavier userfault load it generates if compared to real apps. We'll have to allow "one more" VM_FAULT_RETRY for the WP support and a patch floating around that provides it also hidden this problem but in reality only is successfully at hiding the problem. False wakeups could still happen again the second time handle_userfault() is invoked, even if it's a so rare race condition that getting false wakeups twice in a row is impossible to reproduce. This full fix is needed for correctness, the only alternative would be to allow VM_FAULT_RETRY to be returned infinitely. With this fix the WP support can stick to a strict "one more" VM_FAULT_RETRY logic (no need of returning it infinite times to avoid the SIGBUS). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andrea Arcangeli <[email protected]> Reported-by: Shubham Kumar Sharma <[email protected]> Tested-by: Mike Kravetz <[email protected]> Acked-by: Hillf Danton <[email protected]> Cc: Michael Rapoport <[email protected]> Cc: "Dr. David Alan Gilbert" <[email protected]> Cc: Pavel Emelyanov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24drivers/memstick/core/memstick.c: avoid -Wnonnull warningArnd Bergmann1-1/+1
gcc-7 produces a harmless false-postive warning about a possible NULL pointer access: drivers/memstick/core/memstick.c: In function 'h_memstick_read_dev_id': drivers/memstick/core/memstick.c:309:3: error: argument 2 null where non-null expected [-Werror=nonnull] memcpy(mrq->data, buf, mrq->data_len); This can't happen because the caller sets the command to 'MS_TPC_READ_REG', which causes the data direction to be 'READ' and the NULL pointer not accessed. As a simple workaround for the warning, we can pass a pointer to the data that we actually want to read into. This is not needed here, but also harmless, and lets the compiler know that the access is ok. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnd Bergmann <[email protected]> Cc: Alex Dubov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24kernel/watchdog: prevent false hardlockup on overloaded systemDon Zickus3-0/+13
On an overloaded system, it is possible that a change in the watchdog threshold can be delayed long enough to trigger a false positive. This can easily be achieved by having a cpu spinning indefinitely on a task, while another cpu updates watchdog threshold. What happens is while trying to park the watchdog threads, the hrtimers on the other cpus trigger and reprogram themselves with the new slower watchdog threshold. Meanwhile, the nmi watchdog is still programmed with the old faster threshold. Because the one cpu is blocked, it prevents the thread parking on the other cpus from completing, which is needed to shutdown the nmi watchdog and reprogram it correctly. As a result, a false positive from the nmi watchdog is reported. Fix this by setting a park_in_progress flag to block all lockups until the parking is complete. Fix provided by Ulrich Obergfell. [[email protected]: s/park_in_progress/watchdog_park_in_progress/] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Don Zickus <[email protected]> Reviewed-by: Aaron Tomlin <[email protected]> Cc: Ulrich Obergfell <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24dax: fix build warnings with FS_DAX and !FS_IOMAPRoss Zwisler4-4/+1
As reported by Arnd: https://lkml.org/lkml/2017/1/10/756 Compiling with the following configuration: # CONFIG_EXT2_FS is not set # CONFIG_EXT4_FS is not set # CONFIG_XFS_FS is not set # CONFIG_FS_IOMAP depends on the above filesystems, as is not set CONFIG_FS_DAX=y generates build warnings about unused functions in fs/dax.c: fs/dax.c:878:12: warning: `dax_insert_mapping' defined but not used [-Wunused-function] static int dax_insert_mapping(struct address_space *mapping, ^~~~~~~~~~~~~~~~~~ fs/dax.c:572:12: warning: `copy_user_dax' defined but not used [-Wunused-function] static int copy_user_dax(struct block_device *bdev, sector_t sector, size_t size, ^~~~~~~~~~~~~ fs/dax.c:542:12: warning: `dax_load_hole' defined but not used [-Wunused-function] static int dax_load_hole(struct address_space *mapping, void **entry, ^~~~~~~~~~~~~ fs/dax.c:312:14: warning: `grab_mapping_entry' defined but not used [-Wunused-function] static void *grab_mapping_entry(struct address_space *mapping, pgoff_t index, ^~~~~~~~~~~~~~~~~~ Now that the struct buffer_head based DAX fault paths and I/O path have been removed we really depend on iomap support being present for DAX. Make this explicit by selecting FS_IOMAP if we compile in DAX support. This allows us to remove conditional selections of FS_IOMAP when FS_DAX was present for ext2 and ext4, and to remove an #ifdef in fs/dax.c. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ross Zwisler <[email protected]> Reported-by: Arnd Bergmann <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thpKeno Fischer1-1/+17
In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE after a COW was resolved to setting the (newly introduced) FOLL_COW instead. Simultaneously, the check in gup.c was updated to still allow writes with FOLL_FORCE set if FOLL_COW had also been set. However, a similar check in huge_memory.c was forgotten. As a result, remote memory writes to ro regions of memory backed by transparent huge pages cause an infinite loop in the kernel (handle_mm_fault sets FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails out immidiately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is true. While in this state the process is stil SIGKILLable, but little else works (e.g. no ptrace attach, no other signals). This is easily reproduced with the following code (assuming thp are set to always): #include <assert.h> #include <fcntl.h> #include <stdint.h> #include <stdio.h> #include <string.h> #include <sys/mman.h> #include <sys/stat.h> #include <sys/types.h> #include <sys/wait.h> #include <unistd.h> #define TEST_SIZE 5 * 1024 * 1024 int main(void) { int status; pid_t child; int fd = open("/proc/self/mem", O_RDWR); void *addr = mmap(NULL, TEST_SIZE, PROT_READ, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr != MAP_FAILED); pid_t parent_pid = getpid(); if ((child = fork()) == 0) { void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, 0, 0); assert(addr2 != MAP_FAILED); memset(addr2, 'a', TEST_SIZE); pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr); return 0; } assert(child == waitpid(child, &status, 0)); assert(WIFEXITED(status) && WEXITSTATUS(status) == 0); return 0; } Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously to the update in gup.c in the original commit. The same pattern exists in follow_devmap_pmd. However, we should not be able to reach that check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we ever do. [[email protected]: coding-style fixes] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Keno Fischer <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Greg Thelen <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Willy Tarreau <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Kees Cook <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Hugh Dickins <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-24memory_hotplug: make zone_can_shift() return a boolean valueYasuaki Ishimatsu3-15/+21
online_{kernel|movable} is used to change the memory zone to ZONE_{NORMAL|MOVABLE} and online the memory. To check that memory zone can be changed, zone_can_shift() is used. Currently the function returns minus integer value, plus integer value and 0. When the function returns minus or plus integer value, it means that the memory zone can be changed to ZONE_{NORNAL|MOVABLE}. But when the function returns 0, there are two meanings. One of the meanings is that the memory zone does not need to be changed. For example, when memory is in ZONE_NORMAL and onlined by online_kernel the memory zone does not need to be changed. Another meaning is that the memory zone cannot be changed. When memory is in ZONE_NORMAL and onlined by online_movable, the memory zone may not be changed to ZONE_MOVALBE due to memory online limitation(see Documentation/memory-hotplug.txt). In this case, memory must not be onlined. The patch changes the return type of zone_can_shift() so that memory online operation fails when memory zone cannot be changed as follows: Before applying patch: # grep -A 35 "Node 2" /proc/zoneinfo Node 2, zone Normal <snip> node_scanned 0 spanned 8388608 present 7864320 managed 7864320 # echo online_movable > memory4097/state # grep -A 35 "Node 2" /proc/zoneinfo Node 2, zone Normal <snip> node_scanned 0 spanned 8388608 present 8388608 managed 8388608 online_movable operation succeeded. But memory is onlined as ZONE_NORMAL, not ZONE_MOVABLE. After applying patch: # grep -A 35 "Node 2" /proc/zoneinfo Node 2, zone Normal <snip> node_scanned 0 spanned 8388608 present 7864320 managed 7864320 # echo online_movable > memory4097/state bash: echo: write error: Invalid argument # grep -A 35 "Node 2" /proc/zoneinfo Node 2, zone Normal <snip> node_scanned 0 spanned 8388608 present 7864320 managed 7864320 online_movable operation failed because of failure of changing the memory zone from ZONE_NORMAL to ZONE_MOVABLE Fixes: df429ac03936 ("memory-hotplug: more general validation of zone during online") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yasuaki Ishimatsu <[email protected]> Reviewed-by: Reza Arbab <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2017-01-25vring: Force use of DMA API for ARM-based systems with legacy devicesWill Deacon1-0/+7
Booting Linux on an ARM fastmodel containing an SMMU emulation results in an unexpected I/O page fault from the legacy virtio-blk PCI device: [ 1.211721] arm-smmu-v3 2b400000.smmu: event 0x10 received: [ 1.211800] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010 [ 1.211880] arm-smmu-v3 2b400000.smmu: 0x0000020800000000 [ 1.211959] arm-smmu-v3 2b400000.smmu: 0x00000008fa081002 [ 1.212075] arm-smmu-v3 2b400000.smmu: 0x0000000000000000 [ 1.212155] arm-smmu-v3 2b400000.smmu: event 0x10 received: [ 1.212234] arm-smmu-v3 2b400000.smmu: 0x00000000fffff010 [ 1.212314] arm-smmu-v3 2b400000.smmu: 0x0000020800000000 [ 1.212394] arm-smmu-v3 2b400000.smmu: 0x00000008fa081000 [ 1.212471] arm-smmu-v3 2b400000.smmu: 0x0000000000000000 <system hangs failing to read partition table> This is because the legacy virtio-blk device is behind an SMMU, so we have consequently swizzled its DMA ops and configured the SMMU to translate accesses. This then requires the vring code to use the DMA API to establish translations, otherwise all transactions will result in fatal faults and termination. Given that ARM-based systems only see an SMMU if one is really present (the topology is all described by firmware tables such as device-tree or IORT), then we can safely use the DMA API for all legacy virtio devices. Modern devices can advertise the prescense of an IOMMU using the VIRTIO_F_IOMMU_PLATFORM feature flag. Cc: Andy Lutomirski <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Cc: <[email protected]> Fixes: 876945dbf649 ("arm64: Hook up IOMMU dma_ops") Signed-off-by: Will Deacon <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Marc Zyngier <[email protected]>
2017-01-25virtio_mmio: Set DMA masks appropriatelyRobin Murphy1-1/+19
Once DMA API usage is enabled, it becomes apparent that virtio-mmio is inadvertently relying on the default 32-bit DMA mask, which leads to problems like rapidly exhausting SWIOTLB bounce buffers. Ensure that we set the appropriate 64-bit DMA mask whenever possible, with the coherent mask suitably limited for the legacy vring as per a0be1db4304f ("virtio_pci: Limit DMA mask to 44 bits for legacy virtio devices"). Cc: Andy Lutomirski <[email protected]> Cc: Michael S. Tsirkin <[email protected]> Reported-by: Jean-Philippe Brucker <[email protected]> Fixes: b42111382f0e ("virtio_mmio: Use the DMA API if enabled") Signed-off-by: Robin Murphy <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2017-01-25vhost/vsock: handle vhost_vq_init_access() errorStefan Hajnoczi1-4/+9
Propagate the error when vhost_vq_init_access() fails and set vq->private_data to NULL. Signed-off-by: Stefan Hajnoczi <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2017-01-24ARCv2: smp-boot: wake_flag polling by non-Masters needs to be uncachedVineet Gupta1-3/+16
This is needed on HS38 cores, for setting up IO-Coherency aperture properly The polling could perturb the caches and coherecy fabric which could be wrong in the small window when Master is setting up IOC aperture etc in arc_cache_init() We do it only for ARCv2 based builds to not affect EZChip ARCompact based platform. Signed-off-by: Vineet Gupta <[email protected]>
2017-01-24Merge branch 'lwt-module-unload'David S. Miller7-1/+13
Robert Shearman says: ==================== net: Fix oops on state free after lwt module unload An oops is seen in lwtstate_free after an lwt ops module has been unloaded. This patchset fixes this by preventing modules implementing lwtunnel ops from being unloaded whilst there's state alive using those ops. The first patch adds fills in a new owner field in all lwt ops and the second patch makes use of this to reference count the modules as state is built and destroyed using them. Changes in v3: - don't put module reference if try_module_get fails on building state Changes in v2: - specify module owner for all modules as suggested by DaveM - reference count all modules building lwt state, not just those ops implementing destroy_state, as also suggested by DaveM. - rebased on top of David Ahern's lwtunnel changes ==================== Signed-off-by: David S. Miller <[email protected]>