Age | Commit message | Author | Files | Lines
2016-07-28 | mm, oom_reaper: do not attempt to reap a task more than twice | Michal Hocko | 2 | -0/+20
oom_reaper relies on the mmap_sem for read to do its job. Many places which might block readers have been converted to use down_write_killable and that has reduced the chance of contention a lot. Some paths where the mmap_sem is held for write can take other locks, and they might either not be prepared to fail due to a pending fatal signal or be too impractical to change. This patch introduces the MMF_OOM_NOT_REAPABLE flag, which gets set after the first attempt to reap a task's mm fails. If the flag is already present after a failure, we set MMF_OOM_REAPED to hide this mm from the oom killer completely so it can go and choose another victim. As a result, the risk of an OOM deadlock, where the oom victim is blocked indefinitely and the oom killer cannot make any progress, should be mitigated considerably while we still try really hard to perform all reclaim attempts and stay predictable in the behavior. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
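The two-strike rule described above can be modelled in a few lines of userspace C. This is a minimal sketch, assuming a plain bitmask in place of the kernel's `mm->flags` and its atomic bit operations; the flag names come from the commit message, while the bit values, `struct mm_model`, `oom_reaper_note_failure()` and `oom_killer_may_select()` are hypothetical.

```c
#include <stdbool.h>

/* Flag names follow the commit message; the bit values are made up. */
#define MMF_OOM_REAPED       (1u << 0) /* hide this mm from the oom killer */
#define MMF_OOM_NOT_REAPABLE (1u << 1) /* an earlier reap attempt failed */

struct mm_model {
	unsigned int flags; /* stands in for mm->flags */
};

/*
 * Called after one whole reap attempt failed (e.g. mmap_sem could not be
 * taken for read). The first failure is only remembered; the second one
 * gives up on this mm so the oom killer can choose another victim.
 */
void oom_reaper_note_failure(struct mm_model *mm)
{
	if (mm->flags & MMF_OOM_NOT_REAPABLE)
		mm->flags |= MMF_OOM_REAPED;
	else
		mm->flags |= MMF_OOM_NOT_REAPABLE;
}

/* Victim selection skips tasks whose mm is already marked as reaped. */
bool oom_killer_may_select(const struct mm_model *mm)
{
	return !(mm->flags & MMF_OOM_REAPED);
}
```

After the first failure the mm stays eligible; after the second it is hidden, which bounds how long a stuck victim can block further oom killing.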
2016-07-28 | mm, oom: task_will_free_mem should skip oom_reaped tasks | Michal Hocko | 1 | -0/+10
The 0-day robot has encountered the following: Out of memory: Kill process 3914 (trinity-c0) score 167 or sacrifice child Killed process 3914 (trinity-c0) total-vm:55864kB, anon-rss:1512kB, file-rss:1088kB, shmem-rss:25616kB oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26488kB oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26900kB oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:26900kB oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:27296kB oom_reaper: reaped process 3914 (trinity-c0), now anon-rss:0kB, file-rss:0kB, shmem-rss:28148kB oom_reaper is trying to reap the same task again and again. This is possible only when the oom killer is bypassed because of task_will_free_mem because we skip over tasks with MMF_OOM_REAPED already set during select_bad_process. Teach task_will_free_mem to skip over MMF_OOM_REAPED tasks as well because they will be unlikely to free anything more. Analyzed by Tetsuo Handa. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Tetsuo Handa <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: David Rientjes <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
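The extra check can be sketched as an early return in a simplified task_will_free_mem. This is a hypothetical userspace model, not the kernel function: `struct task_model`, `task_will_free_mem_model()` and the flag's bit value are invented, and a single `exiting` boolean stands in for the kernel's real "is this task going away" tests.

```c
#include <stdbool.h>
#include <stddef.h>

#define MMF_OOM_REAPED (1u << 0) /* hypothetical bit value */

struct mm_model {
	unsigned int flags;
};

struct task_model {
	struct mm_model *mm;
	bool exiting; /* stands in for the "killed or exiting" checks */
};

/*
 * Returning true lets the caller bypass the oom killer and wait for the
 * task to free its memory. A task whose mm was already reaped is unlikely
 * to free anything more, so it must not trigger that bypass again.
 */
bool task_will_free_mem_model(const struct task_model *t)
{
	if (t->mm == NULL)
		return false;
	if (t->mm->flags & MMF_OOM_REAPED)
		return false; /* the check this commit adds */
	return t->exiting;
}
```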
2016-07-28 | mm, oom: fortify task_will_free_mem() | Michal Hocko | 3 | -78/+85
task_will_free_mem is rather weak. It doesn't really tell whether the task has a chance to drop its mm. 98748bd72200 ("oom: consider multi-threaded tasks in task_will_free_mem") made a first step toward making it more robust for multi-threaded applications, so now we know that the whole process is going down and will probably drop the mm. This patch builds on top of that for more complex scenarios where the mm is shared between different processes: CLONE_VM without CLONE_SIGHAND, or in-kernel use_mm(). Make sure that all processes sharing the mm are killed or exiting. This will allow us to replace try_oom_reaper by wake_oom_reaper because task_will_free_mem implies the task is reapable now. Therefore all paths which bypass the oom killer are now reapable and so they shouldn't lock up the oom killer. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
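The fortified rule amounts to a scan over every process that shares the victim's mm. A hypothetical sketch: `struct proc_model` and `all_mm_users_exiting()` are invented names, an array replaces the kernel's process list, and one boolean stands in for the various "killed or exiting" checks.

```c
#include <stdbool.h>
#include <stddef.h>

struct mm_model {
	int id;
};

struct proc_model {
	const struct mm_model *mm;
	bool killed_or_exiting;
};

/*
 * Bypassing the oom killer is only safe when every process sharing the
 * victim's mm is also going away; a single survivor keeps the address
 * space, and therefore its memory, alive.
 */
bool all_mm_users_exiting(const struct proc_model *procs, size_t n,
			  const struct mm_model *mm)
{
	for (size_t i = 0; i < n; i++) {
		if (procs[i].mm == mm && !procs[i].killed_or_exiting)
			return false;
	}
	return true;
}
```

Processes with a different mm are ignored, so an unrelated running process never blocks the bypass.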
2016-07-28 | mm, oom: kill all tasks sharing the mm | Michal Hocko | 1 | -2/+1
Currently oom_kill_process skips both the oom reaper and SIGKILL if a process sharing the same mm is unkillable via OOM_SCORE_ADJ_MIN. After "mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj" all such processes share the same value, so we shouldn't see such a task at all (oom_badness would rule them out). We can still encounter an oom-disabled vforked task, which has to be killed as well if we want the other tasks sharing the mm to be reapable, because it can access the memory before doing exec. Killing such a task should be acceptable because it is highly unlikely to have done anything useful, since it cannot modify any memory before it calls exec. An alternative would be to keep the task alive, skip the oom reaper and risk all the weird corner cases where the OOM killer cannot make forward progress because the oom victim hung somewhere on the way to exit. [[email protected] - drop printk when OOM_SCORE_ADJ_MIN killed task the setting is inherently racy and we cannot do much about it without introducing locks in hot paths] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-07-28 | mm, oom: skip vforked tasks from being selected | Michal Hocko | 2 | -2/+30
vforked tasks are not really sitting on any memory. They share the mm with their parent until they exec into new code. Until then they are just pinning the address space. The OOM killer will kill the vforked task along with its parent, but we can still end up selecting a vforked task when the parent wouldn't be selected, e.g. init doing vfork to launch a task, or a vforked task that is the child of an oom-unkillable task and has updated its oom_score_adj to be killable. Add a new helper to check whether a task is in vfork, sharing memory with its parent, and use it in oom_badness to skip over these tasks. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
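A vfork child can be recognised by two facts: its parent is still waiting for it, and it uses the parent's mm. The sketch below models that helper and its use in a badness score under stated assumptions: `struct task_model`, `task_in_vfork()` and `badness_model()` are hypothetical, `vfork_done` is reduced to an opaque non-NULL pointer, and the RCU protection a kernel helper would need around `real_parent` is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

struct mm_model {
	int id;
};

struct task_model {
	struct mm_model *mm;
	struct task_model *real_parent;
	void *vfork_done; /* non-NULL while a vfork parent waits for us */
};

/* A vfork child borrows its parent's mm until it execs or exits. */
bool task_in_vfork(const struct task_model *t)
{
	return t->vfork_done != NULL && t->real_parent != NULL &&
	       t->real_parent->mm == t->mm;
}

/* Simplified badness: a vfork child sits on no memory of its own. */
long badness_model(const struct task_model *t, long rss_pages)
{
	if (task_in_vfork(t))
		return 0; /* 0 means "skip this task" */
	return rss_pages;
}
```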
2016-07-28 | mm, oom_adj: make sure processes sharing mm have same view of oom_score_adj | Michal Hocko | 3 | -1/+49
oom_score_adj is shared for the thread groups (via struct signal), but this is not sufficient to cover processes sharing the mm (CLONE_VM without CLONE_SIGHAND), and so we can easily end up in a situation where some processes update their oom_score_adj and confuse the oom killer. In the worst case some of those processes might hide from the oom killer altogether via OOM_SCORE_ADJ_MIN while others are eligible. The OOM killer would then pick the eligible ones but wouldn't be allowed to kill the others sharing the same mm, so neither the mm nor its memory would be released. It would be ideal to have the oom_score_adj per mm_struct because that is the natural entity the OOM killer considers. But this will not work because some programs are doing vfork(); set_oom_adj(); exec(). We can achieve the same, though: the oom_score_adj write handler can set the oom_score_adj for all processes sharing the same mm if the task is not in the middle of vfork. As a result all the processes will share the same oom_score_adj. The current implementation is rather pessimistic and checks all the existing processes by default if there is more than 1 holder of the mm, but we do not have any reliable way to check for external users yet. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
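The write-handler rule can be sketched as: update the target and, unless the target is a vfork child, every other non-vfork process sharing its mm. This is a hypothetical model, not the proc handler itself: `struct proc_model`, `set_oom_score_adj_model()` and the integer `mm_id` are invented, and locking, kernel threads and global init are ignored.

```c
#include <stdbool.h>
#include <stddef.h>

struct proc_model {
	int mm_id;           /* processes with equal ids share one mm */
	bool in_vfork;       /* still borrowing the parent's mm */
	short oom_score_adj;
};

/*
 * Apply the new value to the target and, unless the target is in vfork,
 * to every other process sharing its mm so that the oom killer sees one
 * consistent value. A vfork child keeps its own value, which is what the
 * vfork(); set_oom_adj(); exec() pattern relies on.
 */
void set_oom_score_adj_model(struct proc_model *procs, size_t n,
			     size_t target, short val)
{
	procs[target].oom_score_adj = val;
	if (procs[target].in_vfork)
		return;
	for (size_t i = 0; i < n; i++) {
		if (i != target && !procs[i].in_vfork &&
		    procs[i].mm_id == procs[target].mm_id)
			procs[i].oom_score_adj = val;
	}
}
```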
2016-07-28 | proc, oom_adj: extract oom_score_adj setting into a helper | Michal Hocko | 1 | -51/+43
Currently we have two proc interfaces to set oom_score_adj: the legacy /proc/<pid>/oom_adj and /proc/<pid>/oom_score_adj, which both have their specific handlers. A big part of the logic is duplicated, so extract the common code into a __set_oom_adj helper. The legacy knob still expects some details to be slightly different, so make sure those are handled the same way as before, e.g. the legacy mode ignores oom_score_adj_min and it warns about the usage. This patch shouldn't introduce any functional changes. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
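One shared helper with a `legacy` flag is enough to express both knobs. The sketch below is a userspace model under stated assumptions: the legacy value is rescaled from the old [-17, 15] range onto [-1000, 1000] before the common code runs, the legacy path checks against the current oom_score_adj instead of oom_score_adj_min, and `struct signal_model`, `set_oom_adj_common()`, `legacy_oom_adj_write()` and the `privileged` parameter (standing in for CAP_SYS_RESOURCE) are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

#define OOM_DISABLE       (-17)
#define OOM_ADJUST_MAX      15
#define OOM_SCORE_ADJ_MAX 1000

struct signal_model {
	short oom_score_adj;
	short oom_score_adj_min;
};

/* Shared helper: "legacy" only changes which lower bound is enforced. */
int set_oom_adj_common(struct signal_model *sig, int val, bool legacy,
		       bool privileged)
{
	short lower_bound = legacy ? sig->oom_score_adj : sig->oom_score_adj_min;

	if (val < lower_bound && !privileged)
		return -EACCES; /* lowering below the bound needs privilege */
	sig->oom_score_adj = (short)val;
	if (!legacy && privileged)
		sig->oom_score_adj_min = (short)val; /* legacy ignores the min */
	return 0;
}

/* Legacy /proc/<pid>/oom_adj: rescale [-17, 15] onto [-1000, 1000] first. */
int legacy_oom_adj_write(struct signal_model *sig, int oom_adj,
			 bool privileged)
{
	if (oom_adj == OOM_ADJUST_MAX)
		oom_adj = OOM_SCORE_ADJ_MAX;
	else
		oom_adj = (oom_adj * OOM_SCORE_ADJ_MAX) / -OOM_DISABLE;
	return set_oom_adj_common(sig, oom_adj, true, privileged);
}
```

Keeping the scaling outside the helper leaves the helper with only the differences the commit message names: which bound is checked and whether the minimum is updated.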
2016-07-28 | proc, oom: drop bogus sighand lock | Michal Hocko | 1 | -34/+17
Oleg has pointed out that we can simplify both oom_adj_{read,write} and oom_score_adj_{read,write} even further and drop the sighand lock. The main purpose of the lock was to protect p->signal from going away, but this cannot happen since ea6d290ca34c ("signals: make task_struct->signal immutable/refcountable"). The other role of the lock was to synchronize different writers, especially those with CAP_SYS_RESOURCE. Introduce a mutex for this purpose. Later patches will need this lock anyway. Suggested-by: Oleg Nesterov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: Vladimir Davydov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-07-28 | proc, oom: drop bogus task_lock and mm check | Michal Hocko | 1 | -18/+4
Series "Handle oom bypass more gracefully", V5 The following 10 patches should bring some order to the very rare cases of an mm shared between processes, and finally make the paths which bypass the oom killer oom reapable and therefore much more reliable. Even though an mm shared outside of the thread group is rare (either vforked tasks for a short period, use_mm by kernel threads or the exotic thread model of clone(CLONE_VM) without CLONE_SIGHAND), it is better to cover these cases. Not only do they make the current oom killer logic quite hard to follow and reason about, they can also lead to weird corner cases. E.g. it is possible to select an oom victim which shares the mm with an unkillable process, or to bypass the oom killer even when other processes sharing the mm are still alive, among other weird cases. Patch 1 drops a bogus task_lock and mm check from oom_{score_}adj_write. This can be considered a bug fix with a low impact, as nobody has noticed for years. Patch 2 drops the sighand lock because it is not needed anymore, as pointed out by Oleg. Patch 3 is a cleanup of oom_score_adj handling and preparatory work for later patches. Patch 4 enforces oom_score_adj to be consistent between processes sharing the mm, to behave consistently with the regular thread groups. This can be considered a user-visible behavior change because one thread group updating oom_score_adj will affect others which share the same mm via clone(CLONE_VM). I argue that this should be acceptable because we already have the same behavior for threads in the same thread group, and sharing the mm without the signal struct is just a different model of threading. This is probably the most controversial part of the series; I would like to find some consensus here. There were some suggestions to hook some counter/oom_score_adj into the mm_struct, but I feel that this is not necessary right now and we can rely on the proc handler + oom_kill_process to DTRT. 
I can be convinced otherwise, but I strongly think that whatever we do, userspace has to have a way to see the current oom priority as consistently as possible. Patch 5 makes sure that no vforked task is selected if it is sharing the mm with an oom-unkillable task. Patch 6 ensures that all user tasks sharing the mm are killed, which in turn makes sure that all oom victims are oom reapable. Patch 7 guarantees that task_will_free_mem will always imply a reapable bypass of the oom killer. Patch 8 is new in this version and addresses an issue pointed out by a 0-day OOM report where an oom victim was reaped several times. Patch 9 puts an upper bound on how many times oom_reaper tries to reap a task and hides it from the oom killer to move on when no progress can be made. This bounds how long an oom-reapable task can block the oom killer from selecting another victim if the oom_reaper is not able to reap the victim. Patch 10 tries to plug the (hopefully) last hole where we can still lock up when the oom victim's mm is shared with oom-unkillable tasks (kthreads and global init). We just try to be best effort in that case and would rather fall back to killing something else than risk a lockup. This patch (of 10): Both oom_adj_write and oom_score_adj_write use task_lock, check for task->mm and fail if it is NULL. This is not needed because oom_score_adj is per signal struct, so we do not need the mm at all. The code was introduced by 3d5992d2ac7d ("oom: add per-mm oom disable count"), but we have not done per-mm oom disable since c9f01245b6a7 ("oom: remove oom_disable_count"). The task->mm check is not even correct, because the current thread might have exited while the thread group is still alive; e.g. an exited thread group leader would cause echo $VAL > /proc/pid/oom_score_adj to always fail with EINVAL while /proc/pid/task/$other_tid/oom_score_adj would succeed. This is unexpected at best. 
Remove the lock along with the check to fix the unexpected behavior and also because there is no real need for the lock in the first place. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reviewed-by: Vladimir Davydov <[email protected]> Acked-by: Oleg Nesterov <[email protected]> Cc: David Rientjes <[email protected]> Cc: Tetsuo Handa <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-07-28 | Merge tag 'dmaengine-4.8-rc1' of git://git.infradead.org/users/vkoul/slave-dma | Linus Torvalds | 53 | -380/+3231
Pull dmaengine updates from Vinod Koul: "This time we have a somewhat larger set of changes: two new drivers and a bunch of updates and cleanups to the existing set. Nothing super exciting, though. New drivers: - Xilinx zynqmp dma engine driver - Marvell xor2 driver Updates: - dmatest sg support - updates and enhancements to Xilinx drivers, adding of cyclic mode - clock handling fixes across drivers - removal of OOM messages on kzalloc across subsystem - interleaved transfers support in omap driver - runtime pm support in qcom bam dma - tasklet kill freeup across drivers - irq cleanup on remove across drivers" * tag 'dmaengine-4.8-rc1' of git://git.infradead.org/users/vkoul/slave-dma: (94 commits) dmaengine: k3dma: add missing clk_disable_unprepare() on error in k3_dma_probe() dmaengine: zynqmp_dma: add missing MODULE_LICENSE dmaengine: qcom_hidma: use for_each_matching_node() macro dmaengine: zynqmp_dma: Fix static checker warning dmaengine: omap-dma: Support for interleaved transfer dmaengine: ioat: statify symbol dmaengine: pxa_dma: implement device_synchronize dmaengine: imx-sdma: remove assignment never used dmaengine: imx-sdma: remove dummy assignment dmaengine: cppi: remove unused and bogus check dmaengine: qcom_hidma_lli: kill the tasklets upon exit dmaengine: pxa_dma: remove owner assignment dmaengine: fsl_raid: remove owner assignment dmaengine: coh901318: remove owner assignment dmaengine: qcom_hidma: kill the tasklets upon exit dmaengine: txx9dmac: explicitly freeup irq dmaengine: sirf-dma: kill the tasklets upon exit dmaengine: s3c24xx: kill the tasklets upon exit dmaengine: s3c24xx: explicitly freeup irq dmaengine: pl330: explicitly freeup irq ...
2016-07-28 | Merge tag 'hwlock-v4.8' of git://github.com/andersson/remoteproc | Linus Torvalds | 2 | -1/+3
Pull hwspinlock updates from Bjorn Andersson: "Add missing of_node_put() in the Qualcomm driver and update MAINTAINERS to make sure all hwspinlock related files have a maintainer listed" * tag 'hwlock-v4.8' of git://github.com/andersson/remoteproc: MAINTAINERS: Update hwspinlock paths hwspinlock: qcom_hwspinlock: add missing of_node_put after calling of_parse_phandle
2016-07-28 | Merge tag 'rproc-v4.8' of git://github.com/andersson/remoteproc | Linus Torvalds | 8 | -7/+1264
Pull remoteproc updates from Bjorn Andersson: "Introduce a remoteproc driver for controlling the modem/DSP Hexagon CPU found in a multitude of Qualcomm platforms. Also cleans up a race condition/potential leak during registration of remoteprocs and includes devicetree bindings in the MAINTAINERS entry" * tag 'rproc-v4.8' of git://github.com/andersson/remoteproc: remoteproc: qcom: hexagon: Clean up mpss validation remoteproc: qcom: remove redundant dev_err call in q6v5_init_mem() remoteproc: qcom: Driver for the self-authenticating Hexagon v5 dt-binding: remoteproc: Introduce Hexagon loader binding remoteproc: Fix potential race condition in rproc_add MAINTAINERS: Add file patterns for remoteproc device tree bindings
2016-07-28 | vfs: ioctl: prevent double-fetch in dedupe ioctl | Scott Bauer | 1 | -0/+1
This prevents a double-fetch from user space that can lead to an undersized allocation and heap overflow. Fixes: 54dbc1517237 ("vfs: hoist the btrfs deduplication ioctl to the vfs") Signed-off-by: Scott Bauer <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
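A double fetch happens when a length is read from user memory to size an allocation and then read again as part of a bulk copy; a racing thread can change it in between. The sketch models user memory as an ordinary buffer and the race as a callback; `struct range_hdr`, `copy_range_args()` and `simulate_racing_writer()` are hypothetical, and the fix shown (overwriting the re-fetched count with the first value) is one common way to close such a hole, assumed here rather than quoted from the patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct range_info {
	uint64_t dest_offset;
};

struct range_hdr {
	uint16_t dest_count;
	struct range_info info[]; /* dest_count entries follow */
};

/* Stands in for a malicious thread updating user memory mid-syscall. */
typedef void (*race_hook)(struct range_hdr *user);

void simulate_racing_writer(struct range_hdr *user)
{
	user->dest_count = UINT16_MAX;
}

/*
 * Fetch #1 reads the count to size the allocation; fetch #2 copies the
 * whole header again. Forcing the copied count back to the first value
 * means later loops over info[] can never run past the allocation.
 */
struct range_hdr *copy_range_args(struct range_hdr *user, race_hook hook)
{
	uint16_t count = user->dest_count;                      /* fetch #1 */
	size_t size = sizeof(*user) + count * sizeof(user->info[0]);
	struct range_hdr *kcopy;

	if (hook)
		hook(user);                                     /* the race */
	kcopy = malloc(size);
	if (!kcopy)
		return NULL;
	memcpy(kcopy, user, size);                              /* fetch #2 */
	kcopy->dest_count = count;                              /* the fix */
	return kcopy;
}
```

Without the final assignment, `kcopy->dest_count` would read 65535 while only two entries were allocated.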
2016-07-28 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid | Linus Torvalds | 15 | -569/+1250
Pull HID updates from Jiri Kosina: - new hid-alps driver for ALPS Touchpad-Stick device, from Masaki Ota - much improved and generalized HID led handling, and merge of specialized hid-thingm driver into this generic hid-led one, from Heiner Kallweit - i2c-hid power management improvements from Fu Zhonghui and Guohua Zhong - uhid initialization race fix from Roderick Colenbrander * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (21 commits) HID: add usb device id for Apple Magic Keyboard HID: hid-led: fix Delcom support on big endian systems HID: hid-led: add support for Greynut Luxafor HID: hid-led: add support for Delcom Visual Signal Indicator G2 HID: hid-led: remove report id from struct hidled_config HID: alps: a few cleanups HID: remove ThingM blink(1) driver HID: hid-led: add support for ThingM blink(1) HID: hid-led: add support for reading from LED devices HID: hid-led: add support for devices with multiple independent LEDs HID: i2c-hid: set power sleep before shutdown HID: alps: match alps devices in core HID: thingm: simplify debug output code HID: alps: pass correct sizes to hid_hw_raw_request() HID: alps: struct u1_dev *priv is internal to the driver HID: add Alps I2C HID Touchpad-Stick support HID: led: fix config usb: misc: remove outdated USB LED driver HID: migrate USB LED driver from usb misc to hid HID: i2c_hid: enable i2c-hid devices to suspend/resume asynchronously ...
2016-07-28 | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial | Linus Torvalds | 6 | -9/+6
Pull trivial tree updates from Jiri Kosina. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: fat: fix error message for bogus number of directory entries fat: fix typo s/supeblock/superblock/ ASoC: max9877: Remove unused function declaration dw2102: don't output spurious blank lines to the kernel log init: fix Kconfig text ARM: io: fix comment grammar ocfs: fix ocfs2_xattr_user_get() argument name scsi/qla2xxx: Remove erroneous unused macro qla82xx_get_temp_val1()
2016-07-28 | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs | Linus Torvalds | 3 | -11/+11
Pull quota update from Jan Kara: "time64 support for quota" * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: use time64_t internally
2016-07-28 | Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random | Linus Torvalds | 1 | -2/+1
Pull random driver fix from Ted Ts'o: "Fix a boot failure on systems with non-contiguous NUMA id's" * tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: use for_each_online_node() to iterate over NUMA nodes
2016-07-28 | Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs | Linus Torvalds | 61 | -413/+224
Pull vfs updates from Al Viro: "Assorted cleanups and fixes. Probably the most interesting part long-term is ->d_init() - that will have a bunch of followups in (at least) ceph and lustre, but we'll need to sort the barrier-related rules before it can get used for really non-trivial stuff. Another fun thing is the merge of ->d_iput() callers (dentry_iput() and dentry_unlink_inode()) and a bunch of ->d_compare() ones (all except the one in __d_lookup_lru())" * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (26 commits) fs/dcache.c: avoid soft-lockup in dput() vfs: new d_init method vfs: Update lookup_dcache() comment bdev: get rid of ->bd_inodes Remove last traces of ->sync_page new helper: d_same_name() dentry_cmp(): use lockless_dereference() instead of smp_read_barrier_depends() vfs: clean up documentation vfs: document ->d_real() vfs: merge .d_select_inode() into .d_real() unify dentry_iput() and dentry_unlink_inode() binfmt_misc: ->s_root is not going anywhere drop redundant ->owner initializations ufs: get rid of redundant checks orangefs: constify inode_operations missed comment updates from ->direct_IO() prototype change file_inode(f)->i_mapping is f->f_mapping trim fsnotify hooks a bit 9p: new helper - v9fs_parent_fid() debugfs: ->d_parent is never NULL or negative ...
2016-07-28 | ARC: mm: don't lose PTE_SPECIAL in pte_modify() | Vineet Gupta | 1 | -1/+1
LTP madvise05 was generating an mm splat | [ARCLinux]# /sd/ltp/testcases/bin/madvise05 | BUG: Bad page map in process madvise05 pte:80e08211 pmd:9f7d4000 | page:9fdcfc90 count:1 mapcount:-1 mapping: (null) index:0x0 flags: 0x404(referenced|reserved) | page dumped because: bad pte | addr:200b8000 vm_flags:00000070 anon_vma: (null) mapping: (null) index:1005c | file: (null) fault: (null) mmap: (null) readpage: (null) | CPU: 2 PID: 6707 Comm: madvise05 And for newer kernels, the system was rendered unusable afterwards. The problem was mprotect->pte_modify() clearing PTE_SPECIAL (which is set to identify the special zero page wired to the pte). When the pte was finally unmapped, the special casing for the zero page was not done, and instead it was treated as a "normal" page, tripping on the map counts etc. This fixes ARC STAR 9001053308 Cc: <[email protected]> Signed-off-by: Vineet Gupta <[email protected]>
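pte_modify() typically keeps a set of "must survive" bits from the old pte and replaces everything else with the new protection bits; if the special bit is not in that preserved set, mprotect() silently drops it. The sketch assumes that mask-based shape; the bit layout, `pte_model_t`, both mask macros and `pte_modify_model()` are hypothetical and do not reproduce ARC's real definitions.

```c
#include <stdint.h>

typedef uint32_t pte_model_t;

/* Hypothetical bit layout for illustration only. */
#define PFN_MASK        0xFFFFF000u
#define _PAGE_DIRTY     (1u << 0)
#define _PAGE_ACCESSED  (1u << 1)
#define _PAGE_SPECIAL   (1u << 2)
#define _PAGE_WRITE     (1u << 3)
#define _PAGE_READ      (1u << 4)

/* Bits preserved across a protection change. */
#define CHG_MASK_BUGGY  (PFN_MASK | _PAGE_DIRTY | _PAGE_ACCESSED)
#define CHG_MASK_FIXED  (CHG_MASK_BUGGY | _PAGE_SPECIAL)

/* Keep the preserved bits, replace the protection bits with newprot. */
pte_model_t pte_modify_model(pte_model_t pte, pte_model_t newprot,
			     pte_model_t chg_mask)
{
	return (pte & chg_mask) | newprot;
}
```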
2016-07-28 | Merge branch 'salted-string-hash' | Linus Torvalds | 39 | -95/+107
This changes the vfs dentry hashing to mix in the parent pointer at the _beginning_ of the hash, rather than at the end. That actually improves both the hash and the code generation, because we can move more of the computation to the "static" part of the dcache setup, and do less at lookup runtime. It turns out that a lot of other hash users also really wanted to mix in a base pointer as a 'salt' for the hash, and so the slightly extended interface ends up working well for other cases too. Users that want a string hash that is purely about the string pass in a 'salt' pointer of NULL. * merge branch 'salted-string-hash': fs/dcache.c: Save one 32-bit multiply in dcache lookup vfs: make the string hashes salt the hash
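Mixing the salt in first means the per-parent starting state can be computed once and reused for every name looked up under that parent. The sketch illustrates only that structure; it uses an FNV-1a style mix, not the kernel's string hash, and `hash_init_salted()` and `hash_name()` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 1469598103934665603ull
#define FNV_PRIME  1099511628211ull

/*
 * Seed the state with the salt (e.g. a parent pointer) before any name
 * bytes. A NULL salt gives a plain, unsalted string hash.
 */
uint64_t hash_init_salted(const void *salt)
{
	uint64_t h = FNV_OFFSET;
	uintptr_t s = (uintptr_t)salt;

	for (size_t i = 0; i < sizeof(s); i++) {
		h ^= (uint64_t)((s >> (8 * i)) & 0xff);
		h *= FNV_PRIME;
	}
	return h;
}

/* Continue from a precomputed seed; only this part runs per lookup. */
uint64_t hash_name(uint64_t seed, const char *name, size_t len)
{
	uint64_t h = seed;

	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)name[i];
		h *= FNV_PRIME;
	}
	return h;
}
```

The same name under two different parents hashes differently, while repeated lookups under one parent reuse a single seed.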
2016-07-28 | pNFS: Actively set attributes as invalid if LAYOUTCOMMIT is outstanding | Benjamin Coddington | 1 | -3/+5
A LAYOUTCOMMIT and a subsequent GETATTR may both return the same attributes, and in that case NFS_INO_INVALID_ATTR is never set on the second pass through nfs_update_inode(). The existing check to skip the clearing of NFS_INO_INVALID_ATTR if a LAYOUTCOMMIT is outstanding does not help in this case (see commit 10b7e9ad4488: "pNFS: Don't mark the inode as revalidated if a LAYOUTCOMMIT is outstanding"). We know that if a LAYOUTCOMMIT is outstanding then attributes will need updating, so always set NFS_INO_INVALID_ATTR. Signed-off-by: Benjamin Coddington <[email protected]> Signed-off-by: Trond Myklebust <[email protected]>
2016-07-28 | timers/core: Correct callback order during CPU hot plug | Richard Cochran | 2 | -6/+11
On the tear-down path, the dead CPU callback for the timers was misplaced within the 'cpuhp_state' enumeration. There is a hidden dependency between the timers and block multiqueue. The timers callback must happen before the block multiqueue callback, otherwise an RCU stall occurs. Move the timers callback to the proper place in the state machine. Reported-and-tested-by: Jon Hunter <[email protected]> Reported-by: kbuild test robot <[email protected]> Fixes: 24f73b99716a ("timers/core: Convert to hotplug state machine") Signed-off-by: Richard Cochran <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Rasmus Villemoes <[email protected]> Cc: John Stultz <[email protected]> Cc: [email protected] Cc: Oleg Nesterov <[email protected]> Cc: Linus Torvalds <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
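In an ordered hotplug state machine, position in the enumeration is the ordering: assuming, as in the cpuhp design, that teardown callbacks run from the highest state down, a callback that must run earlier on tear-down has to sit higher in the enum. The state names and `teardown_order()` below are hypothetical and do not reproduce the real `cpuhp_state` list.

```c
#include <stddef.h>

/* Hypothetical state list; teardown walks it from the top down. */
enum hp_state {
	HP_OFFLINE,
	HP_BLK_MQ_DEAD,
	HP_TIMERS_DEAD, /* above blk-mq, so it is torn down first */
	HP_ONLINE,
};

/*
 * Record the order in which teardown callbacks would run when a CPU goes
 * from state "from" down to state "to". Returns the number of entries.
 */
size_t teardown_order(enum hp_state from, enum hp_state to,
		      enum hp_state *out, size_t max)
{
	size_t n = 0;

	for (int st = (int)from; st > (int)to && n < max; st--)
		out[n++] = (enum hp_state)st;
	return n;
}
```

Swapping the two `*_DEAD` entries would reverse their teardown order, which is the kind of misplacement the commit describes.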
2016-07-28 | Merge branch 'mymd/for-next' into mymd/for-linus | Shaohua Li | 7 | -213/+328
2016-07-28 | MD: fix null pointer dereference | Shaohua Li | 1 | -2/+4
The md device might not have a personality (for example, a ddf raid array). The issue was introduced by 8430e7e0af9a15 (md: disconnect device from personality before trying to remove it) Reported-by: kernel test robot <[email protected]> Signed-off-by: Shaohua Li <[email protected]>
2016-07-28 | mailbox: Fix format and type mismatches in Broadcom PDC driver | Rob Rice | 1 | -4/+4
Fix format and type mismatches in a couple of debug prints in the Broadcom PDC driver. Use %pad for dma_addr_t and %pa for resource_size_t. Signed-off-by: Rob Rice <[email protected]> Reported-by: Fengguang Wu <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2016-07-28 | Merge branches 'cpuidle', 'fixes' and 'misc' into for-linus | Russell King | 283 | -1685/+2851
2016-07-28 | drm/amdgpu: fix firmware info version checks | Alex Deucher | 1 | -17/+8
Some of the checks didn't handle frev 2 tables properly. amdgpu doesn't support any tables pre-frev 2, so drop the checks. Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2016-07-28 | drm/radeon: fix firmware info version checks | Alex Deucher | 1 | -2/+2
Some of the checks didn't handle frev 2 tables properly. Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2016-07-28 | ceph: fix symbol versioning for ceph_monc_do_statfs | Arnd Bergmann | 1 | -1/+2
The genksyms helper in the kernel cannot parse a type definition like "typeof(((type *)0)->keyfld)" that is used in the DEFINE_RB_FUNCS helper, causing the following EXPORT_SYMBOL() statement to be ignored when computing the crcs, and triggering a warning about this: WARNING: "ceph_monc_do_statfs" [fs/ceph/ceph.ko] has no CRC To work around the problem, we can rewrite the type to reference an undefined 'extern' symbol instead of a NULL pointer. This is evidently ok for genksyms, and it no longer complains about the line when calling it with 'genksyms -w'. I've looked briefly into extending genksyms instead, but it seems really hard to do. Jan Beulich introduced basic support for 'typeof' a while ago in dc53324060f3 ("genksyms: fix typeof() handling"), but that is not sufficient for the expression we have here. Signed-off-by: Arnd Bergmann <[email protected]> Fixes: fcd00b68bbe2 ("libceph: DEFINE_RB_FUNCS macro") Cc: Jan Beulich <[email protected]> Cc: Michal Marek <[email protected]> Signed-off-by: Ilya Dryomov <[email protected]>
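Both spellings below name the same member type, but only the second avoids the NULL-pointer cast that the commit says genksyms cannot parse. The sketch only demonstrates that the workaround is valid C (using the GNU `__typeof__` extension, so GCC or Clang is assumed); `struct item`, `lookup_item_key_dummy`, the typedefs and `find_key()` are hypothetical. Because `typeof` does not evaluate its operand, the extern object never needs a definition at link time.

```c
/* Requires GCC or Clang: __typeof__ is a GNU extension. */
struct item {
	unsigned long long key;
	int value;
};

/* Original form: member type taken through a NULL pointer cast. */
typedef __typeof__(((struct item *)0)->key) key_type_null;

/*
 * Workaround form: declare, but never define, an extern object and take
 * the member type from it. No symbol reference is emitted for it.
 */
extern struct item lookup_item_key_dummy;
typedef __typeof__(lookup_item_key_dummy.key) key_type_extern;

key_type_extern find_key(const struct item *it)
{
	return it->key;
}
```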
2016-07-28 | drm/arm: mali-dp: Fix error return code in malidp_bind() | Wei Yongjun | 1 | -1/+3
Fix to return error code -EINVAL from the error handling case instead of 0, as done elsewhere in this function. Fixes: 3c31760e760c ('drm/arm: mali-dp: Set crtc.port to the port instead of the endpoint') Signed-off-by: Wei Yongjun <[email protected]> Acked-by: Brian Starkey <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2016-07-28 | drm/arm: mali-dp: Remove redundant dev_err call in malidp_bind() | Wei Yongjun | 1 | -3/+1
There is an error message within devm_ioremap_resource already, so remove the DRM_ERROR call to avoid a redundant error message. Signed-off-by: Wei Yongjun <[email protected]> Acked-by: Liviu Dudau <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
2016-07-28 | Merge branch 'for-4.8/hid-led' into for-linus | Jiri Kosina | 10 | -555/+554
Conflicts: drivers/hid/hid-thingm.c
2016-07-28 | Merge branches 'for-4.8/alps', 'for-4.8/apple', 'for-4.8/i2c-hid', 'for-4.8/uhid-offload-hid-device-add' and 'for-4.8/upstream' into for-linus | Jiri Kosina | 7080 | -133149/+355984
2016-07-28 | drm/gma500: remove unnecessary stub for fb_ioctl() | Stefan Christ | 1 | -9/+0
The stub implementation of fb_ioctl can be omitted, because the function do_fb_ioctl already returns -ENOTTY when fb_ioctl is not assigned. Signed-off-by: Stefan Christ <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/[email protected]
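The reasoning is that a dispatcher which already treats a NULL hook as "not supported" makes a stub returning the same error redundant. A minimal sketch, assuming the removed stub returned -ENOTTY; `struct fb_ops_model`, `do_fb_ioctl_model()` and `stub_fb_ioctl()` are hypothetical stand-ins for the fbdev code.

```c
#include <errno.h>
#include <stddef.h>

struct fb_ops_model {
	int (*fb_ioctl)(unsigned int cmd, unsigned long arg);
};

/* Dispatcher: a missing driver hook already yields -ENOTTY. */
int do_fb_ioctl_model(const struct fb_ops_model *ops, unsigned int cmd,
		      unsigned long arg)
{
	if (ops->fb_ioctl == NULL)
		return -ENOTTY;
	return ops->fb_ioctl(cmd, arg);
}

/* A stub like this only repeats the dispatcher's default. */
int stub_fb_ioctl(unsigned int cmd, unsigned long arg)
{
	(void)cmd;
	(void)arg;
	return -ENOTTY;
}
```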
2016-07-28 | apple-gmux: Sphinxify docs | Lukas Wunner | 1 | -26/+29
Convert asciidoc-formatted docs to rst in accordance with Jonathan's and Jani's effort to use sphinx for kernel-doc rendering in 4.8. Cc: Jonathan Corbet <[email protected]> Cc: Jani Nikula <[email protected]> Signed-off-by: Lukas Wunner <[email protected]> Acked-by: Darren Hart <[email protected]> Signed-off-by: Daniel Vetter <[email protected]> Link: http://patchwork.freedesktop.org/patch/msgid/4c1b29986fa77772156b1af0c965d3799e43a47b.1467628307.git.lukas@wunner.de
2016-07-28 | KVM: PPC: Book3S HV: Save/restore TM state in H_CEDE | Paul Mackerras | 1 | -0/+13
It turns out that if the guest does a H_CEDE while the CPU is in a transactional state, and the H_CEDE does a nap, and the nap loses the architected state of the CPU (which it is allowed to do), then we lose the checkpointed state of the virtual CPU. In addition, the transactional-memory state recorded in the MSR gets reset back to non-transactional, and when we try to return to the guest, we take a TM bad thing type of program interrupt because we are trying to transition from non-transactional to transactional with a hrfid instruction, which is not permitted. The result of the program interrupt occurring at that point is that the host CPU will hang in an infinite loop with interrupts disabled. Thus this is a denial of service vulnerability in the host which can be triggered by any guest (and depending on the guest kernel, it can potentially be triggered by unprivileged userspace in the guest). This vulnerability has been assigned the ID CVE-2016-5412. To fix this, we save the TM state before napping and restore it on exit from the nap, when handling a H_CEDE in real mode. The case where H_CEDE exits to host virtual mode is already OK (as are other hcalls which exit to host virtual mode) because the exit path saves the TM state. Cc: [email protected] # v3.15+ Signed-off-by: Paul Mackerras <[email protected]>
2016-07-28 | KVM: PPC: Book3S HV: Pull out TM state save/restore into separate procedures | Paul Mackerras | 1 | -212/+237
This moves the transactional memory state save and restore sequences out of the guest entry/exit paths into separate procedures. This is so that these sequences can be used in going into and out of nap in a subsequent patch. The only code changes here are (a) saving and restoring LR on the stack, since these new procedures get called with a bl instruction, (b) explicitly saving r1 into the PACA instead of assuming that HSTATE_HOST_R1(r13) is already set, and (c) removing an unnecessary and redundant setting of MSR[TM] that should have been removed by commit 9d4d0bdd9e0a ("KVM: PPC: Book3S HV: Add transactional memory support", 2013-09-24) but wasn't. Cc: [email protected] # v3.15+ Signed-off-by: Paul Mackerras <[email protected]>
2016-07-27sparc: serial: sunhv: fix a double lock bugDan Carpenter1-6/+0
We accidentally take the "port->lock" twice in a row. This old code was supposed to be deleted. Fixes: e58e241c1788 ('sparc: serial: Clean up the locking for -rt') Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
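A userspace sketch of why taking the same lock twice in a row is a bug. It assumes a POSIX error-checking mutex as a stand-in for port->lock (the real lock is a kernel spinlock, which would simply spin forever on the second acquisition); the error-checking type reports the relock as EDEADLK instead of hanging. The function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Returns the status of an accidental second lock on an already-held lock. */
static int relock_result(void)
{
	pthread_mutex_t lock;          /* stand-in for port->lock */
	pthread_mutexattr_t attr;
	int ret;

	pthread_mutexattr_init(&attr);
	/* Error-checking type: relocking from the owner thread fails with EDEADLK. */
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&lock, &attr);
	pthread_mutexattr_destroy(&attr);

	pthread_mutex_lock(&lock);          /* the intended acquisition */
	ret = pthread_mutex_lock(&lock);    /* the leftover, duplicate acquisition */

	pthread_mutex_unlock(&lock);
	pthread_mutex_destroy(&lock);
	return ret;
}
```

Deleting the duplicate acquisition, as the commit does, is the only real fix; the error-checking mutex merely makes the mistake visible in a test.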
2016-07-27sparc32: off by ones in BUG_ON()Dan Carpenter1-2/+2
Smatch complains that these tests are off by one, which is true but not life-threatening: arch/sparc/kernel/irq_32.c:169 irq_link() error: buffer overflow 'irq_map' 384 <= 384 Signed-off-by: Dan Carpenter <[email protected]> Signed-off-by: David S. Miller <[email protected]>
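The off-by-one can be illustrated with a hypothetical range check. The array size of 384 comes from the Smatch report; the helper names and element type are invented for this sketch. A check written with `>` instead of `>=` lets an index equal to the array size, one past the last element, slip through.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* 384 entries, as in the Smatch report; element type is a placeholder. */
static unsigned int irq_map[384];

/* Buggy form: idx == 384 is not flagged, although irq_map[384] is out of bounds. */
static bool out_of_range_buggy(size_t idx)
{
	return idx > ARRAY_SIZE(irq_map);
}

/* Fixed form: every idx >= 384 is flagged. */
static bool out_of_range_fixed(size_t idx)
{
	return idx >= ARRAY_SIZE(irq_map);
}
```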
2016-07-28crypto: marvell - Update cache with input sg only when it is unmappedRomain Perier1-6/+6
So far, the cache of the ahash requests was updated from the 'complete' operation. This complete operation is called from mv_cesa_tdma_process before the cleanup operation, which means that the content of req->src can be read and copied when it is still mapped. This commit fixes the issue by moving this cache update from mv_cesa_ahash_complete to mv_cesa_ahash_req_cleanup, so the copy is done once the sglist is unmapped. Fixes: 1bf6682cb31d ("crypto: marvell - Add a complete operation for..") Signed-off-by: Romain Perier <[email protected]> Acked-by: Boris Brezillon <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
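The reordering can be sketched in userspace: the 'complete' step runs first and no longer touches the source, and the cleanup step "unmaps" the source before copying from it. Everything here is hypothetical (struct layout, names, the 64-byte block size), and caching the trailing partial block is an assumption about what the driver stores, not a description of the CESA code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 64   /* hypothetical hash block size */

struct ahash_req {
	const unsigned char *src;   /* stands in for req->src */
	size_t len;
	bool src_mapped;            /* true while the source is "DMA-mapped" */
	unsigned char cache[BLOCK_SIZE];
	size_t cache_len;
};

/* 'complete' step: after the fix it does not read src or update the cache. */
static void ahash_complete(struct ahash_req *r)
{
	(void)r;
}

/* Cleanup: unmap first, then copy the trailing partial block into the cache. */
static void ahash_req_cleanup(struct ahash_req *r)
{
	size_t left = r->len % BLOCK_SIZE;

	r->src_mapped = false;                          /* "unmap" the source */
	memcpy(r->cache, r->src + (r->len - left), left);
	r->cache_len = left;
}

/* Same call order as the driver: complete, then cleanup. */
static void tdma_process(struct ahash_req *r)
{
	ahash_complete(r);
	ahash_req_cleanup(r);
}
```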
2016-07-28crypto: marvell - Don't chain at DMA level when backlog is disabledRomain Perier1-3/+4
The flag CRYPTO_TFM_REQ_MAY_BACKLOG is optional and can be set by the user to put requests into the backlog queue when the main cryptographic queue is full. Before calling mv_cesa_tdma_chain, we must check the return status to be sure that the current request has been correctly queued or added to the backlog. Fixes: 85030c5168f1 ("crypto: marvell - Add support for chaining...") Signed-off-by: Romain Perier <[email protected]> Acked-by: Boris Brezillon <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
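A minimal sketch of the status check, modelled on the return convention of the kernel's crypto_enqueue_request() at the time: -EINPROGRESS means the request was queued, while -EBUSY means the queue was full, in which case the request was kept on the backlog only if the caller set the backlog flag. The flag value and helper name below are placeholders for this sketch, not the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder bit for this sketch; not the kernel's CRYPTO_TFM_REQ_MAY_BACKLOG value. */
#define MAY_BACKLOG_FLAG 0x1u

/* True if the request actually ended up queued or backlogged, so chaining it is safe. */
static bool request_was_queued(int ret, unsigned int flags)
{
	return ret == -EINPROGRESS ||
	       (ret == -EBUSY && (flags & MAY_BACKLOG_FLAG));
}
```

Only when this returns true would the DMA descriptors be chained; an -EBUSY without the backlog flag means the request was dropped and must not be linked into the engine's chain.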
2016-07-28crypto: marvell - Fix memory leaks in TDMA chain for cipher requestsRomain Perier1-8/+6
So far, in mv_cesa_ablkcipher_dma_req_init, if an error occurs while the TDMA chain is being built, memory is leaked. This happens because the chain is only assigned at the end of the function, so the cleanup function is called with the wrong version of the chain. Fixes: db509a45339f ("crypto: marvell/cesa - add TDMA support") Signed-off-by: Romain Perier <[email protected]> Acked-by: Boris Brezillon <[email protected]> Signed-off-by: Herbert Xu <[email protected]>
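The principle behind the fix, sketched with a hypothetical singly linked descriptor chain: build directly into the structure that the cleanup routine inspects, so that an error partway through frees every element added so far. All names, the failure injection, and the allocation counter are invented for this illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct tdma_desc { struct tdma_desc *next; };
struct tdma_chain { struct tdma_desc *first, *last; };

static int live_descs;   /* outstanding allocations, so leaks are observable */

static void chain_cleanup(struct tdma_chain *chain)
{
	struct tdma_desc *d = chain->first;

	while (d) {
		struct tdma_desc *next = d->next;
		free(d);
		live_descs--;
		d = next;
	}
	chain->first = chain->last = NULL;
}

/* Appends n descriptors; index fail_at simulates an allocation failure (-1: none). */
static int build_chain(struct tdma_chain *chain, int n, int fail_at)
{
	chain->first = chain->last = NULL;
	for (int i = 0; i < n; i++) {
		struct tdma_desc *d = (i == fail_at) ? NULL : malloc(sizeof(*d));

		if (!d) {
			/* The chain is already up to date, so cleanup sees every element. */
			chain_cleanup(chain);
			return -ENOMEM;
		}
		live_descs++;
		d->next = NULL;
		if (chain->last)
			chain->last->next = d;
		else
			chain->first = d;
		chain->last = d;
	}
	return 0;
}
```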
2016-07-28Merge branch 'topic/dmaengine_cleanups' into for-linusVinod Koul23-23/+222
2016-07-28mailbox: Add Broadcom PDC mailbox driverRob Rice4-0/+1598
The Broadcom PDC mailbox driver is a mailbox controller that manages data transfers to and from one or more offload engines. Signed-off-by: Rob Rice <[email protected]> Reviewed-by: Scott Branden <[email protected]> Reviewed-by: Ray Jui <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2016-07-28dt-bindings: add bindings documentation for PDC driver.Rob Rice1-0/+23
Add the device tree binding documentation for the PDC hardware in Broadcom iProc SoCs. Signed-off-by: Rob Rice <[email protected]> Acked-by: Rob Herring <[email protected]> Reviewed-by: Ray Jui <[email protected]> Reviewed-by: Anup Patel <[email protected]> Reviewed-by: Scott Branden <[email protected]> Signed-off-by: Jassi Brar <[email protected]>
2016-07-27CIFS: Fix a possible invalid memory access in smb2_query_symlink()Pavel Shilovsky1-1/+29
While following a symbolic link, we receive err_buf from SMB2_open(). Although the validity of the SMB2 error response is checked earlier in smb2_check_message(), the symbolic link payload is not checked at all. Fix it by adding such checks. Cc: Dan Carpenter <[email protected]> CC: Stable <[email protected]> Signed-off-by: Pavel Shilovsky <[email protected]> Signed-off-by: Steve French <[email protected]>
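A minimal sketch of the kind of bounds check such a fix needs, assuming the symlink payload is located by an offset and a length read from the server's reply. The helper name and types are hypothetical, not the CIFS code. Comparing the length against the space remaining after the offset avoids an overflow that computing `offset + length` could cause.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if [offset, offset + length) lies inside a buffer of buf_len bytes. */
static bool payload_in_bounds(size_t buf_len, uint32_t offset, uint32_t length)
{
	if (offset > buf_len)
		return false;
	/* Remaining space cannot underflow here because offset <= buf_len. */
	return length <= buf_len - offset;
}
```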
2016-07-27fs/cifs: make share unaccessible at root level mountableAurelien Aptel5-5/+104
If, when mounting //HOST/share/sub/dir/foo, we can query /sub/dir/foo but not any of the path components above:
- store the /sub/dir/foo prefix in the cifs super_block info
- in the superblock, set the root dentry to the subpath dentry (instead of the share root)
- set a flag in the superblock to remember it
- use prefixpath when building a path from a dentry
Fixes bso#8950. Signed-off-by: Aurelien Aptel <[email protected]> CC: Stable <[email protected]> Reviewed-by: Pavel Shilovsky <[email protected]> Signed-off-by: Steve French <[email protected]>
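The last step, using the stored prefix when building a path from a dentry, can be sketched as plain string composition. This assumes '/' separators and a dentry path relative to the mounted subdirectory (empty for the new root); the function name is hypothetical and the real CIFS code differs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Joins the stored prefix (e.g. "/sub/dir/foo") with a dentry's path relative
 * to the mount root. Returns 0 on success, -1 if out_len is too small.
 */
static int build_full_path(const char *prefix, const char *rel,
			   char *out, size_t out_len)
{
	int n = snprintf(out, out_len, "%s%s", prefix, rel);

	return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

For the root dentry of the mount (empty relative path) this yields the prefix itself, which is exactly the path the server is known to accept.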
2016-07-27random: use for_each_online_node() to iterate over NUMA nodesTheodore Ts'o1-2/+1
This fixes a crash on s390 with fake NUMA enabled. Reported-by: Heiko Carstens <[email protected]> Fixes: 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs") Signed-off-by: Theodore Ts'o <[email protected]>
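for_each_online_node() visits only the node IDs marked online rather than every integer in some range. A userspace sketch of the same idea, assuming a plain bitmask (limited to 8 nodes) in place of the kernel's nodemask; the mask width and function name are hypothetical.

```c
#include <assert.h>

#define MAX_NODES 8   /* hypothetical upper bound for this sketch */

/* Records the IDs of online nodes in visited[]; returns how many were visited. */
static int visit_online_nodes(unsigned int online_mask, int *visited)
{
	int count = 0;

	for (int node = 0; node < MAX_NODES; node++) {
		if (!(online_mask & (1u << node)))
			continue;   /* not online: never touch its per-node data */
		visited[count++] = node;
	}
	return count;
}
```

With a sparse mask such as 0x5 (nodes 0 and 2), node 1 is skipped, which a loop over a dense ID range would not do.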
2016-07-27Add braces to avoid "ambiguous ‘else’" compiler warningsLinus Torvalds3-3/+6
Some of our "for_each_xyz()" macro constructs make gcc unhappy about lack of braces around if-statements inside or outside the loop, because the loop construct itself has a "if-then-else" statement inside of it. The resulting warnings look something like this: drivers/gpu/drm/i915/i915_debugfs.c: In function ‘i915_dump_lrc’: drivers/gpu/drm/i915/i915_debugfs.c:2103:6: warning: suggest explicit braces to avoid ambiguous ‘else’ [-Wparentheses] if (ctx != dev_priv->kernel_context) ^ even if the code itself is fine. Since the warning is fairly easy to avoid by adding braces around the if-statement near the for_each_xyz() construct, do so, rather than disabling the otherwise potentially useful warning. (The if-then-else statements used in the "for_each_xyz()" constructs are designed to be inherently safe even with no braces, but in this case it's quite understandable that gcc isn't really able to tell that). This finally leaves the standard "allmodconfig" build with just a handful of remaining warnings, so new and valid warnings hopefully will stand out. Signed-off-by: Linus Torvalds <[email protected]>
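The construct can be sketched with a hypothetical iterator macro whose expansion ends in `if (...) {} else`, so the caller's loop body becomes the else branch. This is the filtering idiom the commit describes, not one of the kernel's actual macros. Bracing the loop body, as the commit does, makes the nesting explicit to both readers and the compiler.

```c
#include <assert.h>

/* Hypothetical filtering iterator: visits only even i in [0, n). */
#define for_each_even(i, n)              \
	for ((i) = 0; (i) < (n); (i)++)  \
		if ((i) % 2) {} else

static int sum_even_if_enabled(int n, int enabled)
{
	int i, sum = 0;

	/* Braced body: the inner if clearly belongs to the macro's else branch. */
	for_each_even(i, n) {
		if (enabled)
			sum += i;
	}
	return sum;
}
```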
2016-07-28powerpc/mm: Parenthesise IS_ENABLED() in if conditionStephen Rothwell1-1/+1
Currently IS_ENABLED() produces an expression surrounded by parentheses, which allows this code (an if condition written without its own parentheses) to compile, generating e.g.: else if (1 || 0) hpte_init_native(); However, a change to the macro in the kbuild tree will break this in future by removing the parentheses. Fixes: 7353644fa9df ("powerpc/mm: Fix build break when PPC_NATIVE=n") Signed-off-by: Stephen Rothwell <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
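A small sketch of why the call site should supply its own parentheses. The macro below is a hypothetical stand-in for an IS_ENABLED() expansion without outer parentheses, and the return values stand in for which init routine would run. Written as `else if (ENABLED_NO_PARENS)` it compiles; written as `else if ENABLED_NO_PARENS` it would be a syntax error, which is the breakage the commit avoids.

```c
#include <assert.h>

/* Hypothetical expansion of a config test with no outer parentheses. */
#define ENABLED_NO_PARENS 1 || 0

/* Returns 0 for the first init path, 1 for the "native" path, 2 for neither. */
static int pick_init(int use_other)
{
	if (use_other)
		return 0;
	else if (ENABLED_NO_PARENS)   /* parentheses supplied at the call site */
		return 1;
	return 2;
}
```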