aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2016-07-24Merge tag 'staging-4.8-rc1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging Pull staging and IIO driver updates from Greg KH: "Here is the big Staging and IIO driver update for 4.8-rc1. We ended up adding more code than removing, again, but it's not all that bad. Lots of cleanups all over the staging tree, and new IIO drivers, full details in the shortlog. All of these have been in linux-next for a while with no reported issues" * tag 'staging-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (417 commits) drivers:iio:accel:mma8452: removed unwanted return statements drivers:iio:accel:mma8452: added cleanup provision in case of failure. iio: Add iio.git tree to MAINTAINERS iio:st_pressure: clean useless static channel initializers iio:st_pressure:lps22hb: temperature support iio:st_pressure:lps22hb: open drain support iio:st_pressure: temperature triggered buffering iio:st_pressure: document sampling gains iio:st_pressure: align storagebits on power of 2 iio:st_sensors: align on storagebits boundaries staging:iio:lis3l02dq drop separate driver iio: accel: st_accel: Add lis3l02dq support iio: adc: add missing of_node references to iio_dev iio: adc: ti-ads1015: add indio_dev->dev.of_node reference iio: potentiometer: Fix typo in Kconfig iio: potentiometer: mcp4531: Add device tree binding iio: potentiometer: mcp4531: Add device tree binding documentation iio: potentiometer: mcp4531: Add support for MCP454x, MCP456x, MCP464x and MCP466x iio:imu:mpu6050: icm20608 initial support iio: adc: max1363: Add device tree binding ...
2016-07-16Merge branch 'for-4.7-fixes' of ↵Linus Torvalds1-5/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fix from Tejun Heo: "The optimization for setting unbound worker affinity masks collided with recent scheduler changes triggering warning messages. This late pull request fixes the bug by removing the optimization" * 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: Fix setting affinity of unbound worker threads
2016-07-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-1/+1
Merge misc fixes from Andrew Morton: "20 fixes" * emailed patches from Andrew Morton <[email protected]>: m32r: fix build warning about putc mm: workingset: printk missing log level, use pr_info() mm: thp: refix false positive BUG in page_move_anon_rmap() mm: rmap: call page_check_address() with sync enabled to avoid racy check mm: thp: move pmd check inside ptl for freeze_page() vmlinux.lds: account for destructor sections gcov: add support for gcc version >= 6 mm, meminit: ensure node is online before checking whether pages are uninitialised mm, meminit: always return a valid node from early_pfn_to_nid kasan/quarantine: fix bugs on qlist_move_cache() uapi: export lirc.h header madvise_free, thp: fix madvise_free_huge_pmd return value after splitting Revert "scripts/gdb: add documentation example for radix tree" Revert "scripts/gdb: add a Radix Tree Parser" scripts/gdb: Perform path expansion to lx-symbol's arguments scripts/gdb: add constants.py to .gitignore scripts/gdb: rebuild constants.py on dependancy change scripts/gdb: silence 'nothing to do' message kasan: add newline to messages mm, compaction: prevent VM_BUG_ON when terminating freeing scanner
2016-07-15Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds3-7/+9
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Ingo Molnar: "Fix a CPU hotplug related corruption of the load average that got introduced in this merge window" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Correct off by one bug in load migration calculation
2016-07-15gcov: add support for gcc version >= 6Florian Meier1-1/+1
Link: http://lkml.kernel.org/r/20160701130914.GA23225@styxhp Signed-off-by: Florian Meier <[email protected]> Reviewed-by: Peter Oberparleiter <[email protected]> Tested-by: Peter Oberparleiter <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-07-14Merge tag 'iio-for-4.8c' of ↵Greg Kroah-Hartman1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-next Jonathan writes: Third set of IIO new device support, features and cleanups for the 4.8 cycle. New core features - Selection of the clock source for IIO timestamps. This is done per device as it makes little sense to have events in one timebase and data timestamped on another. Biggest reason for this is that we currently use a clock source which is non monotonic which can result in 'interesting' data sets. (Includes export for get_monotonic_corse64 which Thomas Gleixner didn't mind in an earlier version.) - MAINTAINERS add the git tree to the list for IIO. New device support + a kind of indirect staging graduation. * Broadcom iproc-static-adc - new driver * mcp4531 - support for MCP454x, MCP456x, MCP464x and MCP466x potentiometers * mpu6050 - support the IC20608 6 axis motion tracking device * st-sensors - support the lis3l02dq + drop the lis3l02dq driver from staging. The general purpose driver is missing event support, but good to get rid of this driver which was rather long in the tooth. New driver features * ak8975 - Add vid regulator support and refactor handling in general. - Allow a delay after enabling regulators. - Runtime and system PM. * bmg160 - filter frequency control support. * bmp280 - SPI device support. - EOC interrupt support for the BMP085 - power management support. - supply regulator support. - reset gpio support - dt bindings for reset gpio and regulators. - of table to support device tree registration * max1363 - Device tree bindings. * mcp4531 - Device tree bindings. * st-pressure - temperature channels as part of triggered buffer (previously not due probably to alignment issues - see below). - lps22hb open drain interrupt support. - lps22hb temperature channel support Cleanups and reworkings. * numerous ADC drivers - ensure the iio_dev->dev.of_node is set to the parent dev.of_node so as to allow client bindings to find the device. * ak8975 - Fix incorrect handling of missing regulator - make sure power is down and remove. * bmp280 - read the calibration data only once as it doesn't change. * isl29125 - Use a few macros to make code a touch more readable. * mma8452 - fix a memory leak on error. - drop an unecessary bit of return value handling. * potentiometer kconfig - typo fix. * st-pressure - drop some uninformative default assignments of elements of the channel array structure (aids readability). * st-sensors - Harden interrupt handling considerably. These are actually all using level interrupts, but at least two known boards have them wired to edge only interrupt chips. Hence a slightly interesting bit of handling is needed in which we first allow for the easy option (level triggered) and secondly check the status registers before reenabling edge interrupts and fall back to a tight loop in the thread until we successfully clear the interrupt. No harm is done if we never succeed in doing so. It's an odd patch that has been through a lot of revisions to reach a consensus on how to handle what is basically broken hardware (which the previous defaults allowed to kind of work). - Fix alignment to defined storagebytes boundaries. - Ensure alignment of power of 2 byte boundaries. This has always in theory been part of the ABI of IIO, but we missed a few that snuck in that need fixing. The effect was minor as they were only followed by timestamp channels which were correctly aligned, - Add some docs to explain the gain calculations.
2016-07-14Merge branches 'perf-urgent-for-linus' and 'timers-urgent-for-linus' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf and timer fixes from Ingo Molnar: "A fix for a posix CPU timers bug, and a perf printk message fix" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86: Fix bogus kernel printk, again * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: posix_cpu_timer: Exit early when process has been reaped
2016-07-13sched/core: Correct off by one bug in load migration calculationThomas Gleixner3-7/+9
The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into account that the function is now called from a thread running on the outgoing CPU. As a result a cpu unplug leakes a load of 1 into the global load accounting mechanism. Fix it by adjusting for the currently running thread which calls calc_load_migrate(). Reported-by: Anton Blanchard <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Peter Zijlstra <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Vaidyanathan Srinivasan <[email protected]> Cc: [email protected] Cc: [email protected] Fixes: e9cd8fa4fcfd: ("sched/migration: Move calc_load_migrate() into CPU_DYING") Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607121744350.4083@nanos Signed-off-by: Ingo Molnar <[email protected]>
2016-07-13cpu/hotplug: Keep enough storage space if SMP=n to avoid array out of bounds ↵Thomas Gleixner1-0/+2
scribble Xiaolong Ye reported lock debug warnings triggered by the following commit: 8de4a0066106 ("perf/x86: Convert the core to the hotplug state machine") The bug is the following: the cpuhp_bp_states[] array is cut short when CONFIG_SMP=n, but the dynamically registered callbacks are stored nevertheless and happily scribble outside of the array bounds... We need to store them in case that the state is unregistered so we can invoke the teardown function. That's independent of CONFIG_SMP. Make sure the array is large enough. Reported-by: kernel test robot <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: Adam Borowski <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Anna-Maria Gleixner <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Sebastian Andrzej Siewior <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: cff7d378d3fd "cpu/hotplug: Convert to a state machine for the control processor" Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1607122144560.4083@nanos Signed-off-by: Ingo Molnar <[email protected]>
2016-07-11posix_cpu_timer: Exit early when process has been reapedAlexey Dobriyan1-0/+1
Variable "now" seems to be genuinely used unintialized if branch if (CPUCLOCK_PERTHREAD(timer->it_clock)) { is not taken and branch if (unlikely(sighand == NULL)) { is taken. In this case the process has been reaped and the timer is marked as disarmed anyway. So none of the postprocessing of the sample is required. Return right away. Signed-off-by: Alexey Dobriyan <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-07-08Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds1-22/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two load-balancing fixes for cgroups-intense workloads" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusion sched/fair: Fix effective_load() to consistently use smoothed load
2016-07-07perf/core: Fix pmu::filter_match for SW-led groupsMark Rutland1-1/+22
The following commit: 66eb579e66ec ("perf: allow for PMU-specific event filtering") added the pmu::filter_match() callback. This was intended to avoid HW constraints on events from resulting in extremely pessimistic scheduling. However, pmu::filter_match() is only called for the leader of each event group. When the leader is a SW event, we do not filter the groups, and may fail at pmu::add() time, and when this happens we'll give up on scheduling any event groups later in the list until they are rotated ahead of the failing group. This can result in extremely sub-optimal event scheduling behaviour, e.g. if running the following on a big.LITTLE platform: $ taskset -c 0 ./perf stat \ -e 'a57{context-switches,armv8_cortex_a57/config=0x11/}' \ -e 'a53{context-switches,armv8_cortex_a53/config=0x11/}' \ ls <not counted> context-switches (0.00%) <not counted> armv8_cortex_a57/config=0x11/ (0.00%) 24 context-switches (37.36%) 57589154 armv8_cortex_a53/config=0x11/ (37.36%) Here the 'a53' event group was always eligible to be scheduled, but the 'a57' group never eligible to be scheduled, as the task was always affine to a Cortex-A53 CPU. The SW (group leader) event in the 'a57' group was eligible, but the HW event failed at pmu::add() time, resulting in ctx_flexible_sched_in giving up on scheduling further groups with HW events. One way of avoiding this is to check pmu::filter_match() on siblings as well as the group leader. If any of these fail their pmu::filter_match() call, we must skip the entire group before attempting to add any events. Signed-off-by: Mark Rutland <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Will Deacon <[email protected]> Fixes: 66eb579e66ec ("perf: allow for PMU-specific event filtering") Link: http://lkml.kernel.org/r/[email protected] [ Small readability edits. ] Signed-off-by: Ingo Molnar <[email protected]>
2016-06-30timekeeping: export get_monotonic_coarse64 symbolGregor Boirie1-0/+1
EXPORT_SYMBOL() get_monotonic_coarse64 for new IIO timestamping clock selection usage. This provides user apps the ability to request a particular IIO device to timestamp samples using a monotonic coarse clock granularity. Signed-off-by: Gregor Boirie <[email protected]> Signed-off-by: Jonathan Cameron <[email protected]>
2016-06-29Merge branch 'stable-4.7' of git://git.infradead.org/users/pcmoore/auditLinus Torvalds3-4/+25
Pull audit fixes from Paul Moore: "Two small patches to fix audit problems in 4.7-rcX: the first fixes a potential kref leak, the second removes some header file noise. The first is an important bug fix that really should go in before 4.7 is released, the second is not critical, but falls into the very-nice- to-have category so I'm including in the pull request. Both patches are straightforward, self-contained, and pass our testsuite without problem" * 'stable-4.7' of git://git.infradead.org/users/pcmoore/audit: audit: move audit_get_tty to reduce scope and kabi changes audit: move calcs after alloc and check when logging set loginuid
2016-06-29Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds3-37/+16
Pull networking fixes from David Miller: "I've been traveling so this accumulates more than week or so of bug fixing. It perhaps looks a little worse than it really is. 1) Fix deadlock in ath10k driver, from Ben Greear. 2) Increase scan timeout in iwlwifi, from Luca Coelho. 3) Unbreak STP by properly reinjecting STP packets back into the stack. Regression fix from Ido Schimmel. 4) Mediatek driver fixes (missing malloc failure checks, leaking of scratch memory, wrong indexing when mapping TX buffers, etc.) from John Crispin. 5) Fix endianness bug in icmpv6_err() handler, from Hannes Frederic Sowa. 6) Fix hashing of flows in UDP in the ruseport case, from Xuemin Su. 7) Fix netlink notifications in ovs for tunnels, delete link messages are never emitted because of how the device registry state is handled. From Nicolas Dichtel. 8) Conntrack module leaks kmemcache on unload, from Florian Westphal. 9) Prevent endless jump loops in nft rules, from Liping Zhang and Pablo Neira Ayuso. 10) Not early enough spinlock initialization in mlx4, from Eric Dumazet. 11) Bind refcount leak in act_ipt, from Cong WANG. 12) Missing RCU locking in HTB scheduler, from Florian Westphal. 13) Several small MACSEC bug fixes from Sabrina Dubroca (missing RCU barrier, using heap for SG and IV, and erroneous use of async flag when allocating AEAD conext.) 14) RCU handling fix in TIPC, from Ying Xue. 15) Pass correct protocol down into ipv4_{update_pmtu,redirect}() in SIT driver, from Simon Horman. 16) Socket timer deadlock fix in TIPC from Jon Paul Maloy. 17) Fix potential deadlock in team enslave, from Ido Schimmel. 18) Memory leak in KCM procfs handling, from Jiri Slaby. 19) ESN generation fix in ipv4 ESP, from Herbert Xu. 20) Fix GFP_KERNEL allocations with locks held in act_ife, from Cong WANG. 21) Use after free in netem, from Eric Dumazet. 22) Uninitialized last assert time in multicast router code, from Tom Goff. 23) Skip raw sockets in sock_diag destruction broadcast, from Willem de Bruijn. 24) Fix link status reporting in thunderx, from Sunil Goutham. 25) Limit resegmentation of retransmit queue so that we do not retransmit too large GSO frames. From Eric Dumazet. 26) Delay bpf program release after grace period, from Daniel Borkmann" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (141 commits) openvswitch: fix conntrack netlink event delivery qed: Protect the doorbell BAR with the write barriers. neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit() e1000e: keep VLAN interfaces functional after rxvlan off cfg80211: fix proto in ieee80211_data_to_8023 for frames without LLC header qlcnic: use the correct ring in qlcnic_83xx_process_rcv_ring_diag() bpf, perf: delay release of BPF prog after grace period net: bridge: fix vlan stats continue counter tcp: do not send too big packets at retransmit time ibmvnic: fix to use list_for_each_safe() when delete items net: thunderx: Fix TL4 configuration for secondary Qsets net: thunderx: Fix link status reporting net/mlx5e: Reorganize ethtool statistics net/mlx5e: Fix number of PFC counters reported to ethtool net/mlx5e: Prevent adding the same vxlan port net/mlx5e: Check for BlueFlame capability before allocating SQ uar net/mlx5e: Change enum to better reflect usage net/mlx5: Add ConnectX-5 PCIe 4.0 to list of supported devices net/mlx5: Update command strings net: marvell: Add separate config ANEG function for Marvell 88E1111 ...
2016-06-29Merge branch 'for-4.7-fixes' of ↵Linus Torvalds1-72/+76
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Three fix patches. Two are for cgroup / css init failure path. The last one makes css_set_lock irq-safe as the deadline scheduler ends up calling put_css_set() from irq context" * 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: Disable IRQs while holding css_set_lock cgroup: set css->id to -1 during init cgroup: remove redundant cleanup in css_create
2016-06-29bpf, perf: delay release of BPF prog after grace periodDaniel Borkmann1-1/+1
Commit dead9f29ddcc ("perf: Fix race in BPF program unregister") moved destruction of BPF program from free_event_rcu() callback to __free_event(), which is problematic if used with tail calls: if prog A is attached as trace event directly, but at the same time present in a tail call map used by another trace event program elsewhere, then we need to delay destruction via RCU grace period since it can still be in use by the program doing the tail call (the prog first needs to be dropped from the tail call map, then trace event with prog A attached destroyed, so we get immediate destruction). Fixes: dead9f29ddcc ("perf: Fix race in BPF program unregister") Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Alexei Starovoitov <[email protected]> Cc: Jann Horn <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-06-28audit: move audit_get_tty to reduce scope and kabi changesRichard Guy Briggs3-1/+21
The only users of audit_get_tty and audit_put_tty are internal to audit, so move it out of include/linux/audit.h to kernel.h and create a proper function rather than inlining it. This also reduces kABI changes. Suggested-by: Paul Moore <[email protected]> Signed-off-by: Richard Guy Briggs <[email protected]> [PM: line wrapped description] Signed-off-by: Paul Moore <[email protected]>
2016-06-28audit: move calcs after alloc and check when logging set loginuidRichard Guy Briggs1-3/+4
Move the calculations of values after the allocation in case the allocation fails. This avoids wasting effort in the rare case that it fails, but more importantly saves us extra logic to release the tty ref. Signed-off-by: Richard Guy Briggs <[email protected]> Signed-off-by: Paul Moore <[email protected]>
2016-06-27sched/fair: Fix calc_cfs_shares() fixed point arithmetics width confusionPeter Zijlstra1-16/+11
Commit: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities") did something non-obvious but also did it buggy yet latent. The problem was exposed for real by a later commit in the v4.7 merge window: 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels") ... after which tg->load_avg and cfs_rq->load.weight had different units (10 bit fixed point and 20 bit fixed point resp.). Add a comment to explain the use of cfs_rq->load.weight over the 'natural' cfs_rq->avg.load_avg and add scale_load_down() to correct for the difference in unit. Since this is (now, as per a previous commit) the only user of calc_tg_weight(), collapse it. The effects of this bug should be randomly inconsistent SMP-balancing of cgroups workloads. Reported-by: Jirka Hladky <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 2159197d6677 ("sched/core: Enable increased load resolution on 64-bit kernels") Fixes: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities") Signed-off-by: Ingo Molnar <[email protected]>
2016-06-27sched/fair: Fix effective_load() to consistently use smoothed loadPeter Zijlstra1-6/+9
Starting with the following commit: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities") calc_tg_weight() doesn't compute the right value as expected by effective_load(). The difference is in the 'correction' term. In order to ensure \Sum rw_j >= rw_i we cannot use tg->load_avg directly, since that might be lagging a correction on the current cfs_rq->avg.load_avg value. Therefore we use tg->load_avg - cfs_rq->tg_load_avg_contrib + cfs_rq->avg.load_avg. Now, per the referenced commit, calc_tg_weight() doesn't use cfs_rq->avg.load_avg, as is later used in @w, but uses cfs_rq->load.weight instead. So stop using calc_tg_weight() and do it explicitly. The effects of this bug are wake_affine() making randomly poor choices in cgroup-intense workloads. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> # v4.3+ Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: fde7d22e01aa ("sched/fair: Fix overly small weight for interactive group entities") Signed-off-by: Ingo Molnar <[email protected]>
2016-06-25Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds3-21/+66
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Thomas Gleixner: "A couple of scheduler fixes: - force watchdog reset while processing sysrq-w - fix a deadlock when enabling trace events in the scheduler - fixes to the throttled next buddy logic - fixes for the average accounting (missing serialization and underflow handling) - allow kernel threads for fallback to online but not active cpus" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/core: Allow kthreads to fall back to online && !active cpus sched/fair: Do not announce throttled next buddy in dequeue_task_fair() sched/fair: Initialize throttle_count for new task-groups lazily sched/fair: Fix cfs_rq avg tracking underflow kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while processing sysrq-w sched/debug: Fix deadlock when enabling sched events sched/fair: Fix post_init_entity_util_avg() serialization
2016-06-25Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds1-3/+33
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fix from Thomas Gleixner: "A single fix to address a race in the static key logic" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/static_key: Fix concurrent static_key_slow_inc()
2016-06-25Fix build break in fork.c when THREAD_SIZE < PAGE_SIZEMichael Ellerman1-2/+2
Commit b235beea9e99 ("Clarify naming of thread info/stack allocators") breaks the build on some powerpc configs, where THREAD_SIZE < PAGE_SIZE: kernel/fork.c:235:2: error: implicit declaration of function 'free_thread_stack' kernel/fork.c:355:8: error: assignment from incompatible pointer type stack = alloc_thread_stack_node(tsk, node); ^ Fix it by renaming free_stack() to free_thread_stack(), and updating the return type of alloc_thread_stack_node(). Fixes: b235beea9e99 ("Clarify naming of thread info/stack allocators") Signed-off-by: Michael Ellerman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds1-0/+12
Merge misc fixes from Andrew Morton: "Two weeks worth of fixes here" * emailed patches from Andrew Morton <[email protected]>: (41 commits) init/main.c: fix initcall_blacklisted on ia64, ppc64 and parisc64 autofs: don't get stuck in a loop if vfs_write() returns an error mm/page_owner: avoid null pointer dereference tools/vm/slabinfo: fix spelling mistake: "Ocurrences" -> "Occurrences" fs/nilfs2: fix potential underflow in call to crc32_le oom, suspend: fix oom_reaper vs. oom_killer_disable race ocfs2: disable BUG assertions in reading blocks mm, compaction: abort free scanner if split fails mm: prevent KASAN false positives in kmemleak mm/hugetlb: clear compound_mapcount when freeing gigantic pages mm/swap.c: flush lru pvecs on compound page arrival memcg: css_alloc should return an ERR_PTR value on error memcg: mem_cgroup_migrate() may be called with irq disabled hugetlb: fix nr_pmds accounting with shared page tables Revert "mm: disable fault around on emulated access bit architecture" Revert "mm: make faultaround produce old ptes" mailmap: add Boris Brezillon's email mailmap: add Antoine Tenart's email mm, sl[au]b: add __GFP_ATOMIC to the GFP reclaim mask mm: mempool: kasan: don't poot mempool objects in quarantine ...
2016-06-24oom, suspend: fix oom_reaper vs. oom_killer_disable raceMichal Hocko1-0/+12
Tetsuo has reported the following potential oom_killer_disable vs. oom_reaper race: (1) freeze_processes() starts freezing user space threads. (2) Somebody (maybe a kenrel thread) calls out_of_memory(). (3) The OOM killer calls mark_oom_victim() on a user space thread P1 which is already in __refrigerator(). (4) oom_killer_disable() sets oom_killer_disabled = true. (5) P1 leaves __refrigerator() and enters do_exit(). (6) The OOM reaper calls exit_oom_victim(P1) before P1 can call exit_oom_victim(P1). (7) oom_killer_disable() returns while P1 not yet finished (8) P1 perform IO/interfere with the freezer. This situation is unfortunate. We cannot move oom_killer_disable after all the freezable kernel threads are frozen because the oom victim might depend on some of those kthreads to make a forward progress to exit so we could deadlock. It is also far from trivial to teach the oom_reaper to not call exit_oom_victim() because then we would lose a guarantee of the OOM killer and oom_killer_disable forward progress because exit_mm->mmput might block and never call exit_oom_victim. It seems the easiest way forward is to workaround this race by calling try_to_freeze_tasks again after oom_killer_disable. This will make sure that all the tasks are frozen or it bails out. Fixes: 449d777d7ad6 ("mm, oom_reaper: clear TIF_MEMDIE for all tasks queued for oom_reaper") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Michal Hocko <[email protected]> Reported-by: Tetsuo Handa <[email protected]> Cc: "Rafael J. Wysocki" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24Clarify naming of thread info/stack allocatorsLinus Torvalds1-25/+25
We've had the thread info allocated together with the thread stack for most architectures for a long time (since the thread_info was split off from the task struct), but that is about to change. But the patches that move the thread info to be off-stack (and a part of the task struct instead) made it clear how confused the allocator and freeing functions are. Because the common case was that we share an allocation with the thread stack and the thread_info, the two pointers were identical. That identity then meant that we would have things like ti = alloc_thread_info_node(tsk, node); ... tsk->stack = ti; which certainly _worked_ (since stack and thread_info have the same value), but is rather confusing: why are we assigning a thread_info to the stack? And if we move the thread_info away, the "confusing" code just gets to be entirely bogus. So remove all this confusion, and make it clear that we are doing the stack allocation by renaming and clarifying the function names to be about the stack. The fact that the thread_info then shares the allocation is an implementation detail, and not really about the allocation itself. This is a pure renaming and type fix: we pass in the same pointer, it's just that we clarify what the pointer means. The ia64 code that actually only has one single allocation (for all of task_struct, thread_info and kernel thread stack) now looks a bit odd, but since "tsk->stack" is actually not even used there, that oddity doesn't matter. It would be a separate thing to clean that up, I intentionally left the ia64 changes as a pure brute-force renaming and type change. Acked-by: Andy Lutomirski <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-24sched/core: Allow kthreads to fall back to online && !active cpusTejun Heo1-1/+3
During CPU hotplug, CPU_ONLINE callbacks are run while the CPU is online but not active. A CPU_ONLINE callback may create or bind a kthread so that its cpus_allowed mask only allows the CPU which is being brought online. The kthread may start executing before the CPU is made active and can end up in select_fallback_rq(). In such cases, the expected behavior is selecting the CPU which is coming online; however, because select_fallback_rq() only chooses from active CPUs, it determines that the task doesn't have any viable CPU in its allowed mask and ends up overriding it to cpu_possible_mask. CPU_ONLINE callbacks should be able to put kthreads on the CPU which is coming online. Update select_fallback_rq() so that it follows cpu_online() rather than cpu_active() for kthreads. Reported-by: Gautham R Shenoy <[email protected]> Tested-by: Gautham R. Shenoy <[email protected]> Signed-off-by: Tejun Heo <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Abdul Haleem <[email protected]> Cc: Aneesh Kumar <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-06-24sched/fair: Do not announce throttled next buddy in dequeue_task_fair()Konstantin Khlebnikov1-5/+4
Hierarchy could be already throttled at this point. Throttled next buddy could trigger a NULL pointer dereference in pick_next_task_fair(). Signed-off-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Ben Segall <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lkml.kernel.org/r/146608183552.21905.15924473394414832071.stgit@buzz Signed-off-by: Ingo Molnar <[email protected]>
2016-06-24sched/fair: Initialize throttle_count for new task-groups lazilyKonstantin Khlebnikov2-1/+21
Cgroup created inside throttled group must inherit current throttle_count. Broken throttle_count allows to nominate throttled entries as a next buddy, later this leads to null pointer dereference in pick_next_task_fair(). This patch initialize cfs_rq->throttle_count at first enqueue: laziness allows to skip locking all rq at group creation. Lazy approach also allows to skip full sub-tree scan at throttling hierarchy (not in this patch). Signed-off-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/146608182119.21870.8439834428248129633.stgit@buzz Signed-off-by: Ingo Molnar <[email protected]>
2016-06-24locking/static_key: Fix concurrent static_key_slow_inc()Paolo Bonzini1-3/+33
The following scenario is possible: CPU 1 CPU 2 static_key_slow_inc() atomic_inc_not_zero() -> key.enabled == 0, no increment jump_label_lock() atomic_inc_return() -> key.enabled == 1 now static_key_slow_inc() atomic_inc_not_zero() -> key.enabled == 1, inc to 2 return ** static key is wrong! jump_label_update() jump_label_unlock() Testing the static key at the point marked by (**) will follow the wrong path for jumps that have not been patched yet. This can actually happen when creating many KVM virtual machines with userspace LAPIC emulation; just run several copies of the following program: #include <fcntl.h> #include <unistd.h> #include <sys/ioctl.h> #include <linux/kvm.h> int main(void) { for (;;) { int kvmfd = open("/dev/kvm", O_RDONLY); int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0); close(ioctl(vmfd, KVM_CREATE_VCPU, 1)); close(vmfd); close(kvmfd); } return 0; } Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call. The static key's purpose is to skip NULL pointer checks and indeed one of the processes eventually dereferences NULL. As explained in the commit that introduced the bug: 706249c222f6 ("locking/static_keys: Rework update logic") jump_label_update() needs key.enabled to be true. The solution adopted here is to temporarily make key.enabled == -1, and use go down the slow path when key.enabled <= 0. Reported-by: Dmitry Vyukov <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: <[email protected]> # v4.3+ Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Fixes: 706249c222f6 ("locking/static_keys: Rework update logic") Link: http://lkml.kernel.org/r/[email protected] [ Small stylistic edits to the changelog and the code. ] Signed-off-by: Ingo Molnar <[email protected]>
2016-06-23cgroup: Disable IRQs while holding css_set_lockDaniel Bristot de Oliveira1-68/+74
While testing the deadline scheduler + cgroup setup I hit this warning. [ 132.612935] ------------[ cut here ]------------ [ 132.612951] WARNING: CPU: 5 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0x6b/0x80 [ 132.612952] Modules linked in: (a ton of modules...) [ 132.612981] CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.7.0-rc2 #2 [ 132.612981] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.2-20150714_191134- 04/01/2014 [ 132.612982] 0000000000000086 45c8bb5effdd088b ffff88013fd43da0 ffffffff813d229e [ 132.612984] 0000000000000000 0000000000000000 ffff88013fd43de0 ffffffff810a652b [ 132.612985] 00000096811387b5 0000000000000200 ffff8800bab29d80 ffff880034c54c00 [ 132.612986] Call Trace: [ 132.612987] <IRQ> [<ffffffff813d229e>] dump_stack+0x63/0x85 [ 132.612994] [<ffffffff810a652b>] __warn+0xcb/0xf0 [ 132.612997] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170 [ 132.612999] [<ffffffff810a665d>] warn_slowpath_null+0x1d/0x20 [ 132.613000] [<ffffffff810aba5b>] __local_bh_enable_ip+0x6b/0x80 [ 132.613008] [<ffffffff817d6c8a>] _raw_write_unlock_bh+0x1a/0x20 [ 132.613010] [<ffffffff817d6c9e>] _raw_spin_unlock_bh+0xe/0x10 [ 132.613015] [<ffffffff811388ac>] put_css_set+0x5c/0x60 [ 132.613016] [<ffffffff8113dc7f>] cgroup_free+0x7f/0xa0 [ 132.613017] [<ffffffff810a3912>] __put_task_struct+0x42/0x140 [ 132.613018] [<ffffffff810e776a>] dl_task_timer+0xca/0x250 [ 132.613027] [<ffffffff810e76a0>] ? push_dl_task.part.32+0x170/0x170 [ 132.613030] [<ffffffff8111371e>] __hrtimer_run_queues+0xee/0x270 [ 132.613031] [<ffffffff81113ec8>] hrtimer_interrupt+0xa8/0x190 [ 132.613034] [<ffffffff81051a58>] local_apic_timer_interrupt+0x38/0x60 [ 132.613035] [<ffffffff817d9b0d>] smp_apic_timer_interrupt+0x3d/0x50 [ 132.613037] [<ffffffff817d7c5c>] apic_timer_interrupt+0x8c/0xa0 [ 132.613038] <EOI> [<ffffffff81063466>] ? native_safe_halt+0x6/0x10 [ 132.613043] [<ffffffff81037a4e>] default_idle+0x1e/0xd0 [ 132.613044] [<ffffffff810381cf>] arch_cpu_idle+0xf/0x20 [ 132.613046] [<ffffffff810e8fda>] default_idle_call+0x2a/0x40 [ 132.613047] [<ffffffff810e92d7>] cpu_startup_entry+0x2e7/0x340 [ 132.613048] [<ffffffff81050235>] start_secondary+0x155/0x190 [ 132.613049] ---[ end trace f91934d162ce9977 ]--- The warn is the spin_(lock|unlock)_bh(&css_set_lock) in the interrupt context. Converting the spin_lock_bh to spin_lock_irq(save) to avoid this problem - and other problems of sharing a spinlock with an interrupt. Cc: Tejun Heo <[email protected]> Cc: Li Zefan <[email protected]> Cc: Johannes Weiner <[email protected]> Cc: Juri Lelli <[email protected]> Cc: Steven Rostedt <[email protected]> Cc: [email protected] Cc: [email protected] # 4.5+ Cc: [email protected] Reviewed-by: Rik van Riel <[email protected]> Reviewed-by: "Luis Claudio R. Goncalves" <[email protected]> Signed-off-by: Daniel Bristot de Oliveira <[email protected]> Acked-by: Zefan Li <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2016-06-23locking: avoid passing around 'thread_info' in mutex debugging codeLinus Torvalds4-12/+12
None of the code actually wants a thread_info, it all wants a task_struct, and it's just converting back and forth between the two ("ti->task" to get the task_struct from the thread_info, and "task_thread_info(task)" to go the other way). No semantic change. Acked-by: Peter Zijlstra <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-20Merge tag 'trace-v4.7-rc3' of ↵Linus Torvalds1-1/+6
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: "Two fixes for the tracing system: - When trace_printk() is used with a non constant format descriptor, it adds a NULL pointer into the trace format section, and the code isn't prepared to deal with it. This bug appeared by a change that was added in v3.5. - The ftracetest (selftests section) can't handle testing histograms when histograms are not configured. Currently it shows that they fail the test, when they should state that they are unsupported. This bug was added in the 4.7 merge window with the addition of the historgram code" * tag 'trace-v4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: ftracetest: Fix hist unsupported result in hist selftests tracing: Handle NULL formats in hold_module_trace_bprintk_format()
2016-06-20tracing: Handle NULL formats in hold_module_trace_bprintk_format()Steven Rostedt (Red Hat)1-1/+6
If a task uses a non constant string for the format parameter in trace_printk(), then the trace_printk_fmt variable is set to NULL. This variable is then saved in the __trace_printk_fmt section. The function hold_module_trace_bprintk_format() checks to see if duplicate formats are used by modules, and reuses them if so (saves them to the list if it is new). But this function calls lookup_format() that does a strcmp() to the value (which is now NULL) and can cause a kernel oops. This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") which added "__used" to the trace_printk_fmt variable, and before that, the kernel simply optimized it out (no NULL value was saved). The fix is simply to handle the NULL pointer in lookup_format() and have the caller ignore the value if it was NULL. Link: http://lkml.kernel.org/r/[email protected] Reported-by: xingzhen <[email protected]> Acked-by: Namhyung Kim <[email protected]> Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()") Cc: [email protected] # v3.5+ Signed-off-by: Steven Rostedt <[email protected]>
2016-06-20sched/fair: Fix cfs_rq avg tracking underflowPeter Zijlstra1-8/+25
As per commit: b7fa30c9cc48 ("sched/fair: Fix post_init_entity_util_avg() serialization") > the code generated from update_cfs_rq_load_avg(): > > if (atomic_long_read(&cfs_rq->removed_load_avg)) { > s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); > sa->load_avg = max_t(long, sa->load_avg - r, 0); > sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); > removed_load = 1; > } > > turns into: > > ffffffff81087064: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax > ffffffff8108706b: 48 85 c0 test %rax,%rax > ffffffff8108706e: 74 40 je ffffffff810870b0 <update_blocked_averages+0xc0> > ffffffff81087070: 4c 89 f8 mov %r15,%rax > ffffffff81087073: 49 87 85 98 00 00 00 xchg %rax,0x98(%r13) > ffffffff8108707a: 49 29 45 70 sub %rax,0x70(%r13) > ffffffff8108707e: 4c 89 f9 mov %r15,%rcx > ffffffff81087081: bb 01 00 00 00 mov $0x1,%ebx > ffffffff81087086: 49 83 7d 70 00 cmpq $0x0,0x70(%r13) > ffffffff8108708b: 49 0f 49 4d 70 cmovns 0x70(%r13),%rcx > > Which you'll note ends up with sa->load_avg -= r in memory at > ffffffff8108707a. So I _should_ have looked at other unserialized users of ->load_avg, but alas. Luckily nikbor reported a similar /0 from task_h_load() which instantly triggered recollection of this here problem. Aside from the intermediate value hitting memory and causing problems, there's another problem: the underflow detection relies on the signed bit. This reduces the effective width of the variables, IOW its effectively the same as having these variables be of signed type. This patch changes to a different means of unsigned underflow detection to not rely on the signed bit. This allows the variables to use the 'full' unsigned range. And it does so with explicit LOAD - STORE to ensure any intermediate value will never be visible in memory, allowing these unserialized loads. Note: GCC generates crap code for this, might warrant a look later. Note2: I say 'full' above, if we end up at U*_MAX we'll still explode; maybe we should do clamping on add too. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Yuyang Du <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 9d89c257dfb9 ("sched/fair: Rewrite runnable load and utilization average tracking") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-06-16cgroup: set css->id to -1 during initTejun Heo1-0/+1
If percpu_ref initialization fails during css_create(), the free path can end up trying to free css->id of zero. As ID 0 is unused, it doesn't cause a critical breakage but it does trigger a warning message. Fix it by setting css->id to -1 from init_and_link_css(). Signed-off-by: Tejun Heo <[email protected]> Cc: Wenwei Tao <[email protected]> Fixes: 01e586598b22 ("cgroup: release css->id after css_free") Cc: [email protected] # v4.0+ Signed-off-by: Tejun Heo <[email protected]>
2016-06-16workqueue: Fix setting affinity of unbound worker threadsPeter Zijlstra1-5/+1
With commit e9d867a67fd03ccc ("sched: Allow per-cpu kernel threads to run on online && !active"), __set_cpus_allowed_ptr() expects that only strict per-cpu kernel threads can have affinity to an online CPU which is not yet active. This assumption is currently broken in the CPU_ONLINE notification handler for the workqueues where restore_unbound_workers_cpumask() calls set_cpus_allowed_ptr() when the first cpu in the unbound worker's pool->attr->cpumask comes online. Since set_cpus_allowed_ptr() is called with pool->attr->cpumask in which only one CPU is online which is not yet active, we get the following WARN_ON during an CPU online operation. ------------[ cut here ]------------ WARNING: CPU: 40 PID: 248 at kernel/sched/core.c:1166 __set_cpus_allowed_ptr+0x228/0x2e0 Modules linked in: CPU: 40 PID: 248 Comm: cpuhp/40 Not tainted 4.6.0-autotest+ #4 <..snip..> Call Trace: [c000000f273ff920] [c00000000010493c] __set_cpus_allowed_ptr+0x2cc/0x2e0 (unreliable) [c000000f273ffac0] [c0000000000ed4b0] workqueue_cpu_up_callback+0x2c0/0x470 [c000000f273ffb70] [c0000000000f5c58] notifier_call_chain+0x98/0x100 [c000000f273ffbc0] [c0000000000c5ed0] __cpu_notify+0x70/0xe0 [c000000f273ffc00] [c0000000000c6028] notify_online+0x38/0x50 [c000000f273ffc30] [c0000000000c5214] cpuhp_invoke_callback+0x84/0x250 [c000000f273ffc90] [c0000000000c562c] cpuhp_up_callbacks+0x5c/0x120 [c000000f273ffce0] [c0000000000c64d4] cpuhp_thread_fun+0x184/0x1c0 [c000000f273ffd20] [c0000000000fa050] smpboot_thread_fn+0x290/0x2a0 [c000000f273ffd80] [c0000000000f45b0] kthread+0x110/0x130 [c000000f273ffe30] [c000000000009570] ret_from_kernel_thread+0x5c/0x6c ---[ end trace 00f1456578b2a3b2 ]--- This patch fixes this by limiting the mask to the intersection of the pool affinity and online CPUs. Changelog-cribbed-from: Gautham R. Shenoy <[email protected]> Reported-by: Abdul Haleem <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2016-06-15bpf, trace: check event type in bpf_perf_event_readAlexei Starovoitov1-0/+4
similar to bpf_perf_event_output() the bpf_perf_event_read() helper needs to check the type of the perf_event before reading the counter. Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper") Reported-by: Daniel Borkmann <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-06-15bpf: fix matching of data/data_end in verifierAlexei Starovoitov2-36/+11
The ctx structure passed into bpf programs is different depending on bpf program type. The verifier incorrectly marked ctx->data and ctx->data_end access based on ctx offset only. That caused loads in tracing programs int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. } to be incorrectly marked as PTR_TO_PACKET which later caused verifier to reject the program that was actually valid in tracing context. Fix this by doing program type specific matching of ctx offsets. Fixes: 969bf05eb3ce ("bpf: direct packet access") Reported-by: Sasha Goldshtein <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Daniel Borkmann <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2016-06-15kernel/kcov: unproxify debugfs file's fopsNicolai Stange1-1/+6
Since commit 49d200deaa68 ("debugfs: prevent access to removed files' private data"), a debugfs file's file_operations methods get proxied through lifetime aware wrappers. However, only a certain subset of the file_operations members is supported by debugfs and ->mmap isn't among them -- it appears to be NULL from the VFS layer's perspective. This behaviour breaks the /sys/kernel/debug/kcov file introduced concurrently with commit 5c9a8750a640 ("kernel: add kcov code coverage"). Since that file never gets removed, there is no file removal race and thus, a lifetime checking proxy isn't needed. Avoid the proxying for /sys/kernel/debug/kcov by creating it via debugfs_create_file_unsafe() rather than debugfs_create_file(). Fixes: 49d200deaa68 ("debugfs: prevent access to removed files' private data") Fixes: 5c9a8750a640 ("kernel: add kcov code coverage") Reported-by: Sasha Levin <[email protected]> Signed-off-by: Nicolai Stange <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2016-06-14kernel/sysrq, watchdog, sched/core: Reset watchdog on all CPUs while ↵Andrey Ryabinin1-2/+4
processing sysrq-w Lengthy output of sysrq-w may take a lot of time on slow serial console. Currently we reset NMI-watchdog on the current CPU to avoid spurious lockup messages. Sometimes this doesn't work since softlockup watchdog might trigger on another CPU which is waiting for an IPI to proceed. We reset softlockup watchdogs on all CPUs, but we do this only after listing all tasks, and this may be too late on a busy system. So, reset watchdogs CPUs earlier, in for_each_process_thread() loop. Signed-off-by: Andrey Ryabinin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-06-14sched/debug: Fix deadlock when enabling sched eventsJosh Poimboeuf1-1/+1
I see a hang when enabling sched events: echo 1 > /sys/kernel/debug/tracing/events/sched/enable The printk buffer shows: BUG: spinlock recursion on CPU#1, swapper/1/0 lock: 0xffff88007d5d8c00, .magic: dead4ead, .owner: swapper/1/0, .owner_cpu: 1 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.7.0-rc2+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.8.1-20150318_183358- 04/01/2014 ... Call Trace: <IRQ> [<ffffffff8143d663>] dump_stack+0x85/0xc2 [<ffffffff81115948>] spin_dump+0x78/0xc0 [<ffffffff81115aea>] do_raw_spin_lock+0x11a/0x150 [<ffffffff81891471>] _raw_spin_lock+0x61/0x80 [<ffffffff810e5466>] ? try_to_wake_up+0x256/0x4e0 [<ffffffff810e5466>] try_to_wake_up+0x256/0x4e0 [<ffffffff81891a0a>] ? _raw_spin_unlock_irqrestore+0x4a/0x80 [<ffffffff810e5705>] wake_up_process+0x15/0x20 [<ffffffff810cebb4>] insert_work+0x84/0xc0 [<ffffffff810ced7f>] __queue_work+0x18f/0x660 [<ffffffff810cf9a6>] queue_work_on+0x46/0x90 [<ffffffffa00cd95b>] drm_fb_helper_dirty.isra.11+0xcb/0xe0 [drm_kms_helper] [<ffffffffa00cdac0>] drm_fb_helper_sys_imageblit+0x30/0x40 [drm_kms_helper] [<ffffffff814babcd>] soft_cursor+0x1ad/0x230 [<ffffffff814ba379>] bit_cursor+0x649/0x680 [<ffffffff814b9d30>] ? update_attr.isra.2+0x90/0x90 [<ffffffff814b5e6a>] fbcon_cursor+0x14a/0x1c0 [<ffffffff81555ef8>] hide_cursor+0x28/0x90 [<ffffffff81558b6f>] vt_console_print+0x3bf/0x3f0 [<ffffffff81122c63>] call_console_drivers.constprop.24+0x183/0x200 [<ffffffff811241f4>] console_unlock+0x3d4/0x610 [<ffffffff811247f5>] vprintk_emit+0x3c5/0x610 [<ffffffff81124bc9>] vprintk_default+0x29/0x40 [<ffffffff811e965b>] printk+0x57/0x73 [<ffffffff810f7a9e>] enqueue_entity+0xc2e/0xc70 [<ffffffff810f7b39>] enqueue_task_fair+0x59/0xab0 [<ffffffff8106dcd9>] ? kvm_sched_clock_read+0x9/0x20 [<ffffffff8103fb39>] ? sched_clock+0x9/0x10 [<ffffffff810e3fcc>] activate_task+0x5c/0xa0 [<ffffffff810e4514>] ttwu_do_activate+0x54/0xb0 [<ffffffff810e5cea>] sched_ttwu_pending+0x7a/0xb0 [<ffffffff810e5e51>] scheduler_ipi+0x61/0x170 [<ffffffff81059e7f>] smp_trace_reschedule_interrupt+0x4f/0x2a0 [<ffffffff81893ba6>] trace_reschedule_interrupt+0x96/0xa0 <EOI> [<ffffffff8106e0d6>] ? native_safe_halt+0x6/0x10 [<ffffffff8110fb1d>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff81040ac0>] default_idle+0x20/0x1a0 [<ffffffff8104147f>] arch_cpu_idle+0xf/0x20 [<ffffffff81102f8f>] default_idle_call+0x2f/0x50 [<ffffffff8110332e>] cpu_startup_entry+0x37e/0x450 [<ffffffff8105af70>] start_secondary+0x160/0x1a0 Note the hang only occurs when echoing the above from a physical serial console, not from an ssh session. The bug is caused by a deadlock where the task is trying to grab the rq lock twice because printk()'s aren't safe in sched code. Signed-off-by: Josh Poimboeuf <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Matt Fleming <[email protected]> Cc: Mel Gorman <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Srikar Dronamraju <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Fixes: cb2517653fcc ("sched/debug: Make schedstats a runtime tunable that is disabled by default") Link: http://lkml.kernel.org/r/20160613073209.gdvdybiruljbkn3p@treble Signed-off-by: Ingo Molnar <[email protected]>
2016-06-14sched/fair: Fix post_init_entity_util_avg() serializationPeter Zijlstra2-3/+8
Chris Wilson reported a divide by 0 at: post_init_entity_util_avg(): > 725 if (cfs_rq->avg.util_avg != 0) { > 726 sa->util_avg = cfs_rq->avg.util_avg * se->load.weight; > -> 727 sa->util_avg /= (cfs_rq->avg.load_avg + 1); > 728 > 729 if (sa->util_avg > cap) > 730 sa->util_avg = cap; > 731 } else { Which given the lack of serialization, and the code generated from update_cfs_rq_load_avg() is entirely possible: if (atomic_long_read(&cfs_rq->removed_load_avg)) { s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0); sa->load_avg = max_t(long, sa->load_avg - r, 0); sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0); removed_load = 1; } turns into: ffffffff81087064: 49 8b 85 98 00 00 00 mov 0x98(%r13),%rax ffffffff8108706b: 48 85 c0 test %rax,%rax ffffffff8108706e: 74 40 je ffffffff810870b0 ffffffff81087070: 4c 89 f8 mov %r15,%rax ffffffff81087073: 49 87 85 98 00 00 00 xchg %rax,0x98(%r13) ffffffff8108707a: 49 29 45 70 sub %rax,0x70(%r13) ffffffff8108707e: 4c 89 f9 mov %r15,%rcx ffffffff81087081: bb 01 00 00 00 mov $0x1,%ebx ffffffff81087086: 49 83 7d 70 00 cmpq $0x0,0x70(%r13) ffffffff8108708b: 49 0f 49 4d 70 cmovns 0x70(%r13),%rcx Which you'll note ends up with 'sa->load_avg - r' in memory at ffffffff8108707a. By calling post_init_entity_util_avg() under rq->lock we're sure to be fully serialized against PELT updates and cannot observe intermediate state like this. Reported-by: Chris Wilson <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Cc: Andrey Ryabinin <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Mike Galbraith <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Yuyang Du <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Fixes: 2b8c41daba32 ("sched/fair: Initiate a new task's util avg to a bounded value") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2016-06-10Merge branch 'stacking-fixes' (vfs stacking fixes from Jann)Linus Torvalds1-1/+2
Merge filesystem stacking fixes from Jann Horn. * emailed patches from Jann Horn <[email protected]>: sched: panic on corrupted stack end ecryptfs: forbid opening files without mmap handler proc: prevent stacking filesystems on top
2016-06-10sched: panic on corrupted stack endJann Horn1-1/+2
Until now, hitting this BUG_ON caused a recursive oops (because oops handling involves do_exit(), which calls into the scheduler, which in turn raises an oops), which caused stuff below the stack to be overwritten until a panic happened (e.g. via an oops in interrupt context, caused by the overwritten CPU index in the thread_info). Just panic directly. Signed-off-by: Jann Horn <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-06-10Merge branch 'sched-urgent-for-linus' of ↵Linus Torvalds3-16/+28
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Ingo Molnar: "Two scheduler debugging fixes" * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/debug: Fix 'schedstats=enable' cmdline option sched/debug: Fix /proc/sched_debug regression
2016-06-10Merge branch 'perf-urgent-for-linus' of ↵Linus Torvalds1-4/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Ingo Molnar: "A handful of tooling fixes, two PMU driver fixes and a cleanup of redundant code that addresses a security analyzer false positive" * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/core: Remove a redundant check perf/x86/intel/uncore: Remove SBOX support for Broadwell server perf ctf: Convert invalid chars in a string before set value perf record: Fix crash when kptr is restricted perf symbols: Check kptr_restrict for root perf/x86/intel/rapl: Fix pmus free during cleanup
2016-06-10Merge branch 'locking-urgent-for-linus' of ↵Linus Torvalds3-6/+77
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: "Misc fixes: - a file-based futex fix - one more spin_unlock_wait() fix - a ww-mutex deadlock detection improvement/fix - and a raw_read_seqcount_latch() barrier fix" * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: futex: Calculate the futex key based on a tail page for file-based futexes locking/qspinlock: Fix spin_unlock_wait() some more locking/ww_mutex: Report recursive ww_mutex locking early locking/seqcount: Re-fix raw_read_seqcount_latch()
2016-06-10Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds1-2/+2
Pull networking fixes from David Miller: 1) nfnetlink timestamp taken from wrong skb, fix from Florian Westphal. 2) Revert some msleep conversions in rtlwifi as these spots are in atomic context, from Larry Finger. 3) Validate that NFTA_SET_TABLE attribute is actually specified when we call nf_tables_getset(). From Phil Turnbull. 4) Don't do mdio_reset in stmmac driver with spinlock held as that can sleep, from Vincent Palatin. 5) sk_filter() does things other than run a BPF filter, so we should not elide it's call just because sk->sk_filter is NULL. Fix from Eric Dumazet. 6) Fix missing backlog updates in several packet schedulers, from Cong Wang. 7) bnx2x driver should allow VLAN add/remove while the interface is down, from Michal Schmidt. 8) Several RDS/TCP race fixes from Sowmini Varadhan. 9) fq_codel scheduler doesn't return correct queue length in dumps, from Eric Dumazet. 10) Fix TCP stats for tail loss probe and early retransmit in ipv6, from Yuchung Cheng. 11) Properly initialize udp_tunnel_socket_cfg in l2tp_tunnel_create(), from Guillaume Nault. 12) qfq scheduler leaks SKBs if a kzalloc fails, fix from Florian Westphal. 13) sock_fprog passed into PACKET_FANOUT_DATA needs compat handling, from Willem de Bruijn. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits) vmxnet3: segCnt can be 1 for LRO packets packet: compat support for sock_fprog stmmac: fix parameter to dwmac4_set_umac_addr() net/mlx5e: Fix blue flame quota logic net/mlx5e: Use ndo_stop explicitly at shutdown flow net/mlx5: E-Switch, always set mc_promisc for allmulti vports net/mlx5: E-Switch, Modify node guid on vf set MAC net/mlx5: E-Switch, Fix vport enable flow net/mlx5: E-Switch, Use the correct error check on returned pointers net/mlx5: E-Switch, Use the correct free() function net/mlx5: Fix E-Switch flow steering capabilities check net/mlx5: Fix flow steering NIC capabilities check net/mlx5: Fix root flow table update net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly net/mlx5: Fix masking of reserved bits in XRCD number net/mlx5: Fix the size of modify QP mailbox mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name() mlxsw: spectrum: Make split flow match firmware requirements wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel cfg80211: remove get/set antenna and tx power warnings ...