aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2020-01-29tracing/boot: Include required headers and sort it alphabeticallyMasami Hiramatsu1-1/+8
Include some required (but currently indirectly included) headers and sort it alphabetically. Link: http://lkml.kernel.org/r/158029059514.12381.6597832266860248781.stgit@devnote2 Signed-off-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-01-28tracing: Add 'hist:' to hist trigger error log error stringTom Zanussi1-1/+2
The 'hist:' prefix gets stripped from the command text during command processing, but should be added back when displaying the command during error processing. Not only because it's what should be displayed but also because not having it means the test cases fail because the caret is miscalculated by the length of the prefix string. Link: http://lkml.kernel.org/r/449df721f560042e22382f67574bcc5b4d830d3d.1561743018.git.zanussi@kernel.org Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-01-28tracing: Add hist trigger error messages for sort specificationTom Zanussi1-4/+17
Add error codes and messages for all the error paths leading to sort specification parsing errors. Link: http://lkml.kernel.org/r/237830dc05e583fbb53664d817a784297bf961be.1561743018.git.zanussi@kernel.org Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-01-28tracing: Simplify assignment parsing for hist triggersTom Zanussi1-43/+27
In the process of adding better error messages for sorting, I realized that strsep was being used incorrectly and some of the error paths I was expecting to be hit weren't and just fell through to the common invalid key error case. It also became obvious that for keyword assignments, it wasn't necessary to save the full assignment and reparse it later, and having a common empty-assignment check would also make more sense in terms of error processing. Change the code to fix these problems and simplify it for new error message changes in a subsequent patch. Link: http://lkml.kernel.org/r/1c3ef0b6655deaf345f6faee2584a0298ac2d743.1561743018.git.zanussi@kernel.org Fixes: e62347d24534 ("tracing: Add hist trigger support for user-defined sorting ('sort=' param)") Fixes: 7ef224d1d0e3 ("tracing: Add 'hist' event trigger command") Fixes: a4072fe85ba3 ("tracing: Add a clock attribute for hist triggers") Reported-by: Masami Hiramatsu <[email protected]> Reviewed-by: Masami Hiramatsu <[email protected]> Signed-off-by: Tom Zanussi <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-01-28Merge tag 'for-linus-5.6-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Anton Ivanov: "I am sending this on behalf of Richard who is traveling. This contains the following changes for UML: - Fix for time travel mode - Disable CONFIG_CONSTRUCTORS again - A new command line option to have an non-raw serial line - Preparations to remove obsolete UML network drivers" * tag 'for-linus-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: Fix time-travel=inf-cpu with xor/raid6 Revert "um: Enable CONFIG_CONSTRUCTORS" um: Mark non-vector net transports as obsolete um: Add an option to make serial driver non-raw
2020-01-28Merge tag 'trace-v5.5-rc7' of ↵Linus Torvalds1-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Kprobe events added 'ustring' to distinguish reading strings from kernel space or user space. But the creating of the event format file only checks for 'string' to display string formats. 'ustring' must also be handled" * tag 'trace-v5.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing/kprobes: Have uname use __get_str() in print_fmt
2020-01-28Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextLinus Torvalds21-621/+2798
Pull networking updates from David Miller: 1) Add WireGuard 2) Add HE and TWT support to ath11k driver, from John Crispin. 3) Add ESP in TCP encapsulation support, from Sabrina Dubroca. 4) Add variable window congestion control to TIPC, from Jon Maloy. 5) Add BCM84881 PHY driver, from Russell King. 6) Start adding netlink support for ethtool operations, from Michal Kubecek. 7) Add XDP drop and TX action support to ena driver, from Sameeh Jubran. 8) Add new ipv4 route notifications so that mlxsw driver does not have to handle identical routes itself. From Ido Schimmel. 9) Add BPF dynamic program extensions, from Alexei Starovoitov. 10) Support RX and TX timestamping in igc, from Vinicius Costa Gomes. 11) Add support for macsec HW offloading, from Antoine Tenart. 12) Add initial support for MPTCP protocol, from Christoph Paasch, Matthieu Baerts, Florian Westphal, Peter Krystad, and many others. 13) Add Octeontx2 PF support, from Sunil Goutham, Geetha sowjanya, Linu Cherian, and others. * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1469 commits) net: phy: add default ARCH_BCM_IPROC for MDIO_BCM_IPROC udp: segment looped gso packets correctly netem: change mailing list qed: FW 8.42.2.0 debug features qed: rt init valid initialization changed qed: Debug feature: ilt and mdump qed: FW 8.42.2.0 Add fw overlay feature qed: FW 8.42.2.0 HSI changes qed: FW 8.42.2.0 iscsi/fcoe changes qed: Add abstraction for different hsi values per chip qed: FW 8.42.2.0 Additional ll2 type qed: Use dmae to write to widebus registers in fw_funcs qed: FW 8.42.2.0 Parser offsets modified qed: FW 8.42.2.0 Queue Manager changes qed: FW 8.42.2.0 Expose new registers and change windows qed: FW 8.42.2.0 Internal ram offsets modifications MAINTAINERS: Add entry for Marvell OcteonTX2 Physical Function driver Documentation: net: octeontx2: Add RVU HW and drivers overview octeontx2-pf: ethtool RSS config support octeontx2-pf: Add basic ethtool support ...
2020-01-28Merge branch 'linus' of ↵Linus Torvalds1-192/+194
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto updates from Herbert Xu: "API: - Removed CRYPTO_TFM_RES flags - Extended spawn grabbing to all algorithm types - Moved hash descsize verification into API code Algorithms: - Fixed recursive pcrypt dead-lock - Added new 32 and 64-bit generic versions of poly1305 - Added cryptogams implementation of x86/poly1305 Drivers: - Added support for i.MX8M Mini in caam - Added support for i.MX8M Nano in caam - Added support for i.MX8M Plus in caam - Added support for A33 variant of SS in sun4i-ss - Added TEE support for Raven Ridge in ccp - Added in-kernel API to submit TEE commands in ccp - Added AMD-TEE driver - Added support for BCM2711 in iproc-rng200 - Added support for AES256-GCM based ciphers for chtls - Added aead support on SEC2 in hisilicon" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (244 commits) crypto: arm/chacha - fix build failured when kernel mode NEON is disabled crypto: caam - add support for i.MX8M Plus crypto: x86/poly1305 - emit does base conversion itself crypto: hisilicon - fix spelling mistake "disgest" -> "digest" crypto: chacha20poly1305 - add back missing test vectors and test chunking crypto: x86/poly1305 - fix .gitignore typo tee: fix memory allocation failure checks on drv_data and amdtee crypto: ccree - erase unneeded inline funcs crypto: ccree - make cc_pm_put_suspend() void crypto: ccree - split overloaded usage of irq field crypto: ccree - fix PM race condition crypto: ccree - fix FDE descriptor sequence crypto: ccree - cc_do_send_request() is void func crypto: ccree - fix pm wrongful error reporting crypto: ccree - turn errors to debug msgs crypto: ccree - fix AEAD decrypt auth fail crypto: ccree - fix typo in comment crypto: ccree - fix typos in error msgs crypto: atmel-{aes,sha,tdes} - Retire crypto_platform_data crypto: x86/sha - Eliminate casts on asm implementations ...
2020-01-28sched/fair: Prevent unlimited runtime on throttled groupVincent Guittot1-1/+8
When a running task is moved on a throttled task group and there is no other task enqueued on the CPU, the task can keep running using 100% CPU whatever the allocated bandwidth for the group and although its cfs rq is throttled. Furthermore, the group entity of the cfs_rq and its parents are not enqueued but only set as curr on their respective cfs_rqs. We have the following sequence: sched_move_task -dequeue_task: dequeue task and group_entities. -put_prev_task: put task and group entities. -sched_change_group: move task to new group. -enqueue_task: enqueue only task but not group entities because cfs_rq is throttled. -set_next_task : set task and group_entities as current sched_entity of their cfs_rq. Another impact is that the root cfs_rq runnable_load_avg at root rq stays null because the group_entities are not enqueued. This situation will stay the same until an "external" event triggers a reschedule. Let trigger it immediately instead. Signed-off-by: Vincent Guittot <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Acked-by: Ben Segall <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-01-28sched/nohz: Optimize get_nohz_timer_target()Wanpeng Li1-7/+12
On a machine, CPU 0 is used for housekeeping, the other 39 CPUs in the same socket are in nohz_full mode. We can observe huge time burn in the loop for seaching nearest busy housekeeper cpu by ftrace. 2) | get_nohz_timer_target() { 2) 0.240 us | housekeeping_test_cpu(); 2) 0.458 us | housekeeping_test_cpu(); ... 2) 0.292 us | housekeeping_test_cpu(); 2) 0.240 us | housekeeping_test_cpu(); 2) 0.227 us | housekeeping_any_cpu(); 2) + 43.460 us | } This patch optimizes the searching logic by finding a nearest housekeeper CPU in the housekeeping cpumask, it can minimize the worst searching time from ~44us to < 10us in my testing. In addition, the last iterated busy housekeeper can become a random candidate while current CPU is a better fallback if it is a housekeeper. Signed-off-by: Wanpeng Li <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Frederic Weisbecker <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-01-28sched/uclamp: Reject negative values in cpu_uclamp_write()Qais Yousef1-1/+1
The check to ensure that the new written value into cpu.uclamp.{min,max} is within range, [0:100], wasn't working because of the signed comparison 7301 if (req.percent > UCLAMP_PERCENT_SCALE) { 7302 req.ret = -ERANGE; 7303 return req; 7304 } # echo -1 > cpu.uclamp.min # cat cpu.uclamp.min 42949671.96 Cast req.percent into u64 to force the comparison to be unsigned and work as intended in capacity_from_percent(). # echo -1 > cpu.uclamp.min sh: write error: Numerical result out of range Fixes: 2480c093130f ("sched/uclamp: Extend CPU's cgroup controller") Signed-off-by: Qais Yousef <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-01-28sched/fair: Allow a small load imbalance between low utilisation SD_NUMA domainsMel Gorman1-12/+29
The CPU load balancer balances between different domains to spread load and strives to have equal balance everywhere. Communicating tasks can migrate so they are topologically close to each other but these decisions are independent. On a lightly loaded NUMA machine, two communicating tasks pulled together at wakeup time can be pushed apart by the load balancer. In isolation, the load balancer decision is fine but it ignores the tasks data locality and the wakeup/LB paths continually conflict. NUMA balancing is also a factor but it also simply conflicts with the load balancer. This patch allows a fixed degree of imbalance of two tasks to exist between NUMA domains regardless of utilisation levels. In many cases, this prevents communicating tasks being pulled apart. It was evaluated whether the imbalance should be scaled to the domain size. However, no additional benefit was measured across a range of workloads and machines and scaling adds the risk that lower domains have to be rebalanced. While this could change again in the future, such a change should specify the use case and benefit. The most obvious impact is on netperf TCP_STREAM -- two simple communicating tasks with some softirq offload depending on the transmission rate. 2-socket Haswell machine 48 core, HT enabled netperf-tcp -- mmtests config config-network-netperf-unbound baseline lbnuma-v3 Hmean 64 568.73 ( 0.00%) 577.56 * 1.55%* Hmean 128 1089.98 ( 0.00%) 1128.06 * 3.49%* Hmean 256 2061.72 ( 0.00%) 2104.39 * 2.07%* Hmean 1024 7254.27 ( 0.00%) 7557.52 * 4.18%* Hmean 2048 11729.20 ( 0.00%) 13350.67 * 13.82%* Hmean 3312 15309.08 ( 0.00%) 18058.95 * 17.96%* Hmean 4096 17338.75 ( 0.00%) 20483.66 * 18.14%* Hmean 8192 25047.12 ( 0.00%) 27806.84 * 11.02%* Hmean 16384 27359.55 ( 0.00%) 33071.88 * 20.88%* Stddev 64 2.16 ( 0.00%) 2.02 ( 6.53%) Stddev 128 2.31 ( 0.00%) 2.19 ( 5.05%) Stddev 256 11.88 ( 0.00%) 3.22 ( 72.88%) Stddev 1024 23.68 ( 0.00%) 7.24 ( 69.43%) Stddev 2048 79.46 ( 0.00%) 71.49 ( 10.03%) Stddev 3312 26.71 ( 0.00%) 57.80 (-116.41%) Stddev 4096 185.57 ( 0.00%) 96.15 ( 48.19%) Stddev 8192 245.80 ( 0.00%) 100.73 ( 59.02%) Stddev 16384 207.31 ( 0.00%) 141.65 ( 31.67%) In this case, there was a sizable improvement to performance and a general reduction in variance. However, this is not univeral. For most machines, the impact was roughly a 3% performance gain. Ops NUMA base-page range updates 19796.00 292.00 Ops NUMA PTE updates 19796.00 292.00 Ops NUMA PMD updates 0.00 0.00 Ops NUMA hint faults 16113.00 143.00 Ops NUMA hint local faults % 8407.00 142.00 Ops NUMA hint local percent 52.18 99.30 Ops NUMA pages migrated 4244.00 1.00 Without the patch, only 52.18% of sampled accesses are local. In an earlier changelog, 100% of sampled accesses are local and indeed on most machines, this was still the case. In this specific case, the local sampled rates was 99.3% but note the "base-page range updates" and "PTE updates". The activity with the patch is negligible as were the number of faults. The small number of pages migrated were related to shared libraries. A 2-socket Broadwell showed better results on average but are not presented for brevity as the performance was similar except it showed 100% of the sampled NUMA hints were local. The patch holds up for a 4-socket Haswell, an AMD EPYC and AMD Epyc 2 machine. For dbench, the impact depends on the filesystem used and the number of clients. On XFS, there is little difference as the clients typically communicate with workqueues which have a separate class of scheduler problem at the moment. For ext4, performance is generally better, particularly for small numbers of clients as NUMA balancing activity is negligible with the patch applied. A more interesting example is the Facebook schbench which uses a number of messaging threads to communicate with worker threads. In this configuration, one messaging thread is used per NUMA node and the number of worker threads is varied. The 50, 75, 90, 95, 99, 99.5 and 99.9 percentiles for response latency is then reported. Lat 50.00th-qrtle-1 44.00 ( 0.00%) 37.00 ( 15.91%) Lat 75.00th-qrtle-1 53.00 ( 0.00%) 41.00 ( 22.64%) Lat 90.00th-qrtle-1 57.00 ( 0.00%) 42.00 ( 26.32%) Lat 95.00th-qrtle-1 63.00 ( 0.00%) 43.00 ( 31.75%) Lat 99.00th-qrtle-1 76.00 ( 0.00%) 51.00 ( 32.89%) Lat 99.50th-qrtle-1 89.00 ( 0.00%) 52.00 ( 41.57%) Lat 99.90th-qrtle-1 98.00 ( 0.00%) 55.00 ( 43.88%) Lat 50.00th-qrtle-2 42.00 ( 0.00%) 42.00 ( 0.00%) Lat 75.00th-qrtle-2 48.00 ( 0.00%) 47.00 ( 2.08%) Lat 90.00th-qrtle-2 53.00 ( 0.00%) 52.00 ( 1.89%) Lat 95.00th-qrtle-2 55.00 ( 0.00%) 53.00 ( 3.64%) Lat 99.00th-qrtle-2 62.00 ( 0.00%) 60.00 ( 3.23%) Lat 99.50th-qrtle-2 63.00 ( 0.00%) 63.00 ( 0.00%) Lat 99.90th-qrtle-2 68.00 ( 0.00%) 66.00 ( 2.94% For higher worker threads, the differences become negligible but it's interesting to note the difference in wakeup latency at low utilisation and mpstat confirms that activity was almost all on one node until the number of worker threads increase. Hackbench generally showed neutral results across a range of machines. This is different to earlier versions of the patch which allowed imbalances for higher degrees of utilisation. perf bench pipe showed negligible differences in overall performance as the differences are very close to the noise. An earlier prototype of the patch showed major regressions for NAS C-class when running with only half of the available CPUs -- 20-30% performance hits were measured at the time. With this version of the patch, the impact is negligible with small gains/losses within the noise measured. This is because the number of threads far exceeds the small imbalance the aptch cares about. Similarly, there were report of regressions for the autonuma benchmark against earlier versions but again, normal load balancing now applies for that workload. In general, the patch simply seeks to avoid unnecessary cross-node migrations in the basic case where imbalances are very small. For low utilisation communicating workloads, this patch generally behaves better with less NUMA balancing activity. For high utilisation, there is no change in behaviour. Signed-off-by: Mel Gorman <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Valentin Schneider <[email protected]> Reviewed-by: Vincent Guittot <[email protected]> Reviewed-by: Srikar Dronamraju <[email protected]> Acked-by: Phil Auld <[email protected]> Tested-by: Phil Auld <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-01-28timers/nohz: Update NOHZ load in remote tickPeter Zijlstra (Intel)2-11/+26
The way loadavg is tracked during nohz only pays attention to the load upon entering nohz. This can be particularly noticeable if full nohz is entered while non-idle, and then the cpu goes idle and stays that way for a long time. Use the remote tick to ensure that full nohz cpus report their deltas within a reasonable time. [ swood: Added changelog and removed recheck of stopped tick. ] Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Scott Wood <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-01-28sched/core: Don't skip remote tick for idle CPUsScott Wood1-8/+10
This will be used in the next patch to get a loadavg update from nohz cpus. The delta check is skipped because idle_sched_class doesn't update se.exec_start. Signed-off-by: Scott Wood <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-01-28perf/cgroups: Install cgroup events to correct cpuctxSong Liu1-3/+4
cgroup events are always installed in the cpuctx. However, when it is not installed via IPI, list_update_cgroup_event() adds it to cpuctx of current CPU, which triggers list corruption: [] list_add double add: new=ffff888ff7cf0db0, prev=ffff888ff7ce82f0, next=ffff888ff7cf0db0. To reproduce this, we can simply run: # perf stat -e cs -a & # perf stat -e cs -G anycgroup Fix this by installing it to cpuctx that contains event->ctx, and the proper cgrp_cpuctx_list. Fixes: db0503e4f675 ("perf/core: Optimize perf_install_in_event()") Suggested-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Song Liu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-01-28perf/core: Fix mlock accounting in perf_mmap()Song Liu1-1/+9
Decreasing sysctl_perf_event_mlock between two consecutive perf_mmap()s of a perf ring buffer may lead to an integer underflow in locked memory accounting. This may lead to the undesired behaviors, such as failures in BPF map creation. Address this by adjusting the accounting logic to take into account the possibility that the amount of already locked memory may exceed the current limit. Fixes: c4b75479741c ("perf/core: Make the mlock accounting simple again") Suggested-by: Alexander Shishkin <[email protected]> Signed-off-by: Song Liu <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: <[email protected]> Acked-by: Alexander Shishkin <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-01-28Merge branch 'sched-core-for-linus' of ↵Linus Torvalds19-191/+315
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "These were the main changes in this cycle: - More -rt motivated separation of CONFIG_PREEMPT and CONFIG_PREEMPTION. - Add more low level scheduling topology sanity checks and warnings to filter out nonsensical topologies that break scheduling. - Extend uclamp constraints to influence wakeup CPU placement - Make the RT scheduler more aware of asymmetric topologies and CPU capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y - Make idle CPU selection more consistent - Various fixes, smaller cleanups, updates and enhancements - please see the git log for details" * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits) sched/fair: Define sched_idle_cpu() only for SMP configurations sched/topology: Assert non-NUMA topology masks don't (partially) overlap idle: fix spelling mistake "iterrupts" -> "interrupts" sched/fair: Remove redundant call to cpufreq_update_util() sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP sched/fair: calculate delta runnable load only when it's needed sched/cputime: move rq parameter in irqtime_account_process_tick stop_machine: Make stop_cpus() static sched/debug: Reset watchdog on all CPUs while processing sysrq-t sched/core: Fix size of rq::uclamp initialization sched/uclamp: Fix a bug in propagating uclamp value in new cgroups sched/fair: Load balance aggressively for SCHED_IDLE CPUs sched/fair : Improve update_sd_pick_busiest for spare capacity case watchdog: Remove soft_lockup_hrtimer_cnt and related code sched/rt: Make RT capacity-aware sched/fair: Make EAS wakeup placement consider uclamp restrictions sched/fair: Make task_fits_capacity() consider uclamp restrictions sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with() sched/uclamp: Make uclamp util helpers use and return UL values ...
2020-01-28Merge branch 'perf-core-for-linus' of ↵Linus Torvalds10-236/+185
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf updates from Ingo Molnar: "Kernel side changes: - Ftrace is one of the last W^X violators (after this only KLP is left). These patches move it over to the generic text_poke() interface and thereby get rid of this oddity. This requires a surprising amount of surgery, by Peter Zijlstra. - x86/AMD PMUs: add support for 'Large Increment per Cycle Events' to count certain types of events that have a special, quirky hw ABI (by Kim Phillips) - kprobes fixes by Masami Hiramatsu Lots of tooling updates as well, the following subcommands were updated: annotate/report/top, c2c, clang, record, report/top TUI, sched timehist, tests; plus updates were done to the gtk ui, libperf, headers and the parser" * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits) perf/x86/amd: Add support for Large Increment per Cycle Events perf/x86/amd: Constrain Large Increment per Cycle events perf/x86/intel/rapl: Add Comet Lake support tracing: Initialize ret in syscall_enter_define_fields() perf header: Use last modification time for timestamp perf c2c: Fix return type for histogram sorting comparision functions perf beauty sockaddr: Fix augmented syscall format warning perf/ui/gtk: Fix gtk2 build perf ui gtk: Add missing zalloc object perf tools: Use %define api.pure full instead of %pure-parser libperf: Setup initial evlist::all_cpus value perf report: Fix no libunwind compiled warning break s390 issue perf tools: Support --prefix/--prefix-strip perf report: Clarify in help that --children is default tools build: Fix test-clang.cpp with Clang 8+ perf clang: Fix build with Clang 9 kprobes: Fix optimize_kprobe()/unoptimize_kprobe() cancellation logic tools lib: Fix builds when glibc contains strlcpy() perf report/top: Make 'e' visible in the help and make it toggle showing callchains perf report/top: Do not offer annotation for symbols without samples ...
2020-01-28Merge branch 'locking-core-for-linus' of ↵Linus Torvalds3-21/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking updates from Ingo Molnar: "Just a handful of changes in this cycle: an ARM64 performance optimization, a comment fix and a debug output fix" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/osq: Use optimized spinning loop for arm64 locking/qspinlock: Fix inaccessible URL of MCS lock paper locking/lockdep: Fix lockdep_stats indentation problem
2020-01-28Merge branch 'core-rcu-for-linus' of ↵Linus Torvalds17-385/+778
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull RCU updates from Ingo Molnar: "The RCU changes in this cycle were: - Expedited grace-period updates - kfree_rcu() updates - RCU list updates - Preemptible RCU updates - Torture-test updates - Miscellaneous fixes - Documentation updates" * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (69 commits) rcu: Remove unused stop-machine #include powerpc: Remove comment about read_barrier_depends() .mailmap: Add entries for old [email protected] addresses srcu: Apply *_ONCE() to ->srcu_last_gp_end rcu: Switch force_qs_rnp() to for_each_leaf_node_cpu_mask() rcu: Move rcu_{expedited,normal} definitions into rcupdate.h rcu: Move gp_state_names[] and gp_state_getname() to tree_stall.h rcu: Remove the declaration of call_rcu() in tree.h rcu: Fix tracepoint tracking RCU CPU kthread utilization rcu: Fix harmless omission of "CONFIG_" from #if condition rcu: Avoid tick_dep_set_cpu() misordering rcu: Provide wrappers for uses of ->rcu_read_lock_nesting rcu: Use READ_ONCE() for ->expmask in rcu_read_unlock_special() rcu: Clear ->rcu_read_unlock_special only once rcu: Clear .exp_hint only when deferred quiescent state has been reported rcu: Rename some instance of CONFIG_PREEMPTION to CONFIG_PREEMPT_RCU rcu: Remove kfree_call_rcu_nobatch() rcu: Remove kfree_rcu() special casing and lazy-callback handling rcu: Add support for debug_objects debugging for kfree_rcu() rcu: Add multiple in-flight batches of kfree_rcu() work ...
2020-01-28smp: Remove superfluous cond_func check in smp_call_function_many_cond()Sebastian Andrzej Siewior1-1/+1
It was requested to remove the cond_func check but the follow up patch was overlooked. Remove it now. Fixes: 67719ef25eeb ("smp: Add a smp_cond_func_t argument to smp_call_function_many()") Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-01-28prctl: PR_{G,S}ET_IO_FLUSHER to support controlling memory reclaimMike Christie1-0/+25
There are several storage drivers like dm-multipath, iscsi, tcmu-runner, amd nbd that have userspace components that can run in the IO path. For example, iscsi and nbd's userspace deamons may need to recreate a socket and/or send IO on it, and dm-multipath's daemon multipathd may need to send SG IO or read/write IO to figure out the state of paths and re-set them up. In the kernel these drivers have access to GFP_NOIO/GFP_NOFS and the memalloc_*_save/restore functions to control the allocation behavior, but for userspace we would end up hitting an allocation that ended up writing data back to the same device we are trying to allocate for. The device is then in a state of deadlock, because to execute IO the device needs to allocate memory, but to allocate memory the memory layers want execute IO to the device. Here is an example with nbd using a local userspace daemon that performs network IO to a remote server. We are using XFS on top of the nbd device, but it can happen with any FS or other modules layered on top of the nbd device that can write out data to free memory. Here a nbd daemon helper thread, msgr-worker-1, is performing a write/sendmsg on a socket to execute a request. This kicks off a reclaim operation which results in a WRITE to the nbd device and the nbd thread calling back into the mm layer. [ 1626.609191] msgr-worker-1 D 0 1026 1 0x00004000 [ 1626.609193] Call Trace: [ 1626.609195] ? __schedule+0x29b/0x630 [ 1626.609197] ? wait_for_completion+0xe0/0x170 [ 1626.609198] schedule+0x30/0xb0 [ 1626.609200] schedule_timeout+0x1f6/0x2f0 [ 1626.609202] ? blk_finish_plug+0x21/0x2e [ 1626.609204] ? _xfs_buf_ioapply+0x2e6/0x410 [ 1626.609206] ? wait_for_completion+0xe0/0x170 [ 1626.609208] wait_for_completion+0x108/0x170 [ 1626.609210] ? wake_up_q+0x70/0x70 [ 1626.609212] ? __xfs_buf_submit+0x12e/0x250 [ 1626.609214] ? xfs_bwrite+0x25/0x60 [ 1626.609215] xfs_buf_iowait+0x22/0xf0 [ 1626.609218] __xfs_buf_submit+0x12e/0x250 [ 1626.609220] xfs_bwrite+0x25/0x60 [ 1626.609222] xfs_reclaim_inode+0x2e8/0x310 [ 1626.609224] xfs_reclaim_inodes_ag+0x1b6/0x300 [ 1626.609227] xfs_reclaim_inodes_nr+0x31/0x40 [ 1626.609228] super_cache_scan+0x152/0x1a0 [ 1626.609231] do_shrink_slab+0x12c/0x2d0 [ 1626.609233] shrink_slab+0x9c/0x2a0 [ 1626.609235] shrink_node+0xd7/0x470 [ 1626.609237] do_try_to_free_pages+0xbf/0x380 [ 1626.609240] try_to_free_pages+0xd9/0x1f0 [ 1626.609245] __alloc_pages_slowpath+0x3a4/0xd30 [ 1626.609251] ? ___slab_alloc+0x238/0x560 [ 1626.609254] __alloc_pages_nodemask+0x30c/0x350 [ 1626.609259] skb_page_frag_refill+0x97/0xd0 [ 1626.609274] sk_page_frag_refill+0x1d/0x80 [ 1626.609279] tcp_sendmsg_locked+0x2bb/0xdd0 [ 1626.609304] tcp_sendmsg+0x27/0x40 [ 1626.609307] sock_sendmsg+0x54/0x60 [ 1626.609308] ___sys_sendmsg+0x29f/0x320 [ 1626.609313] ? sock_poll+0x66/0xb0 [ 1626.609318] ? ep_item_poll.isra.15+0x40/0xc0 [ 1626.609320] ? ep_send_events_proc+0xe6/0x230 [ 1626.609322] ? hrtimer_try_to_cancel+0x54/0xf0 [ 1626.609324] ? ep_read_events_proc+0xc0/0xc0 [ 1626.609326] ? _raw_write_unlock_irq+0xa/0x20 [ 1626.609327] ? ep_scan_ready_list.constprop.19+0x218/0x230 [ 1626.609329] ? __hrtimer_init+0xb0/0xb0 [ 1626.609331] ? _raw_spin_unlock_irq+0xa/0x20 [ 1626.609334] ? ep_poll+0x26c/0x4a0 [ 1626.609337] ? tcp_tsq_write.part.54+0xa0/0xa0 [ 1626.609339] ? release_sock+0x43/0x90 [ 1626.609341] ? _raw_spin_unlock_bh+0xa/0x20 [ 1626.609342] __sys_sendmsg+0x47/0x80 [ 1626.609347] do_syscall_64+0x5f/0x1c0 [ 1626.609349] ? prepare_exit_to_usermode+0x75/0xa0 [ 1626.609351] entry_SYSCALL_64_after_hwframe+0x44/0xa9 This patch adds a new prctl command that daemons can use after they have done their initial setup, and before they start to do allocations that are in the IO path. It sets the PF_MEMALLOC_NOIO and PF_LESS_THROTTLE flags so both userspace block and FS threads can use it to avoid the allocation recursion and try to prevent from being throttled while writing out data to free up memory. Signed-off-by: Mike Christie <[email protected]> Acked-by: Michal Hocko <[email protected]> Tested-by: Masato Suzuki <[email protected]> Reviewed-by: Damien Le Moal <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2020-01-28Merge branch 'core/kprobes' into perf/core, to pick up fixesIngo Molnar2-25/+45
Signed-off-by: Ingo Molnar <[email protected]>
2020-01-27Merge tag 'irq-core-2020-01-28' of ↵Linus Torvalds6-5/+87
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq updates from Thomas Gleixner: "The interrupt departement provides: - A mechanism to shield isolated tasks from managed interrupts: The affinity of managed interrupts is completely controlled by the kernel and user space has no influence on them. The reason is that the automatically assigned affinity correlates to the multi-queue CPU handling of block devices. If the generated affinity mask spaws both housekeeping and isolated CPUs the interrupt could be routed to an isolated CPU which would then be disturbed by I/O submitted by a housekeeping CPU. The new mechamism ensures that as long as one housekeeping CPU is online in the assigned affinity mask the interrupt is routed to a housekeeping CPU. If there is no online housekeeping CPU in the affinity mask, then the interrupt is routed to an isolated CPU to keep the device queue intact, but unless the isolated CPU submits I/O by itself these interrupts are not raised. - A small addon to the device tree irqdomain core code to avoid duplication in irq chip drivers - Conversion of the SiFive PLIC to hierarchical domains - The usual pile of new irq chip drivers: SiFive GPIO, Aspeed SCI, NXP INTMUX, Meson A1 GPIO - The first cut of support for the new ARM GICv4.1 - The usual pile of fixes and improvements in core and driver code" * tag 'irq-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (33 commits) genirq, sched/isolation: Isolate from handling managed interrupts irqchip/gic-v4.1: Allow direct invalidation of VLPIs irqchip/gic-v4.1: Suppress per-VLPI doorbell irqchip/gic-v4.1: Add VPE INVALL callback irqchip/gic-v4.1: Add VPE eviction callback irqchip/gic-v4.1: Add VPE residency callback irqchip/gic-v4.1: Add mask/unmask doorbell callbacks irqchip/gic-v4.1: Plumb skeletal VPE irqchip irqchip/gic-v4.1: Implement the v4.1 flavour of VMOVP irqchip/gic-v4.1: Don't use the VPE proxy if RVPEID is set irqchip/gic-v4.1: Implement the v4.1 flavour of VMAPP irqchip/gic-v4.1: VPE table (aka GICR_VPROPBASER) allocation irqchip/gic-v3: Add GICv4.1 VPEID size discovery irqchip/gic-v3: Detect GICv4.1 supporting RVPEID irqchip/gic-v3-its: Fix get_vlpi_map() breakage with doorbells irqdomain: Fix a memory leak in irq_domain_push_irq() irqchip: Add NXP INTMUX interrupt multiplexer support dt-bindings: interrupt-controller: Add binding for NXP INTMUX interrupt multiplexer irqchip: Define EXYNOS_IRQ_COMBINER irqchip/meson-gpio: Add support for meson a1 SoCs ...
2020-01-27Merge tag 'smp-core-2020-01-28' of ↵Linus Torvalds2-63/+48
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core SMP updates from Thomas Gleixner: "A small set of SMP core code changes: - Rework the smp function call core code to avoid the allocation of an additional cpumask - Remove the not longer required GFP argument from on_each_cpu_cond() and on_each_cpu_cond_mask() and fixup the callers" * tag 'smp-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: smp: Remove allocation mask from on_each_cpu_cond.*() smp: Add a smp_cond_func_t argument to smp_call_function_many() smp: Use smp_cond_func_t as type for the conditional function
2020-01-27Merge tag 'timers-core-2020-01-27' of ↵Linus Torvalds13-117/+703
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer updates from Thomas Gleixner: "The timekeeping and timers departement provides: - Time namespace support: If a container migrates from one host to another then it expects that clocks based on MONOTONIC and BOOTTIME are not subject to disruption. Due to different boot time and non-suspended runtime these clocks can differ significantly on two hosts, in the worst case time goes backwards which is a violation of the POSIX requirements. The time namespace addresses this problem. It allows to set offsets for clock MONOTONIC and BOOTTIME once after creation and before tasks are associated with the namespace. These offsets are taken into account by timers and timekeeping including the VDSO. Offsets for wall clock based clocks (REALTIME/TAI) are not provided by this mechanism. While in theory possible, the overhead and code complexity would be immense and not justified by the esoteric potential use cases which were discussed at Plumbers '18. The overhead for tasks in the root namespace (ie where host time offsets = 0) is in the noise and great effort was made to ensure that especially in the VDSO. If time namespace is disabled in the kernel configuration the code is compiled out. Kudos to Andrei Vagin and Dmitry Sofanov who implemented this feature and kept on for more than a year addressing review comments, finding better solutions. A pleasant experience. - Overhaul of the alarmtimer device dependency handling to ensure that the init/suspend/resume ordering is correct. - A new clocksource/event driver for Microchip PIT64 - Suspend/resume support for the Hyper-V clocksource - The usual pile of fixes, updates and improvements mostly in the driver code" * tag 'timers-core-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits) alarmtimer: Make alarmtimer_get_rtcdev() a stub when CONFIG_RTC_CLASS=n alarmtimer: Use wakeup source from alarmtimer platform device alarmtimer: Make alarmtimer platform device child of RTC device alarmtimer: Update alarmtimer_get_rtcdev() docs to reflect reality hrtimer: Add missing sparse annotation for __run_timer() lib/vdso: Only read hrtimer_res when needed in __cvdso_clock_getres() MIPS: vdso: Define BUILD_VDSO32 when building a 32bit kernel clocksource/drivers/hyper-v: Set TSC clocksource as default w/ InvariantTSC clocksource/drivers/hyper-v: Untangle stimers and timesync from clocksources clocksource/drivers/timer-microchip-pit64b: Fix sparse warning clocksource/drivers/exynos_mct: Rename Exynos to lowercase clocksource/drivers/timer-ti-dm: Fix uninitialized pointer access clocksource/drivers/timer-ti-dm: Switch to platform_get_irq clocksource/drivers/timer-ti-dm: Convert to devm_platform_ioremap_resource clocksource/drivers/em_sti: Fix variable declaration in em_sti_probe clocksource/drivers/em_sti: Convert to devm_platform_ioremap_resource clocksource/drivers/bcm2835_timer: Fix memory leak of timer clocksource/drivers/cadence-ttc: Use ttc driver as platform driver clocksource/drivers/timer-microchip-pit64b: Add Microchip PIT64B support clocksource/drivers/hyper-v: Reserve PAGE_SIZE space for tsc page ...
2020-01-27Merge tag 'core-core-2020-01-28' of ↵Linus Torvalds1-24/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull watchdog updates from Thomas Gleixner: "A set of watchdog/softlockup related improvements: - Enforce that the watchdog timestamp is always valid on boot. The original implementation caused a watchdog disabled gap of one second in the boot process due to truncation of the underlying sched clock. The sched clock is divided by 1e9 to convert nanoseconds to seconds. So for the first second of the boot process the result is 0 which is at the same time the indicator to disable the watchdog. The trivial fix is to change the disabled indicator to ULONG_MAX. - Two cleanup patches removing unused and redundant code which got forgotten to be cleaned up in previous changes" * tag 'core-core-2020-01-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: watchdog/softlockup: Enforce that timestamp is valid on boot watchdog/softlockup: Remove obsolete check of last reported task watchdog: Remove soft_lockup_hrtimer_cnt and related code
2020-01-27Merge tag 'timers-urgent-2020-01-27' of ↵Linus Torvalds1-20/+17
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Thomas Gleixner: "Two fixes for the generic VDSO code which missed 5.5: - Make the update to the coarse timekeeper unconditional. This is required because the coarse timekeeper interfaces in the VDSO do not depend on a VDSO capable clocksource. If the system does not have a VDSO capable clocksource and the update is depending on the VDSO capable clocksource, the coarse VDSO interfaces would operate on stale data forever. - Invert the logic of __arch_update_vdso_data() to avoid further head scratching. Tripped over this several times while analyzing the update problem above" * tag 'timers-urgent-2020-01-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: lib/vdso: Update coarse timekeeper unconditionally lib/vdso: Make __arch_update_vdso_data() logic understandable
2020-01-27Merge tag 'audit-pr-20200127' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit update from Paul Moore: "One small audit patch for the Linux v5.6 merge window, and unsurprisingly it passes our test suite with flying colors" * tag 'audit-pr-20200127' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: Add __rcu annotation to RCU pointer
2020-01-27Merge branch 'for-5.6' of ↵Linus Torvalds2-6/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup updates from Tejun Heo: - cgroup2 interface for hugetlb controller. I think this was the last remaining bit which was missing from cgroup2 - fixes for race and a spurious warning in threaded cgroup handling - other minor changes * 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: iocost: Fix iocost_monitor.py due to helper type mismatch cgroup: Prevent double killing of css when enabling threaded cgroup cgroup: fix function name in comment mm: hugetlb controller for cgroups v2
2020-01-27Merge branch 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wqLinus Torvalds1-1/+1
Pull workqueue updates from Tejun Heo: "Just a couple tracepoint patches" * 'for-5.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: remove workqueue_work event class workqueue: add worker function to workqueue_execute_end tracepoint
2020-01-27Merge tag 'pm-5.6-rc1' of ↵Linus Torvalds6-28/+69
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management updates from Rafael Wysocki: "These add ACPI support to the intel_idle driver along with an admin guide document for it, add support for CPR (Core Power Reduction) to the AVS (Adaptive Voltage Scaling) subsystem, add new hardware support in a few places, add some new sysfs attributes, debugfs files and tracepoints, fix bugs and clean up a bunch of things all over. Specifics: - Update the ACPI processor driver in order to export acpi_processor_evaluate_cst() to the code outside of it, add ACPI support to the intel_idle driver based on that and clean up that driver somewhat (Rafael Wysocki). - Add an admin guide document for the intel_idle driver (Rafael Wysocki). - Clean up cpuidle core and drivers, enable compilation testing for some of them (Benjamin Gaignard, Krzysztof Kozlowski, Rafael Wysocki, Yangtao Li). - Fix reference counting of OPP (operating performance points) table structures (Viresh Kumar). - Add support for CPR (Core Power Reduction) to the AVS (Adaptive Voltage Scaling) subsystem (Niklas Cassel, Colin Ian King, YueHaibing). - Add support for TigerLake Mobile and JasperLake to the Intel RAPL power capping driver (Zhang Rui). - Update cpufreq drivers: - Add i.MX8MP support to imx-cpufreq-dt (Anson Huang). - Fix usage of a macro in loongson2_cpufreq (Alexandre Oliva). - Fix cpufreq policy reference counting issues in s3c and brcmstb-avs (chenqiwu). - Fix ACPI table reference counting issue and HiSilicon quirk handling in the CPPC driver (Hanjun Guo). - Clean up spelling mistake in intel_pstate (Harry Pan). - Convert the kirkwood and tegra186 drivers to using devm_platform_ioremap_resource() (Yangtao Li). - Update devfreq core: - Add 'name' sysfs attribute for devfreq devices (Chanwoo Choi). - Clean up the handing of transition statistics and allow them to be reset by writing 0 to the 'trans_stat' devfreq device attribute in sysfs (Kamil Konieczny). - Add 'devfreq_summary' to debugfs (Chanwoo Choi). - Clean up kerneldoc comments and Kconfig indentation (Krzysztof Kozlowski, Randy Dunlap). - Update devfreq drivers: - Add dynamic scaling for the imx8m DDR controller and clean up imx8m-ddrc (Leonard Crestez, YueHaibing). - Fix DT node reference counting and nitialization error code path in rk3399_dmc and add COMPILE_TEST and HAVE_ARM_SMCCC dependency for it (Chanwoo Choi, Yangtao Li). - Fix DT node reference counting in rockchip-dfi and make it use devm_platform_ioremap_resource() (Yangtao Li). - Fix excessive stack usage in exynos-ppmu (Arnd Bergmann). - Fix initialization error code paths in exynos-bus (Yangtao Li). - Clean up exynos-bus and exynos somewhat (Artur Świgoń, Krzysztof Kozlowski). - Add tracepoints for tracking usage_count updates unrelated to status changes in PM-runtime (Michał Mirosław). - Add sysfs attribute to control the "sync on suspend" behavior during system-wide suspend (Jonas Meurer). - Switch system-wide suspend tests over to 64-bit time (Alexandre Belloni). - Make wakeup sources statistics in debugfs cover deleted ones which used to be the case some time ago (zhuguangqing). - Clean up computations carried out during hibernation, update messages related to hibernation and fix a spelling mistake in one of them (Wen Yang, Luigi Semenzato, Colin Ian King). - Add mailmap entry for maintainer e-mail address that has not been functional for several years (Rafael Wysocki)" * tag 'pm-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (83 commits) cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG intel_idle: Clean up irtl_2_usec() intel_idle: Move 3 functions closer to their callers intel_idle: Annotate initialization code and data structures intel_idle: Move and clean up intel_idle_cpuidle_devices_uninit() intel_idle: Rearrange intel_idle_cpuidle_driver_init() intel_idle: Clean up NULL pointer check in intel_idle_init() intel_idle: Fold intel_idle_probe() into intel_idle_init() intel_idle: Eliminate __setup_broadcast_timer() cpuidle: fix cpuidle_find_deepest_state() kerneldoc warnings cpuidle: sysfs: fix warnings when compiling with W=1 cpuidle: coupled: fix warnings when compiling with W=1 cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior PM / devfreq: Add debugfs support with devfreq_summary file Documentation: admin-guide: PM: Add intel_idle document cpuidle: arm: Enable compile testing for some of drivers PM-runtime: add tracepoints for usage_count changes cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether" PM: hibernate: fix spelling mistake "shapshot" -> "snapshot" ...
2020-01-27Merge tag 'arm64-upstream' of ↵Linus Torvalds4-1/+17
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Will Deacon: "The changes are a real mixed bag this time around. The only scary looking one from the diffstat is the uapi change to asm-generic/mman-common.h, but this has been acked by Arnd and is actually just adding a pair of comments in an attempt to prevent allocation of some PROT values which tend to get used for arch-specific purposes. We'll be using them for Branch Target Identification (a CFI-like hardening feature), which is currently under review on the mailing list. New architecture features: - Support for Armv8.5 E0PD, which benefits KASLR in the same way as KPTI but without the overhead. This allows KPTI to be disabled on CPUs that are not affected by Meltdown, even is KASLR is enabled. - Initial support for the Armv8.5 RNG instructions, which claim to provide access to a high bandwidth, cryptographically secure hardware random number generator. As well as exposing these to userspace, we also use them as part of the KASLR seed and to seed the crng once all CPUs have come online. - Advertise a bunch of new instructions to userspace, including support for Data Gathering Hint, Matrix Multiply and 16-bit floating point. Kexec: - Cleanups in preparation for relocating with the MMU enabled - Support for loading crash dump kernels with kexec_file_load() Perf and PMU drivers: - Cleanups and non-critical fixes for a couple of system PMU drivers FPU-less (aka broken) CPU support: - Considerable fixes to support CPUs without the FP/SIMD extensions, including their presence in heterogeneous systems. Good luck finding a 64-bit userspace that handles this. Modern assembly function annotations: - Start migrating our use of ENTRY() and ENDPROC() over to the new-fangled SYM_{CODE,FUNC}_{START,END} macros, which are intended to aid debuggers Kbuild: - Cleanup detection of LSE support in the assembler by introducing 'as-instr' - Remove compressed Image files when building clean targets IP checksumming: - Implement optimised IPv4 checksumming routine when hardware offload is not in use. An IPv6 version is in the works, pending testing. Hardware errata: - Work around Cortex-A55 erratum #1530923 Shadow call stack: - Work around some issues with Clang's integrated assembler not liking our perfectly reasonable assembly code - Avoid allocating the X18 register, so that it can be used to hold the shadow call stack pointer in future ACPI: - Fix ID count checking in IORT code. This may regress broken firmware that happened to work with the old implementation, in which case we'll have to revert it and try something else - Fix DAIF corruption on return from GHES handler with pseudo-NMIs Miscellaneous: - Whitelist some CPUs that are unaffected by Spectre-v2 - Reduce frequency of ASID rollover when KPTI is compiled in but inactive - Reserve a couple of arch-specific PROT flags that are already used by Sparc and PowerPC and are planned for later use with BTI on arm64 - Preparatory cleanup of our entry assembly code in preparation for moving more of it into C later on - Refactoring and cleanup" * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (73 commits) arm64: acpi: fix DAIF manipulation with pNMI arm64: kconfig: Fix alignment of E0PD help text arm64: Use v8.5-RNG entropy for KASLR seed arm64: Implement archrandom.h for ARMv8.5-RNG arm64: kbuild: remove compressed images on 'make ARCH=arm64 (dist)clean' arm64: entry: Avoid empty alternatives entries arm64: Kconfig: select HAVE_FUTEX_CMPXCHG arm64: csum: Fix pathological zero-length calls arm64: entry: cleanup sp_el0 manipulation arm64: entry: cleanup el0 svc handler naming arm64: entry: mark all entry code as notrace arm64: assembler: remove smp_dmb macro arm64: assembler: remove inherit_daif macro ACPI/IORT: Fix 'Number of IDs' handling in iort_id_map() mm: Reserve asm-generic prot flags 0x10 and 0x20 for arch use arm64: Use macros instead of hard-coded constants for MAIR_EL1 arm64: Add KRYO{3,4}XX CPU cores to spectre-v2 safe list arm64: kernel: avoid x18 in __cpu_soft_restart arm64: kvm: stop treating register x18 as caller save arm64/lib: copy_page: avoid x18 register in assembler code ...
2020-01-27tracing/kprobes: Have uname use __get_str() in print_fmtSteven Rostedt (VMware)1-2/+4
Thomas Richter reported: > Test case 66 'Use vfs_getname probe to get syscall args filenames' > is broken on s390, but works on x86. The test case fails with: > > [root@m35lp76 perf]# perf test -F 66 > 66: Use vfs_getname probe to get syscall args filenames > :Recording open file: > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.004 MB /tmp/__perf_test.perf.data.TCdYj\ > (20 samples) ] > Looking at perf.data file for vfs_getname records for the file we touched: > FAILED! > [root@m35lp76 perf]# The root cause was the print_fmt of the kprobe event that referenced the "ustring" > Setting up the kprobe event using perf command: > > # ./perf probe "vfs_getname=getname_flags:72 pathname=filename:ustring" > > generates this format file: > [root@m35lp76 perf]# cat /sys/kernel/debug/tracing/events/probe/\ > vfs_getname/format > name: vfs_getname > ID: 1172 > format: > field:unsigned short common_type; offset:0; size:2; signed:0; > field:unsigned char common_flags; offset:2; size:1; signed:0; > field:unsigned char common_preempt_count; offset:3; size:1; signed:0; > field:int common_pid; offset:4; size:4; signed:1; > > field:unsigned long __probe_ip; offset:8; size:8; signed:0; > field:__data_loc char[] pathname; offset:16; size:4; signed:1; > > print fmt: "(%lx) pathname=\"%s\"", REC->__probe_ip, REC->pathname Instead of using "__get_str(pathname)" it referenced it directly. Link: http://lkml.kernel.org/r/[email protected] Cc: [email protected] Fixes: 88903c464321 ("tracing/probe: Add ustring type for user-space string") Acked-by: Masami Hiramatsu <[email protected]> Reported-by: Thomas Richter <[email protected]> Tested-by: Thomas Richter <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-01-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-nextDavid S. Miller6-27/+112
Daniel Borkmann says: ==================== pull-request: bpf-next 2020-01-27 The following pull-request contains BPF updates for your *net-next* tree. We've added 20 non-merge commits during the last 5 day(s) which contain a total of 24 files changed, 433 insertions(+), 104 deletions(-). The main changes are: 1) Make BPF trampolines and dispatcher aware for the stack unwinder, from Jiri Olsa. 2) Improve handling of failed CO-RE relocations in libbpf, from Andrii Nakryiko. 3) Several fixes to BPF sockmap and reuseport selftests, from Lorenz Bauer. 4) Various cleanups in BPF devmap's XDP flush code, from John Fastabend. 5) Fix BPF flow dissector when used with port ranges, from Yoshiki Komachi. 6) Fix bpffs' map_seq_next callback to always inc position index, from Vasily Averin. 7) Allow overriding LLVM tooling for runqslower utility, from Andrey Ignatov. 8) Silence false-positive lockdep splats in devmap hash lookup, from Amol Grover. 9) Fix fentry/fexit selftests to initialize a variable before use, from John Sperbeck. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-01-27Merge branches 'pm-cpufreq' and 'pm-sleep'Rafael J. Wysocki6-28/+69
* pm-cpufreq: cpufreq: loongson2_cpufreq: adjust cpufreq uses of LOONGSON_CHIPCFG cpufreq: brcmstb-avs: fix imbalance of cpufreq policy refcount cpufreq: intel_pstate: fix spelling mistake: "Whethet" -> "Whether" cpufreq: s3c: fix unbalances of cpufreq policy refcount cpufreq: imx-cpufreq-dt: Add i.MX8MP support cpufreq: Use imx-cpufreq-dt for i.MX8MP's speed grading cpufreq: tegra186: convert to devm_platform_ioremap_resource cpufreq: kirkwood: convert to devm_platform_ioremap_resource cpufreq: CPPC: put ACPI table after using it cpufreq : CPPC: Break out if HiSilicon CPPC workaround is matched * pm-sleep: PM: suspend: Add sysfs attribute to control the "sync on suspend" behavior PM: hibernate: fix spelling mistake "shapshot" -> "snapshot" PM: hibernate: Add more logging on hibernation failure PM: hibernate: improve arithmetic division in preallocate_highmem_fraction() PM: wakeup: Show statistics for deleted wakeup sources again PM: sleep: Switch to rtc_time64_to_tm()/rtc_tm_to_time64()
2020-01-27bpf, xdp: Remove no longer required rcu_read_{un}lock()John Fastabend1-2/+3
Now that we depend on rcu_call() and synchronize_rcu() to also wait for preempt_disabled region to complete the rcu read critical section in __dev_map_flush() is no longer required. Except in a few special cases in drivers that need it for other reasons. These originally ensured the map reference was safe while a map was also being free'd. And additionally that bpf program updates via ndo_bpf did not happen while flush updates were in flight. But flush by new rules can only be called from preempt-disabled NAPI context. The synchronize_rcu from the map free path and the rcu_call from the delete path will ensure the reference there is safe. So lets remove the rcu_read_lock and rcu_read_unlock pair to avoid any confusion around how this is being protected. If the rcu_read_lock was required it would mean errors in the above logic and the original patch would also be wrong. Now that we have done above we put the rcu_read_lock in the driver code where it is needed in a driver dependent way. I think this helps readability of the code so we know where and why we are taking read locks. Most drivers will not need rcu_read_locks here and further XDP drivers already have rcu_read_locks in their code paths for reading xdp programs on RX side so this makes it symmetric where we don't have half of rcu critical sections define in driver and the other half in devmap. Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-01-27bpf, xdp: Update devmap comments to reflect napi/rcu usageJohn Fastabend1-10/+11
Now that we rely on synchronize_rcu and call_rcu waiting to exit perempt-disable regions (NAPI) lets update the comments to reflect this. Fixes: 0536b85239b84 ("xdp: Simplify devmap cleanup") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Björn Töpel <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-01-27bpf: map_seq_next should always increase position indexVasily Averin1-2/+1
If seq_file .next fuction does not change position index, read after some lseek can generate an unexpected output. See also: https://bugzilla.kernel.org/show_bug.cgi?id=206283 v1 -> v2: removed missed increment in end of function Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-01-26sched.h: Annotate sighand_struct with __rcuMadhuparna Bhowmik1-1/+1
This patch fixes the following sparse errors by annotating the sighand_struct with __rcu kernel/fork.c:1511:9: error: incompatible types in comparison expression kernel/exit.c:100:19: error: incompatible types in comparison expression kernel/signal.c:1370:27: error: incompatible types in comparison expression This fix introduces the following sparse error in signal.c due to checking the sighand pointer without rcu primitives: kernel/signal.c:1386:21: error: incompatible types in comparison expression This new sparse error is also fixed in this patch. Signed-off-by: Madhuparna Bhowmik <[email protected]> Acked-by: Paul E. McKenney <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
2020-01-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netDavid S. Miller8-80/+168
Minor conflict in mlx5 because changes happened to code that has moved meanwhile. Signed-off-by: David S. Miller <[email protected]>
2020-01-25rcu: Forgive slow expedited grace periods at boot timePaul E. McKenney1-1/+0
Boot-time processing often loops in the kernel longer than one might prefer, which can prevent expedited grace periods from completing in a timely manner. This in turn triggers a splat In nohz_full CPUs One could argue that long-looping code should be fixed, but on the other hand, boot time is a bit special. This commit therefore removes the splat. Later commits will add the splat back in, but in a way that removes false positives. Reported-by: Borislav Petkov <[email protected]> Signed-off-by: Paul E. McKenney <[email protected]>
2020-01-25tracing: Use pr_err() instead of WARN() for memory failuresSteven Rostedt (VMware)3-11/+23
As warnings can trigger panics, especially when "panic_on_warn" is set, memory failure warnings can cause panics and fail fuzz testers that are stressing memory. Create a MEM_FAIL() macro to use instead of WARN() in the tracing code (perhaps this should be a kernel wide macro?), and use that for memory failure issues. This should stop failing fuzz tests due to warnings. Link: https://lore.kernel.org/r/CACT4Y+ZP-7np20GVRu3p+eZys9GPtbu+JpfV+HtsufAzvTgJrg@mail.gmail.com Suggested-by: Dmitry Vyukov <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-01-25bpf: Allow to resolve bpf trampoline and dispatcher in unwindJiri Olsa3-12/+79
When unwinding the stack we need to identify each address to successfully continue. Adding latch tree to keep trampolines for quick lookup during the unwind. The patch uses first 48 bytes for latch tree node, leaving 4048 bytes from the rest of the page for trampoline or dispatcher generated code. It's still enough not to affect trampoline and dispatcher progs maximum counts. Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-01-25bpf: Allow BTF ctx access for string pointersJiri Olsa1-0/+16
When accessing the context we allow access to arguments with scalar type and pointer to struct. But we deny access for pointer to scalar type, which is the case for many functions. Alexei suggested to take conservative approach and allow currently only string pointer access, which is the case for most functions now: Adding check if the pointer is to string type and allow access to it. Suggested-by: Alexei Starovoitov <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-01-25Merge branch 'for-mingo' of ↵Ingo Molnar17-385/+778
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu Pull RCU updates from Paul E. McKenney: - Expedited grace-period updates - kfree_rcu() updates - RCU list updates - Preemptible RCU updates - Torture-test updates - Miscellaneous fixes - Documentation updates Signed-off-by: Ingo Molnar <[email protected]>
2020-01-24tracing: Decrement trace_array when bootconfig creates an instanceSteven Rostedt (VMware)2-0/+5
The trace_array_get_by_name() creates a ftrace instance and trace_array_put() is used to remove the reference. Even though the trace_array_get_by_name() creates the instance, it also adds a reference count to it, that prevents user space from removing it. As the bootconfig just creates the instance on boot up, it should still be used where it can be deleted by user space after boot. A trace_array_put() is required to let that happen. Also, change the documentation on trace_array_get_by_name() to make this not be so confusing. Link: https://lore.kernel.org/r/[email protected] Fixes: 4f712a4d04a4e ("tracing/boot: Add instance node support") Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-01-24tracing: Remove unneeded NULL checkDan Carpenter1-1/+1
We checked "iter->trace" earlier so there is no need to check here. Link: http://lkml.kernel.org/r/20141122183012.GB6994@mwanda Signed-off-by: Dan Carpenter <[email protected]> [ Pulled from the archeological digging of my INBOX ] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-01-24tracing: Set kernel_stack's caller size properlyJosef Bacik1-1/+1
I noticed when trying to use the trace-cmd python interface that reading the raw buffer wasn't working for kernel_stack events. This is because it uses a stubbed version of __dynamic_array that doesn't do the __data_loc trick and encode the length of the array into the field. Instead it just shows up as a size of 0. So change this to __array and set the len to FTRACE_STACK_ENTRIES since this is what we actually do in practice and matches how user_stack_trace works. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Josef Bacik <[email protected]> [ Pulled from the archeological digging of my INBOX ] Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-01-24tracing: Fix tracing_stat return values in error handling pathsLuis Henriques1-4/+8
tracing_stat_init() was always returning '0', even on the error paths. It now returns -ENODEV if tracing_init_dentry() fails or -ENOMEM if it fails to created the 'trace_stat' debugfs directory. Link: http://lkml.kernel.org/r/[email protected] Fixes: ed6f1c996bfe4 ("tracing: Check return value of tracing_init_dentry()") Signed-off-by: Luis Henriques <[email protected]> [ Pulled from the archeological digging of my INBOX ] Signed-off-by: Steven Rostedt (VMware) <[email protected]>