Age | Commit message | Author | Files | Lines
2018-01-10perf report: Add --tasks option to display monitored tasksJiri Olsa2-2/+138
Add --tasks option to display monitored tasks stored in perf.data. Displaying pid/tid/ppid plus the command string aligned to distinguish parent and child tasks. $ perf record -a ... $ perf report --tasks # pid tid ppid comm 0 0 -1 |swapper 2 2 0 | kthreadd 14080 14080 2 | kworker/u17:1 4 4 2 | kworker/0:0H 6 6 2 | mm_percpu_wq ... 1 1 0 | systemd 23242 23242 1 | firefox 23242 23298 23242 | Cache2 I/O 23242 23304 23242 | GMPThread ... 1195 1195 1 | login 1611 1611 1195 | bash 1639 1639 1611 | startx 1663 1663 1639 | xinit 1673 1673 1663 | xmonad-x86_64-l 23939 23939 1673 | xterm 23941 23941 23939 | bash 23963 23963 23941 | mutt 24954 24954 23963 | offlineimap Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Make it --tasks, plural, --task works as well, as its unambiguous ] [ Use machine__find_thread(), not findnew(), as pointed out by Namhyung ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
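The parent/child alignment described above can be sketched as follows. This is only an illustration in Python, not the perf implementation (which is C and uses machine__find_thread()); the function name `format_tasks`, the tuple layout, and the one-space-per-level indent are assumptions made for the example.

```python
from collections import defaultdict


def format_tasks(tasks):
    """Render (pid, tid, ppid, comm) tuples as an indented task listing.

    Hypothetical helper: children are attached to the task whose tid
    equals their ppid, and each nesting level adds one space of indent.
    """
    tids = {tid for _, tid, _, _ in tasks}
    children = defaultdict(list)
    roots = []
    for task in tasks:
        _, tid, ppid, _ = task
        # A task whose parent is unknown (or is itself) starts a new tree.
        if ppid in tids and ppid != tid:
            children[ppid].append(task)
        else:
            roots.append(task)

    lines = ["# %8s %8s %8s  %s" % ("pid", "tid", "ppid", "comm")]

    def walk(task, depth):
        pid, tid, ppid, comm = task
        lines.append("  %8d %8d %8d |%s%s" % (pid, tid, ppid, " " * depth, comm))
        for child in children[tid]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


if __name__ == "__main__":
    sample = [(0, 0, -1, "swapper"), (2, 2, 0, "kthreadd"),
              (1, 1, 0, "systemd"), (1195, 1195, 1, "login")]
    print("\n".join(format_tasks(sample)))
```

The depth-first walk keeps every child directly below its parent, which is what makes the command column readable as a tree.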
2018-01-10perf trace: Beautify 'gettid' syscall resultArnaldo Carvalho de Melo1-0/+1
Before: # trace -a -e gettid sleep 0.01 <SNIP> 4.863 ( 0.005 ms): Chrome_ChildIO/26241 gettid() = 26241 4.931 ( 0.004 ms): Chrome_IOThrea/26154 gettid() = 26154 4.942 ( 0.001 ms): Chrome_IOThrea/26154 gettid() = 26154 4.946 ( 0.001 ms): Chrome_IOThrea/26154 gettid() = 26154 4.970 ( 0.002 ms): Chrome_IOThrea/26154 gettid() = 26154 # After: # trace -a -e gettid sleep 0.01 0.000 ( 0.009 ms): Chrome_IOThrea/26154 gettid() = 26154 (Chrome_IOThread) <SNIP> 3.416 ( 0.002 ms): Chrome_ChildIO/26241 gettid() = 26241 (Chrome_ChildIOT) 3.424 ( 0.001 ms): Chrome_ChildIO/26241 gettid() = 26241 (Chrome_ChildIOT) 3.343 ( 0.002 ms): chrome/26116 gettid() = 26116 (chrome) 3.386 ( 0.002 ms): Chrome_IOThrea/26154 gettid() = 26154 (Chrome_IOThread) 4.003 ( 0.003 ms): Chrome_ChildIO/26241 gettid() = 26241 (Chrome_ChildIOT) 4.031 ( 0.002 ms): Chrome_IOThrea/26154 gettid() = 26154 (Chrome_IOThread) # Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-10perf report: Add --stats option to display quick data statisticsJiri Olsa2-1/+29
Add --stats option to display quick data statistics of event numbers, without any further processing, like the one at the end of the perf report -D command. $ perf report --stat Aggregated stats: TOTAL events: 4566 MMAP events: 113 LOST events: 19 COMM events: 3 FORK events: 400 SAMPLE events: 3315 MMAP2 events: 32 FINISHED_ROUND events: 681 THREAD_MAP events: 1 CPU_MAP events: 1 TIME_CONV events: 1 I found this useful when hunting lost events for another change. Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Rename it to --stats, plural ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-10perf tools: Make the tool's warning messages optionalJiri Olsa2-2/+5
I want to display the pure events status coming in the next patch and the tool's warnings are superfluous in the output. Making it optional, enabled by default. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-10perf script: Add support to display lost eventsJiri Olsa3-0/+39
Adding option to display lost events: $ perf script --show-lost-events ... mplayer 13810 [002] 468011.402396: 100 cycles:ppp: ff.. mplayer 13810 [002] 468011.402396: PERF_RECORD_LOST lost 3880 mplayer 13810 [002] 468011.402397: 100 cycles:ppp: ff.. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Use PRIu64 when printing u64 values, fixing the build in some arches ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf script: Add support to display sample misc fieldJiri Olsa4-12/+84
Adding support to display sample misc field in form of letter for each bit: # perf script -F +misc ... sched-messaging 1414 K 28690.636582: 4590 cycles ... sched-messaging 1407 U 28690.636600: 325620 cycles ... sched-messaging 1414 K 28690.636608: 19473 cycles ... misc field __________/ The misc bits are assigned to following letters: PERF_RECORD_MISC_KERNEL K PERF_RECORD_MISC_USER U PERF_RECORD_MISC_HYPERVISOR H PERF_RECORD_MISC_GUEST_KERNEL G PERF_RECORD_MISC_GUEST_USER g PERF_RECORD_MISC_MMAP_DATA* M PERF_RECORD_MISC_COMM_EXEC E PERF_RECORD_MISC_SWITCH_OUT S Signed-off-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
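A minimal sketch of the bit-to-letter mapping listed above, written in Python for illustration only. The numeric constants are recalled from the uapi perf_event.h header and should be checked against your kernel tree; the `record_type` strings and the function name `misc_letters` are hypothetical, and the real perf script output may format the field differently.

```python
# Values as recalled from include/uapi/linux/perf_event.h (verify locally).
PERF_RECORD_MISC_CPUMODE_MASK = 7
CPUMODE_LETTERS = {
    1: "K",  # PERF_RECORD_MISC_KERNEL
    2: "U",  # PERF_RECORD_MISC_USER
    3: "H",  # PERF_RECORD_MISC_HYPERVISOR
    4: "G",  # PERF_RECORD_MISC_GUEST_KERNEL
    5: "g",  # PERF_RECORD_MISC_GUEST_USER
}

# Bit 13 is shared: its meaning depends on the record type.
PERF_RECORD_MISC_BIT13 = 1 << 13
BIT13_LETTERS = {
    "MMAP": "M",    # PERF_RECORD_MISC_MMAP_DATA
    "COMM": "E",    # PERF_RECORD_MISC_COMM_EXEC
    "SWITCH": "S",  # PERF_RECORD_MISC_SWITCH_OUT
}


def misc_letters(misc, record_type):
    """Return the letters for a perf_event_header::misc value.

    record_type is a hypothetical string tag ("MMAP", "COMM", "SWITCH",
    or anything else) used only to disambiguate bit 13.
    """
    letters = CPUMODE_LETTERS.get(misc & PERF_RECORD_MISC_CPUMODE_MASK, "")
    if misc & PERF_RECORD_MISC_BIT13:
        letters += BIT13_LETTERS.get(record_type, "")
    return letters
```

Keeping the CPU mode (a small enumerated value under a mask) separate from the shared bit 13 mirrors the distinction the list above draws with its asterisk.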
2018-01-08perf: Update PERF_RECORD_MISC_* comment for perf_event_header::misc bit 13Jiri Olsa2-6/+12
The perf_event_header::misc bit 13 is shared on different events and next patch is adding yet another bit 13 user. Updating the comment to make it more structured and clear which events use bit 13. Suggested-by: Peter Zijlstra <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Update the tools/include/uapi/linux copy ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf: Return empty callchain instead of NULLJiri Olsa1-18/+12
It simplifies the code a bit, because we dump the callchain even if it's empty. With 'empty' callchain we can remove all the NULL-checking code paths. Original-patch-from: Peter Zijlstra <[email protected]> Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
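The idea of returning an empty callchain rather than NULL can be illustrated with a small Python sketch. Everything here is hypothetical (class and function names, and the size formula of one u64 count plus one u64 per entry); it only shows why the consumer no longer needs a separate "no callchain" branch.

```python
class Callchain:
    """Hypothetical stand-in for a callchain entry: a count plus addresses."""

    def __init__(self, ips=()):
        self.ips = list(ips)

    @property
    def nr(self):
        return len(self.ips)


# One shared empty object replaces the NULL return value.
EMPTY_CALLCHAIN = Callchain()


def get_callchain(unwind_ok, ips):
    """Always return a Callchain; never None."""
    return Callchain(ips) if unwind_ok else EMPTY_CALLCHAIN


def callchain_size(chain):
    """Bytes needed to dump the chain: a u64 count followed by nr u64 entries.

    No None check is required because get_callchain() never returns None.
    """
    return 8 * (1 + chain.nr)
```

With the empty object, the dump path writes a count of zero and nothing else, so a single code path handles both cases.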
2018-01-08perf: Make perf_callchain function staticJiri Olsa3-19/+16
And move it to core.c, because there's no caller of this function other than the one in core.c Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf: Add sample_id to PERF_RECORD_ITRACE_START event commentJiri Olsa2-0/+2
Adding missing sample_id line into PERF_RECORD_ITRACE_START event comment. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Update the tools/include/uapi/linux copy ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf: Allocate context task_ctx_data for child eventJiri Olsa1-0/+14
Currently we use perf_event_context::task_ctx_data to save and restore the LBR status when the task is scheduled out and in. We don't allocate it for child contexts, which results in shorter task's LBR stack, because we don't save the history from previous run and start over every time we schedule the task in. I made a test to generate samples with LBR call stack and got higher numbers on bigger chain depths: before: after: LBR call chain: nr: 1 60561 498127 LBR call chain: nr: 2 0 0 LBR call chain: nr: 3 107030 2172 LBR call chain: nr: 4 466685 62758 LBR call chain: nr: 5 2307319 878046 LBR call chain: nr: 6 48713 495218 LBR call chain: nr: 7 1040 4551 LBR call chain: nr: 8 481 172 LBR call chain: nr: 9 878 120 LBR call chain: nr: 10 2377 6698 LBR call chain: nr: 11 28830 151487 LBR call chain: nr: 12 29347 339867 LBR call chain: nr: 13 4 22 LBR call chain: nr: 14 3 53 Signed-off-by: Jiri Olsa <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Fixes: 4af57ef28c2c ("perf: Add pmu specific data for perf task context") Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf tools: Display perf_event_attr::namespaces debug infoJiri Olsa1-0/+1
Display namespaces bit in -vv debug display: $ perf record -vv --namespaces ... ... perf_event_attr: size 112 ... namespaces 1 Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf tools: Enable LIBBABELTRACE by defaultJiri Olsa2-2/+2
There's no reason anymore to treat babel trace in a special way, because a) we no longer display its state b) the needed babeltrace library is now out and well adopted among distros. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: David Ahern <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf script: Support time percent and multiple time rangesJin Yao2-6/+44
perf script has a --time option to limit the time range of output. It only supports absolute time. Now this option is extended to support multiple time ranges and support the percent of time. For example: 1. Select the first and second 10% time slices: perf script --time 10%/1,10%/2 2. Select from 0% to 10% and 30% to 40% slices: perf script --time 0%-10%,30%-40% Changelog: v6: Fix the merge issue with latest perf/core branch. No functional changes. v5: Add checking of first/last sample time to detect if it's recorded in perf.data. If it's not recorded, returns error message to user. v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample v3: Since the definitions of first_sample_time/last_sample_time are moved from perf_session to perf_evlist so change the related code. Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf report: Support time percent and multiple time rangesJin Yao2-5/+46
perf report has a --time option to limit the time range of output. It only supports absolute time. Now this option is extended to support multiple time ranges and support the percent of time. For example: 1. Select the first and second 10% time slices: perf report --time 10%/1,10%/2 2. Select from 0% to 10% and 30% to 40% slices: perf report --time 0%-10%,30%-40% Changelog: v6: Fix the merge issue with latest perf/core branch. No functional changes. v5: Add checking of first/last sample time to detect if it's recorded in perf.data. If it's not recorded, returns error message to user. v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample v3: Since the definitions of first_sample_time/last_sample_time are moved from perf_session to perf_evlist so change the related code. Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] [ Add missing colons at end of examples in the man page ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf tools: Create function to perform multiple time range checkingJin Yao2-0/+31
The previous patch added support for multiple time ranges, for example selecting the first and second 10% time slices: perf report --time 10%/1,10%/2 We need a function to check whether a timestamp is in the ranges [0, 10%) and [10%, 20%]. Note that the last element is included in [10%, 20%] but not in [0, 10%), which avoids an overlap between the ranges. This patch implements a new function, perf_time__ranges_skip_sample, for this check. Change log: v4: Let perf_time__ranges_skip_sample be compatible with perf_time__skip_sample when there is only one time range. Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
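The boundary rule described above (every range half-open except the last, which is closed) can be sketched in Python. This is not the C code of perf_time__ranges_skip_sample; the function name, the list-of-tuples representation, and the "no ranges means no filtering" behaviour are assumptions for illustration.

```python
def ranges_skip_sample(ranges, timestamp):
    """Return True if the sample at `timestamp` falls outside every range.

    ranges is a list of (start, end) pairs. All ranges but the last are
    treated as [start, end); the last one is [start, end], so a timestamp
    on a shared boundary is matched by exactly one range.
    """
    if not ranges:
        return False  # assumption: no ranges configured means keep everything
    last = len(ranges) - 1
    for i, (start, end) in enumerate(ranges):
        if i < last:
            inside = start <= timestamp < end
        else:
            inside = start <= timestamp <= end
        if inside:
            return False
    return True
```

With a single range the check reduces to a plain closed interval, which is the compatibility property the v4 change log mentions.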
2018-01-08perf tools: Create function to parse time percentJin Yao2-12/+196
Current perf report/script/... have a --time option to limit the time range of output, but right now it only supports absolute time. Add support for time percentage. For example: 1. Select the second 10% time slice perf report --time 10%/2 2. Select from 0% to 10% time slice perf report --time 0%-10% It also supports multiple time ranges. 3. Select the first and second 10% time slices perf report --time 10%/1,10%/2 4. Select from 0% to 10% and 30% to 40% slices perf report --time 0%-10%,30%-40% Changelog: v4: An issue was found: the following invalid input was accepted. perf script --time 10%/10x12321xsdfdasfdsafdsafdsa Now strtol is used instead of atoi. Committer notes: This just puts in place the infrastructure, so the examples in this cset comment will only work later, after more patches in this series are applied. Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
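A rough Python sketch of how the two percent syntaxes above might be turned into time ranges. It is not the perf parser (which is C); the function names `parse_percent_spec` and `to_absolute` and the validation rules are assumptions. Strict integer parsing plays the role that strtol plays in the v4 fix: trailing garbage is rejected instead of silently ignored.

```python
def parse_percent_spec(spec):
    """Parse e.g. "10%/1,10%/2" or "0%-10%,30%-40%" into percent ranges.

    Returns a list of (start_percent, end_percent) floats and raises
    ValueError on malformed input.
    """
    ranges = []
    for item in spec.split(","):
        if "/" in item:
            # "<size>%/<n>": the n-th slice of the given size, counted from 1.
            size_str, index_str = item.split("/", 1)
            if not size_str.endswith("%"):
                raise ValueError("missing %% in %r" % item)
            size = float(size_str[:-1])
            index = int(index_str)  # raises ValueError on trailing garbage
            if size <= 0 or index < 1 or size * index > 100:
                raise ValueError("slice out of range in %r" % item)
            ranges.append((size * (index - 1), size * index))
        else:
            # "<a>%-<b>%": an explicit percent interval.
            start_str, end_str = item.split("-", 1)
            if not (start_str.endswith("%") and end_str.endswith("%")):
                raise ValueError("missing %% in %r" % item)
            start, end = float(start_str[:-1]), float(end_str[:-1])
            if not 0 <= start < end <= 100:
                raise ValueError("bad interval in %r" % item)
            ranges.append((start, end))
    return ranges


def to_absolute(ranges, first_ns, last_ns):
    """Scale percent ranges onto the [first_ns, last_ns] sample time span."""
    span = last_ns - first_ns
    return [(int(first_ns + span * a / 100), int(first_ns + span * b / 100))
            for a, b in ranges]
```

Converting to absolute timestamps needs the first and last sample times of the trace, which is why the series also records those in the perf.data header.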
2018-01-08perf record: Record the first and last sample time in the headerJin Yao2-3/+18
In the default 'perf record' configuration, all samples are processed, to create the HEADER_BUILD_ID table. So it's very easy to get the first/last samples and save the time to perf file header via the function write_sample_time(). Later, at post processing time, perf report/script will fetch the time from perf file header. Committer testing: # perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.099 MB perf.data (1101 samples) ] [root@jouet home]# perf report --header | grep "time of " # time of first sample : 22947.909226 # time of last sample : 22948.910704 # # perf report -D | grep PERF_RECORD_SAMPLE\( 0 22947909226101 0x20bb68 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa21b1af3 period: 1 addr: 0 0 22947909229928 0x20bb98 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa200d204 period: 1 addr: 0 <SNIP> 3 22948910397351 0x219360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 28251/28251: 0xffffffffa22071d8 period: 169518 addr: 0 0 22948910652380 0x20f120 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 198807 addr: 0 2 22948910704034 0x2172d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xffffffffa2856816 period: 88111 addr: 0 # Changelog: v7: Just update the patch description according to Arnaldo's suggestion. v6: Currently '--buildid-all' is not enabled at default. So the walking on all samples is the default operation. There is no big overhead to calculate the timestamp boundary in process_sample_event handler once we already go through all samples. So the timestamp boundary calculation is enabled by default when '--buildid-all' is not enabled. While if '--buildid-all' is enabled, we creates a new option "--timestamp-boundary" for user to decide if it enables the timestamp boundary calculation. v5: There is an issue that the sample walking can only work when '--buildid-all' is not enabled. 
So we need to let the walking be able to work even if '--buildid-all' is enabled and let the processing skips the dso hit marking for this case. At first, I want to provide a new option "--record-time-boundaries". While after consideration, I think a new option is not very necessary. v3: Remove the definitions of first_sample_time and last_sample_time from struct record and directly save them in perf_evlist. Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
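The bookkeeping described in this commit can be sketched as below: while every sample is already being visited, remember the first and last timestamps and print them in seconds with microsecond precision. This is a Python illustration only; `sample_time_bounds` and `format_time` are hypothetical names, and taking "first" and "last" in processing order is an assumption drawn from the description.

```python
def sample_time_bounds(timestamps_ns):
    """Return (first, last) sample times in nanoseconds, in processing order.

    Returns (0, 0) when there are no samples, matching a header that was
    written with zeroes.
    """
    first = last = 0
    for t in timestamps_ns:
        if first == 0:
            first = t
        last = t
    return first, last


def format_time(ns):
    """Format nanoseconds as seconds.microseconds, like the header output."""
    return "%d.%06d" % (ns // 10**9, (ns % 10**9) // 1000)
```

Fed the sample timestamps from the committer testing above, the sketch yields the same "time of first sample" and "time of last sample" values shown by perf report --header.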
2018-01-08perf header: Add infrastructure to record first and last sample timeJin Yao4-0/+67
perf report/script/... have a --time option to limit the time range of output. That's very useful to slice large traces, e.g. when processing the output of perf script for some analysis. But right now --time only supports absolute time. Also there is no fast way to get the start/end times of a given trace except for looking at it. This makes it hard to e.g. only decode the first half of the trace, which is useful for parallelization of scripts Another problem is that perf records are variable size and there is no synchronization mechanism. So the only way to find the last sample reliably would be to walk all samples. But we want to avoid that in perf report/... because it is already quite expensive. That is why storing the first sample time and last sample time in perf record is better. This patch creates a new header feature type HEADER_SAMPLE_TIME and related ops. Save the first sample time and the last sample time to the feature section in perf file header. That will be done when, for instance, processing build-ids, where we already have to process all samples to create the build-id table, take advantage of that to further amortize that processing by storing HEADER_SAMPLE_TIME to make 'perf report/script' faster when using --time. Committer testing: After this patch is applied the header is written with zeroes, we need the next patch, for "perf record" to actually write the timestamps: # perf report -D | grep PERF_RECORD_SAMPLE\( 22501155244406 0x44f0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xffffffffa21be8c5 period: 1 addr: 0 <SNIP> 22501155793625 0x4a30 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xffffffffa21ffd50 period: 2828043 addr: 0 # perf report --header | grep "time of " # time of first sample : 0.000000 # time of last sample : 0.000000 # Changelog: v7: 1. Rebase to latest perf/core branch. 2. Add following clarification in patch description according to Arnaldo's suggestion. 
"That will be done when, for instance, processing build-ids, where we already have to process all samples to create the build-id table, take advantage of that to further amortize that processing by storing HEADER_SAMPLE_TIME to make 'perf report/script' faster when using --time." v4: Use perf script time style for timestamp printing. Also add with the printing of sample duration. v3: Remove the definitions of first_sample_time/last_sample_time from perf_session. Just define them in perf_evlist Signed-off-by: Jin Yao <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf report: Fix a no annotate browser displayed issueJin Yao1-1/+17
When the '-b' option is enabled in perf record, for example, perf record -b ... perf report and the annotate browser is then opened from perf report (press 'A'), it fails (the annotate browser can't be displayed). This is because the '.add_entry_cb' op of struct report is overwritten by hist_iter__branch_callback() in builtin-report.c, and that function does not do things such as mapping symbols and sources. So next, do_annotate() will return directly. notes = symbol__annotation(act->ms.sym); if (!notes->src) return 0; This patch adds the missing code to hist_iter__branch_callback (refer to hist_iter__report_callback). v2: Fix a crash bug when performing 'perf report --stdio'. The reason is that we init the symbol annotation only in browser mode; it doesn't allocate/init resources for stdio mode. So now hist_iter__branch_callback() will return directly if it's not in browser mode. Signed-off-by: Jin Yao <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf report: Fix a wrong offset issue when using /proc/kcoreJin Yao6-15/+22
When a valid vmlinux is not found, 'perf report' falls back to look at /proc/kcore. In this case, it will report the impossible large offset. For example: # perf record -b -e cycles:k find /etc/ > /dev/null # perf report --stdio --branch-history 22.77% _vm_normal_page+18446603336221188162 | ---page_remove_rmap +18446603336221188324 page_remove_rmap +18446603336221188487 (cycles:5) unlock_page_memcg +18446603336221188096 page_remove_rmap +18446603336221188327 (cycles:1) The issue is the value which is passed to parameter 'addr' in __get_srcline() is the objdump address. It's not correct if we calculate the offset by using 'addr - sym->start'. This patch creates a new parameter 'ip' in __get_srcline(). It is not converted to objdump address. With this patch, the perf report output is: 22.77% _vm_normal_page+66 | ---page_remove_rmap +228 page_remove_rmap +391 (cycles:5) unlock_page_memcg +0 page_remove_rmap +231 (cycles:1) page_remove_rmap +236 Committer testing: Make sure you get any valid vmlinux out of the way, using '-v' on the 'perf report' case and deleting it from places where perf searches them, like your kernel build dir and the build-id cache, in ~/.debug/. Reported-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Jin Yao <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
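The impossibly large offsets follow from unsigned 64-bit arithmetic: subtracting the symbol start from an address that lies below it wraps around. The Python sketch below reproduces the figures quoted above. The symbol start address and the delta between the sample ip and the objdump address are hypothetical values, chosen only so the arithmetic matches the commit message; they are not taken from perf or from a real kcore.

```python
U64_MASK = (1 << 64) - 1


def symbol_offset(addr, sym_start):
    """Offset of addr within a symbol, computed as the C code would on u64."""
    return (addr - sym_start) & U64_MASK


# Hypothetical addresses for illustration.
sym_start = 0xffffffffa2200000      # assumed start of _vm_normal_page
ip = sym_start + 66                 # sample address, not converted
KCORE_DELTA = 0x800000002000        # assumed ip-to-objdump-address shift
objdump_addr = ip - KCORE_DELTA     # what was wrongly passed as 'addr'

if __name__ == "__main__":
    print(symbol_offset(objdump_addr, sym_start))  # wrapped, huge value
    print(symbol_offset(ip, sym_start))            # the expected small offset
```

Using the unconverted ip for the subtraction, as the new parameter does, keeps both operands in the same address space, so the offset stays small.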
2018-01-08perf tools: Fix compile error with libunwind x86Wang Nan1-1/+1
Fix a compile error: ... CC util/libunwind/x86_32.o In file included from util/libunwind/x86_32.c:33:0: util/libunwind/../../arch/x86/util/unwind-libunwind.c: In function 'libunwind__x86_reg_id': util/libunwind/../../arch/x86/util/unwind-libunwind.c:110:11: error: 'EINVAL' undeclared (first use in this function) return -EINVAL; ^ util/libunwind/../../arch/x86/util/unwind-libunwind.c:110:11: note: each undeclared identifier is reported only once for each function it appears in mv: cannot stat 'util/libunwind/.x86_32.o.tmp': No such file or directory make[4]: *** [util/libunwind/x86_32.o] Error 1 make[3]: *** [util] Error 2 make[2]: *** [libperf-in.o] Error 2 make[1]: *** [sub-make] Error 2 make: *** [all] Error 2 It happens when libunwind-x86 feature is detected. Signed-off-by: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf test bpf: Hook on epoll_pwait()Arnaldo Carvalho de Melo2-6/+6
The 'perf test bpf' was hooking an eBPF program on the SyS_epoll_wait() kernel function, which was what the epoll_wait() glibc function ended up calling. But since at least glibc 2.26, the one that comes with, for instance, Fedora 27, glibc ends up calling SyS_epoll_pwait() when epoll_wait() is used, causing this 'perf test' entry to fail. So switch to using epoll_pwait() and hook the eBPF program to the SyS_epoll_pwait() kernel function to make it work on a wider range of glibc and kernel versions. Tested-by: Wang Nan <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf test bpf: Use designated struct field initializersArnaldo Carvalho de Melo1-32/+27
To follow standard practice in the kernel sources, documenting the initialization better and helping quickly finding the value for some field in a struct with many entries. Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-08perf test bpf: Improve message about expected samplesArnaldo Carvalho de Melo1-1/+1
When failing on one of the BPF tests we were just stating: BPF filter result incorrect Add some more info to help figuring out the problem: BPF filter result incorrect, expected 56, got 0 samples This came out while investigating this failure, first seen after updating the kernel to the 4.15.0-rc6 tag: [root@jouet ~]# perf test bpf 39: BPF filter : 39.1: Basic BPF filtering : FAILED! 39.2: BPF pinning : Skip 39.3: BPF prologue generation: Skip 39.4: BPF relocation checker : Skip [root@jouet ~]# Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: https://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2018-01-06perf/x86/msr: Clean up the codeIngo Molnar1-32/+29
Recent changes made a bit of an inconsistent mess out of arch/x86/events/msr.c, fix it: - re-align the initialization tables to be vertically aligned and readable again - harmonize comment style in terms of punctuation, capitalization and spelling - use curly braces for multi-condition branches - remove extra newlines - simplify the code a bit Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
2018-01-06perf/x86/msr: Add support for MSR_IA32_THERM_STATUSStephane Eranian1-4/+23
This patch adds support for the Digital Readout provided by the IA32_THERM_STATUS MSR (0x19C) on Intel X86 processors. The readout shows the number of degrees Celsius to the TCC (critical temperature) supported by the processor. Thus, the larger, the better. The perf_event support is provided via the msr PMU. The new logical event is called cpu_thermal_margin. It comes with a unit and snapshot files. The event shows the current temperature distance (margin). It is not an accumulating event. The unit is degrees C. The event is provided per logical CPU to make things simpler but it is the same for both hyper-threads sharing a physical core. $ perf stat -I 1000 -a -A -e msr/cpu_thermal_margin/ This will print the temperature for all logical CPUs. time CPU counts unit events 1.000123741 CPU0 38 C msr/cpu_thermal_margin/ 1.000161837 CPU1 37 C msr/cpu_thermal_margin/ 1.000187906 CPU2 36 C msr/cpu_thermal_margin/ 1.000189046 CPU3 39 C msr/cpu_thermal_margin/ 1.000283044 CPU4 40 C msr/cpu_thermal_margin/ 1.000344297 CPU5 40 C msr/cpu_thermal_margin/ 1.000365832 CPU6 39 C msr/cpu_thermal_margin/ ... In case the temperature margin cannot be read, the reported value would be -1. Works on all processors supporting the Digital Readout (dtherm in cpuinfo) Signed-off-by: Stephane Eranian <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Vince Weaver <[email protected]> Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Ingo Molnar <[email protected]>
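Decoding the margin from a raw IA32_THERM_STATUS value might look like the Python sketch below. The bit positions (a "reading valid" flag in bit 31 and the digital readout in bits 22:16) are my reading of the Intel SDM and are assumptions here, as is the function name; the kernel code may use a different field width, so verify before relying on it. Returning -1 for an invalid reading follows the commit message.

```python
THERM_STATUS_READING_VALID = 1 << 31  # assumed position of the valid flag
THERM_STATUS_READOUT_SHIFT = 16       # assumed start of the digital readout
THERM_STATUS_READOUT_MASK = 0x7f      # assumed width: bits 22:16


def thermal_margin(raw_msr):
    """Degrees C below the TCC activation temperature, or -1 if unreadable."""
    if not raw_msr & THERM_STATUS_READING_VALID:
        return -1
    return (raw_msr >> THERM_STATUS_READOUT_SHIFT) & THERM_STATUS_READOUT_MASK
```

Because the value is a snapshot rather than a running count, each read replaces the previous one, which is why the event is described as non-accumulating.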
2018-01-06Merge branch 'linus' into perf/core, to pick up fixesIngo Molnar579-2915/+7820
Signed-off-by: Ingo Molnar <[email protected]>
2018-01-05Merge tag 'for-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linuxLinus Torvalds2-12/+34
Pull btrfs fixes from David Sterba: "We have two more fixes for 4.15, both aimed for stable. The leak fix is obvious, the second patch fixes a bug revealed by the refcount API, when it behaves differently than previous atomic_t and reports refs going from 0 to 1 in one case" * tag 'for-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes btrfs: Fix flush bio leak
2018-01-05Merge tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linuxLinus Torvalds3-19/+33
Pull XFS fixes from Darrick Wong: "I have just a few fixes for bugs and resource cleanup problems this week: - Fix resource cleanup of failed quota initialization - Fix integer overflow problems wrt s_maxbytes" * tag 'xfs-4.15-fixes-10' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: xfs: fix s_maxbytes overflow problems xfs: quota: check result of register_shrinker() xfs: quota: fix missed destroy of qi_tree_lock
2018-01-05Merge tag 'mfd-fixes-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfdLinus Torvalds1-0/+3
Pull MFD fix from Lee Jones: "Late bugfix to plug a leak in rtsx_pcr" * tag 'mfd-fixes-4.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd: mfd: rtsx: Release IRQ during shutdown
2018-01-05Merge branch 'x86-pti-for-linus' of ↵Linus Torvalds10-43/+55
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more x86 pti fixes from Thomas Gleixner: "Another small stash of fixes for fallout from the PTI work: - Fix the modules vs. KASAN breakage which was caused by making MODULES_END depend of the fixmap size. That was done when the cpu entry area moved into the fixmap, but now that we have a separate map space for that this is causing more issues than it solves. - Use the proper cache flush methods for the debugstore buffers as they are mapped/unmapped during runtime and not statically mapped at boot time like the rest of the cpu entry area. - Make the map layout of the cpu_entry_area consistent for 4 and 5 level paging and fix the KASLR vaddr_end wreckage. - Use PER_CPU_EXPORT for per cpu variable and while at it unbreak nvidia gfx drivers by dropping the GPL export. The subject line of the commit tells it the other way around, but I noticed that too late. - Fix the ASM alternative macros so they can be used in the middle of an inline asm block. - Rename the BUG_CPU_INSECURE flag to BUG_CPU_MELTDOWN so the attack vector is properly identified. The Spectre mitigations will come with their own bug bits later" * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWN x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asm x86/tlb: Drop the _GPL from the cpu_tlbstate export x86/events/intel/ds: Use the proper cache flush method for mapping ds buffers x86/kaslr: Fix the vaddr_end mess x86/mm: Map cpu_entry_area at the same place on 4/5 level x86/mm: Set MODULES_END to 0xffffffffff000000
2018-01-05Merge branch 'efi-urgent-for-linus' of ↵Linus Torvalds5-17/+57
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull EFI updates from Thomas Gleixner: - A fix for a add_efi_memmap parameter regression which ensures that the parameter is parsed before it is used. - Reinstate the virtual capsule mapping as the cached copy turned out to break Quark and other things - Remove Matt Fleming as EFI co-maintainer. He stepped back a few days ago. Thanks Matt for all your great work! * 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove Matt Fleming as EFI co-maintainer efi/capsule-loader: Reinstate virtual capsule mapping x86/efi: Fix kernel param add_efi_memmap regression
2018-01-05Merge branch 'for-linus' of ↵Linus Torvalds5-3/+35
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Martin Schwidefsky: "Four bug fixes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/dasd: fix wrongly assigned configuration data s390: fix preemption race in disable_sacf_uaccess s390/sclp: disable FORTIFY_SOURCE for early sclp code s390/pci: handle insufficient resources during dma tlb flush
2018-01-05Merge tag 'for-linus-4.15-rc7-tag' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "One minor fix adjusting the kmalloc flags in the new pvcalls driver added in rc1" * tag 'for-linus-4.15-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/pvcalls: use GFP_ATOMIC under spin lock
2018-01-05Merge branch 'linus' of ↵Linus Torvalds12-62/+173
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fixes from Herbert Xu: "This fixes the following issues: - racy use of ctx->rcvused in af_alg - algif_aead crash in chacha20poly1305 - freeing bogus pointer in pcrypt - build error on MIPS in mpi - memory leak in inside-secure - memory overwrite in inside-secure - NULL pointer dereference in inside-secure - state corruption in inside-secure - build error without CRYPTO_GF128MUL in chelsio - use after free in n2" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: crypto: inside-secure - do not use areq->result for partial results crypto: inside-secure - fix request allocations in invalidation path crypto: inside-secure - free requests even if their handling failed crypto: inside-secure - per request invalidation lib/mpi: Fix umul_ppmm() for MIPS64r6 crypto: pcrypt - fix freeing pcrypt instances crypto: n2 - cure use after free crypto: af_alg - Fix race around ctx->rcvused by making it atomic_t crypto: chacha20poly1305 - validate the digest size crypto: chelsio - select CRYPTO_GF128MUL
2018-01-05Merge branch 'akpm' (patches from Andrew)Linus Torvalds12-23/+43
Merge misc fixes from Andrew Morton: "9 fixes" * emailed patches from Andrew Morton <[email protected]>: mailmap: update Mark Yao's email address userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK fails mm/sparse.c: wrong allocation for mem_section mm/zsmalloc.c: include fs.h mm/debug.c: provide useful debugging information for VM_BUG kernel/exit.c: export abort() to modules mm/mprotect: add a cond_resched() inside change_pmd_range() kernel/acct.c: fix the acct->needcheck check in check_free_space() mm: check pfn_valid first in zero_resv_unavail
2018-01-05x86/pti: Rename BUG_CPU_INSECURE to BUG_CPU_MELTDOWNThomas Gleixner3-5/+5
Use the name associated with the particular attack which needs page table isolation for mitigation. Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: David Woodhouse <[email protected]> Cc: Alan Cox <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Tim Chen <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Turner <[email protected]> Cc: Tom Lendacky <[email protected]> Cc: Greg KH <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/alpine.DEB.2.20.1801051525300.1724@nanos
2018-01-05x86/alternatives: Add missing '\n' at end of ALTERNATIVE inline asmDavid Woodhouse1-2/+2
Where an ALTERNATIVE is used in the middle of an inline asm block, this would otherwise lead to the following instruction being appended directly to the trailing ".popsection", and a failed compile. Fixes: 9cebed423c84 ("x86, alternative: Use .pushsection/.popsection") Signed-off-by: David Woodhouse <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: Rik van Riel <[email protected]> Cc: [email protected] Cc: Tim Chen <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Paul Turner <[email protected]> Cc: Jiri Kosina <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Kees Cook <[email protected]> Cc: Linus Torvalds <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: [email protected] Link: https://lkml.kernel.org/r/[email protected]
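The failure mode can be shown with plain C string-literal concatenation, which is what assembles the text of an inline asm block. The sketch below is not the kernel's ALTERNATIVE macro; the two macro names, the array names and the helper are hypothetical stand-ins for a macro body that ends in ".popsection" with and without a trailing newline.

```c
#include <string.h>

/* Hypothetical stand-ins for the tail of an ALTERNATIVE-style macro. */
#define ALT_TAIL_NO_NEWLINE   ".popsection"
#define ALT_TAIL_WITH_NEWLINE ".popsection\n"

/*
 * Adjacent string literals are concatenated by the compiler before the
 * assembler ever sees the text, so the next instruction is glued on.
 */
static const char asm_text_broken[] = ALT_TAIL_NO_NEWLINE   "nop\n";
static const char asm_text_fixed[]  = ALT_TAIL_WITH_NEWLINE "nop\n";

/* Nonzero when ".popsection" is followed by more text on the same line. */
static int directive_runs_into_next_insn(const char *text)
{
	const char *p = strstr(text, ".popsection");
	size_t len = strlen(".popsection");

	return p != NULL && p[len] != '\n' && p[len] != '\0';
}
```

With the newline missing, the assembler input contains the single token ".popsectionnop", which is the failed compile the message describes.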
2018-01-05mfd: rtsx: Release IRQ during shutdownSinan Kaya1-0/+3
'Commit cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown")' revealed a resource leak in rtsx_pci driver during shutdown. Issue shows up as a warning during shutdown as follows: remove_proc_entry: removing non-empty directory 'irq/17', leaking at least 'rtsx_pci' WARNING: CPU: 0 PID: 1578 at fs/proc/generic.c:572 remove_proc_entry+0x11d/0x130 Modules linked in <long list but none that are out-of-tree> ... Call Trace: unregister_irq_proc free_desc irq_free_descs mp_unmap_irq acpi_unregister_gsi_apic acpi_pci_irq_disable do_pci_disable_device pci_disable_device device_shutdown kernel_restart Sys_reboot Even though rtsx_pci driver implements a shutdown callback, it is not releasing the interrupt that it registered during probe. This is causing the ACPI layer to complain that the shared IRQ is in use while freeing IRQ. This code releases the IRQ to prevent resource leak and eliminate the warning. Fixes: cc27b735ad3a ("PCI/portdrv: Turn off PCIe services during shutdown") Link: https://bugzilla.kernel.org/show_bug.cgi?id=198141 Reported-by: Chris Clayton <[email protected]> Signed-off-by: Sinan Kaya <[email protected]> Reviewed-by: Rafael J. Wysocki <[email protected]> Signed-off-by: Lee Jones <[email protected]>
2018-01-04Merge tag 'drm-fixes-for-v4.15-rc7' of ↵Linus Torvalds16-106/+127
git://people.freedesktop.org/~airlied/linux Pull drm fixes from Dave Airlie: "Just collecting some fixes to finish my holidays :-). A few fixes for i915 (one documentation build fix), one ttm fix, one AMD display fix, one omapdrm fix, and a set of armada fixes from Russell. All seem pretty small, you can now return to your latest security news site" * tag 'drm-fixes-for-v4.15-rc7' of git://people.freedesktop.org/~airlied/linux: drm/i915: Apply Display WA #1183 on skl, kbl, and cfl drm/ttm: check the return value of kzalloc drm/amd/display: call set csc_default if enable adjustment is false docs: fix, intel_guc_loader.c has been moved to intel_guc_fw.c omapdrm/dss/hdmi4_cec: fix interrupt handling documentation/gpu/i915: fix docs build error after file rename drm/i915: Put all non-blocking modesets onto an ordered wq drm/i915: Disable DC states around GMBUS on GLK drm/i915/psr: Fix register name mess up. drm/armada: fix YUV planar format framebuffer offsets drm/armada: improve efficiency of armada_drm_plane_calc_addrs() drm/armada: fix UV swap code drm/armada: fix SRAM powerdown drm/armada: fix leak of crtc structure
2018-01-04mailmap: update Mark Yao's email addressJeffy Chen1-0/+1
Change the previous employer's email addresses to the current email address. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Jeffy Chen <[email protected]> Acked-by: Martin Kepplinger <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-04userfaultfd: clear the vma->vm_userfaultfd_ctx if UFFD_EVENT_FORK failsAndrea Arcangeli1-2/+18
The previous fix in commit 384632e67e08 ("userfaultfd: non-cooperative: fix fork use after free") corrected the refcounting in case of UFFD_EVENT_FORK failure for the fork userfault paths. That still didn't clear the vma->vm_userfaultfd_ctx of the vmas that were set to point to the aborted new uffd ctx earlier in dup_userfaultfd. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Andrea Arcangeli <[email protected]> Reported-by: syzbot <[email protected]> Reviewed-by: Mike Rapoport <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Dmitry Vyukov <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-04mm/sparse.c: wrong allocation for mem_sectionBaoquan He1-1/+1
In commit 83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") mem_section is allocated at runtime to save memory. It allocates the first dimension of the array with sizeof(struct mem_section), which costs extra memory; it should be sizeof(struct mem_section *). Fix it. Link: http://lkml.kernel.org/r/[email protected] Fixes: 83e3c48729 ("mm/sparsemem: Allocate mem_section at runtime for CONFIG_SPARSEMEM_EXTREME=y") Signed-off-by: Baoquan He <[email protected]> Tested-by: Dave Young <[email protected]> Acked-by: Kirill A. Shutemov <[email protected]> Cc: Kirill A. Shutemov <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Andy Lutomirski <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Atsushi Kumagai <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
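The size difference can be sketched in user-space C. The struct below is a hypothetical stand-in (the real struct mem_section has different fields), and both function names are made up for this example; the point is only that the first dimension holds pointers, so each slot needs the size of a pointer rather than the size of the structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in; the real struct mem_section has different fields. */
struct mem_section_sketch {
	uintptr_t section_mem_map;
	uintptr_t pageblock_flags;
};

/* Bytes requested when each root slot is sized as a whole structure. */
static size_t root_bytes_wrong(size_t nr_roots)
{
	return nr_roots * sizeof(struct mem_section_sketch);
}

/* Bytes requested when each root slot is sized as a pointer. */
static size_t root_bytes_right(size_t nr_roots)
{
	return nr_roots * sizeof(struct mem_section_sketch *);
}
```

With this two-word stand-in the wrong variant asks for twice as much memory as needed; the actual ratio in the kernel depends on the real structure size.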
2018-01-04mm/zsmalloc.c: include fs.hSergey Senozhatsky1-0/+1
`struct file_system_type' and alloc_anon_inode() function are defined in fs.h, include it directly. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Sergey Senozhatsky <[email protected]> Cc: Minchan Kim <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-04mm/debug.c: provide useful debugging information for VM_BUGMatthew Wilcox1-14/+14
With the recent addition of hashed kernel pointers, places which need to produce useful debug output have to specify %px, not %p. This patch fixes all the VM debug to use %px. This is appropriate because it's debug output that the user should never be able to trigger, and kernel developers need to see the actual pointers. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Matthew Wilcox <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: "Tobin C. Harding" <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-04kernel/exit.c: export abort() to modulesAndrew Morton4-3/+1
gcc -fisolate-erroneous-paths-dereference can generate calls to abort() from modular code too. [[email protected]: drop duplicate exports of abort()] Link: http://lkml.kernel.org/r/[email protected] Reported-by: Vineet Gupta <[email protected]> Cc: Sudip Mukherjee <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Alexey Brodkin <[email protected]> Cc: Russell King <[email protected]> Cc: Jose Abreu <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-04mm/mprotect: add a cond_resched() inside change_pmd_range()Anshuman Khandual1-2/+4
While testing on a large CPU system, detected the following RCU stall many times over the span of the workload. This problem is solved by adding a cond_resched() in the change_pmd_range() function. INFO: rcu_sched detected stalls on CPUs/tasks: 154-....: (670 ticks this GP) idle=022/140000000000000/0 softirq=2825/2825 fqs=612 (detected by 955, t=6002 jiffies, g=4486, c=4485, q=90864) Sending NMI from CPU 955 to CPUs 154: NMI backtrace for cpu 154 CPU: 154 PID: 147071 Comm: workload Not tainted 4.15.0-rc3+ #3 NIP: c0000000000b3f64 LR: c0000000000b33d4 CTR: 000000000000aa18 REGS: 00000000a4b0fb44 TRAP: 0501 Not tainted (4.15.0-rc3+) MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 22422082 XER: 00000000 CFAR: 00000000006cf8f0 SOFTE: 1 GPR00: 0010000000000000 c00003ef9b1cb8c0 c0000000010cc600 0000000000000000 GPR04: 8e0000018c32b200 40017b3858fd6e00 8e0000018c32b208 40017b3858fd6e00 GPR08: 8e0000018c32b210 40017b3858fd6e00 8e0000018c32b218 40017b3858fd6e00 GPR12: ffffffffffffffff c00000000fb25100 NIP [c0000000000b3f64] plpar_hcall9+0x44/0x7c LR [c0000000000b33d4] pSeries_lpar_flush_hash_range+0x384/0x420 Call Trace: flush_hash_range+0x48/0x100 __flush_tlb_pending+0x44/0xd0 hpte_need_flush+0x408/0x470 change_protection_range+0xaac/0xf10 change_prot_numa+0x30/0xb0 task_numa_work+0x2d0/0x3e0 task_work_run+0x130/0x190 do_notify_resume+0x118/0x120 ret_from_except_lite+0x70/0x74 Instruction dump: 60000000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 e9410060 e9610068 e9810070 44000022 <7d806378> e9810028 f88c0000 f8ac0008 Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Anshuman Khandual <[email protected]> Suggested-by: Nicholas Piggin <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2018-01-04kernel/acct.c: fix the acct->needcheck check in check_free_space()Oleg Nesterov1-1/+1
As Tsukada explains, the time_is_before_jiffies(acct->needcheck) check is very wrong, we need time_is_after_jiffies() to make sys_acct() work. Ignoring the overflows, the code should "goto out" if needcheck > jiffies, while currently it checks "needcheck < jiffies" and thus in the likely case check_free_space() does nothing until jiffies overflow. In particular this means that sys_acct() is simply broken, acct_on() sets acct->needcheck = jiffies and expects that check_free_space() should set acct->active = 1 after the free-space check, but this won't happen if jiffies increments in between. This was broken by commit 32dc73086015 ("get rid of timer in kern/acct.c") in 2011, then another (correct) commit 795a2f22a8ea ("acct() should honour the limits from the very beginning") made the problem more visible. Link: http://lkml.kernel.org/r/[email protected] Fixes: 32dc73086015 ("get rid of timer in kern/acct.c") Reported-by: TSUKADA Koutaro <[email protected]> Suggested-by: TSUKADA Koutaro <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Cc: Al Viro <[email protected]> Cc: <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
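The inverted check can be reproduced in user space. The sketch below re-implements the wraparound-safe comparison in the style of the kernel's time_after() helper, with the current jiffies value passed in as a parameter; all function names are hypothetical, and the signed cast relies on the usual two's-complement conversion.

```c
#include <stdbool.h>

/* True when a is later than b, tolerating counter wraparound. */
static bool time_after_sketch(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Mirrors time_is_before_jiffies(t): t is already in the past. */
static bool is_before_jiffies(unsigned long t, unsigned long jiffies)
{
	return time_after_sketch(jiffies, t);
}

/* Mirrors time_is_after_jiffies(t): t is still in the future. */
static bool is_after_jiffies(unsigned long t, unsigned long jiffies)
{
	return time_after_sketch(t, jiffies);
}

/* Old condition for "goto out": skips once needcheck has passed. */
static bool skips_check_buggy(unsigned long needcheck, unsigned long jiffies)
{
	return is_before_jiffies(needcheck, jiffies);
}

/* Fixed condition for "goto out": skips only while needcheck is pending. */
static bool skips_check_fixed(unsigned long needcheck, unsigned long jiffies)
{
	return is_after_jiffies(needcheck, jiffies);
}
```

If needcheck is set to the current jiffies value and a single tick elapses, the old condition skips the free-space check while the fixed one lets it run, which is the scenario the message describes for acct_on().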
2018-01-04mm: check pfn_valid first in zero_resv_unavailDave Young1-0/+2
With latest kernel I get below bug while testing kdump: BUG: unable to handle kernel paging request at ffffea00034b1040 IP: zero_resv_unavail+0xbd/0x126 PGD 37b98067 P4D 37b98067 PUD 37b97067 PMD 0 Oops: 0002 [#1] SMP Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 4.15.0-rc1+ #316 Hardware name: LENOVO 20ARS1BJ02/20ARS1BJ02, BIOS GJET92WW (2.42 ) 03/03/2017 task: ffffffff81a0e4c0 task.stack: ffffffff81a00000 RIP: 0010:zero_resv_unavail+0xbd/0x126 RSP: 0000:ffffffff81a03d88 EFLAGS: 00010006 RAX: 0000000000000000 RBX: ffffea00034b1040 RCX: 0000000000000010 RDX: 0000000000000000 RSI: 0000000000000092 RDI: ffffea00034b1040 RBP: 00000000000d2c41 R08: 00000000000000c0 R09: 0000000000000a0d R10: 0000000000000002 R11: 0000000000007f01 R12: ffffffff81a03d90 R13: ffffea0000000000 R14: 0000000000000063 R15: 0000000000000062 FS: 0000000000000000(0000) GS:ffffffff81c73000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffea00034b1040 CR3: 0000000037609000 CR4: 00000000000606b0 Call Trace: ? free_area_init_nodes+0x640/0x664 ? zone_sizes_init+0x58/0x72 ? setup_arch+0xb50/0xc6c ? start_kernel+0x64/0x43d ? secondary_startup_64+0xa5/0xb0 Code: c1 e8 0c 48 39 d8 76 27 48 89 de 48 c1 e3 06 48 c7 c7 7a 87 79 81 e8 b0 c0 3e ff 4c 01 eb b9 10 00 00 00 31 c0 48 89 df 49 ff c6 <f3> ab eb bc 6a 00 49 c7 c0 f0 93 d1 81 31 d2 83 ce ff 41 54 49 RIP: zero_resv_unavail+0xbd/0x126 RSP: ffffffff81a03d88 CR2: ffffea00034b1040 ---[ end trace f5ba9e8f73c7ee26 ]--- This is introduced by commit a4a3ede2132a ("mm: zero reserved and unavailable struct pages"). The reason is some efi reserved boot ranges is not reported in E820 ram. 
In my case it is a bgrt buffer: efi: mem00: [Boot Data |RUN| | | | | | | |WB|WT|WC|UC] range=[0x00000000d2c41000-0x00000000d2c85fff] (0MB) Using "add_efi_memmap" can work around the problem together with another fix: http://lkml.kernel.org/r/[email protected] In zero_resv_unavail it would be better to check pfn_valid first before zeroing the page struct. This fixes the problem and potential other similar problems. Also, as Pavel Tatashin suggested, check pfn_valid at the beginning of the section. The range is backed by real memory. The memory range is efi "Boot Service Data", which means that after ExitBootServices() these ranges can be used as system ram. But some of them need to be reserved, for example the bgrt image address in an acpi table: if the image memory is freed then kexec reboot will fail because kexec inherits the same acpi table to initialize the driver. Link: http://lkml.kernel.org/r/[email protected] Fixes: a4a3ede2132a ("mm: zero reserved and unavailable struct pages") Signed-off-by: Dave Young <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Pavel Tatashin <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>