aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2016-08-10powerpc: Print the kernel load address at the end of prom_init()Benjamin Herrenschmidt1-1/+1
This makes it easier to debug crashes that happen very early before the kernel takes over Open Firmware by allowing us to relate the OF reported crashing addresses to offsets within the kernel. Signed-off-by: Benjamin Herrenschmidt <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-08-10powerpc/ptrace: Fix coredump since ptrace TM changesCyril Bur3-28/+19
Commit 8d460f6156cd ("powerpc/process: Add the function flush_tmregs_to_thread") added flush_tmregs_to_thread() and included the assumption that it would only be called for a task which is not current. Although this is correct for ptrace, when generating a core dump, some of the routines which call flush_tmregs_to_thread() are called. This leads to a WARNing such as: Not expecting ptrace on self: TM regs may be incorrect ------------[ cut here ]------------ WARNING: CPU: 123 PID: 7727 at arch/powerpc/kernel/process.c:1088 flush_tmregs_to_thread+0x78/0x80 CPU: 123 PID: 7727 Comm: libvirtd Not tainted 4.8.0-rc1-gcc6x-g61e8a0d #1 task: c000000fe631b600 task.stack: c000000fe63b0000 NIP: c00000000001a1a8 LR: c00000000001a1a4 CTR: c000000000717780 REGS: c000000fe63b3420 TRAP: 0700 Not tainted (4.8.0-rc1-gcc6x-g61e8a0d) MSR: 900000010282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 28004222 XER: 20000000 ... NIP [c00000000001a1a8] flush_tmregs_to_thread+0x78/0x80 LR [c00000000001a1a4] flush_tmregs_to_thread+0x74/0x80 Call Trace: flush_tmregs_to_thread+0x74/0x80 (unreliable) vsr_get+0x64/0x1a0 elf_core_dump+0x604/0x1430 do_coredump+0x5fc/0x1200 get_signal+0x398/0x740 do_signal+0x54/0x2b0 do_notify_resume+0x98/0xb0 ret_from_except_lite+0x70/0x74 So fix flush_tmregs_to_thread() to detect the case where it is called on current, and a transaction is active, and in that case flush the TM regs to the thread_struct. This patch also moves flush_tmregs_to_thread() into ptrace.c as it is only called from that file. Fixes: 8d460f6156cd ("powerpc/process: Add the function flush_tmregs_to_thread") Signed-off-by: Cyril Bur <[email protected]> [mpe: Flesh out change log] Signed-off-by: Michael Ellerman <[email protected]>
2016-08-10powerpc/32: Fix csum_partial_copy_generic()Christophe Leroy1-3/+4
Commit 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()") introduced a bug when destination address is odd and initial csum is not null In that (rare) case the initial csum value has to be rotated one byte as well as the resulting value is This patch also fixes related comments Fixes: 7aef4136566b0 ("powerpc32: rewrite csum_partial_copy_generic() based on copy_tofrom_user()") Signed-off-by: Christophe Leroy <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-08-10cxl: Set psl_fir_cntl to production environment valueFrederic Barrat1-3/+6
Switch the setting of psl_fir_cntl from debug to production environment recommended value. It mostly affects the PSL behavior when an error is raised in psl_fir1/2. Tested with cxlflash. Signed-off-by: Frederic Barrat <[email protected]> Reviewed-by: Uma Krishnan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-08-09mm, writeback: flush plugged IO in wakeup_flusher_threads()Konstantin Khlebnikov1-0/+6
I've found funny live-lock between raid10 barriers during resync and memory controller hard limits. Inside mpage_readpages() task holds on to its plug bio which blocks the barrier in raid10. Its memory cgroup have no free memory thus the task goes into reclaimer but all reclaimable pages are dirty and cannot be written because raid10 is rebuilding and stuck on the barrier. Common flush of such IO in schedule() never happens, because the caller doesn't go to sleep. Lock is 'live' because changing memory limit or killing tasks which holds that stuck bio unblock whole progress. That was what happened in 3.18.x but I see no difference in upstream logic. Theoretically this might happen even without memory cgroup. Signed-off-by: Konstantin Khlebnikov <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2016-08-09Merge tag 'perf-urgent-for-mingo-20160809' of ↵Ingo Molnar14-54/+210
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent Pull perf/urgent fixes from Arnaldo Carvalho de Melo: User visible fixes: - Fix the lookup for a kernel module in 'perf probe', fixing for instance, the erroneous return of "[raid10]" when looking for "[raid1]" (Konstantin Khlebnikov) - Disable counters in a group before reading them in 'perf stat', to avoid skew (Mark Rutland) - Fix adding probes to function aliases in systems using kaslr (Masami Hiramatsu) - Trip libtraceevent trace_seq buffers, removing unnecessary memory usage that could bring a system using tracepoint events with 'perf top' to a crawl, as the trace_seq buffers start at a whooping 4 KB, which is very rarely used in perf's usecases, so realloc it to the really used space as a last measure after using libtraceevent functions to format the fields of tracepoint events (Arnaldo Carvalho de Melo) - Fix 'perf probe' location when using DWARF on ppc64le (Ravi Bangoria) - Allow specifying signedness casts to a 'perf probe' variable, to shorten the number of steps to see signed values that otherwise would always appear as hex values (Naohiro Aota) Documentation fixes: - Add 'bpf-output' field to 'perf script' usage message (Brendan Gregg) Infrastructure fixes: - Sync kernel header files: cpufeatures.h, {disabled,required}-features.h, bpf.h and vmx.h, so that we get a clean build, without warnings about files being different from the kernel counterparts. A verification of the need or desirability of changes in tools/ based on what was done in the kernel changesets was made and documented in the respective file sync changesets (Arnaldo Carvalho de Melo) Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2016-08-09Revert "printk: create pr_<level> functions"Linus Torvalds4-76/+26
This reverts commit 874f9c7da9a4acbc1b9e12ca722579fb50e4d142. Geert Uytterhoeven reports: "This change seems to have an (unintendent?) side-effect. Before, pr_*() calls without a trailing newline characters would be printed with a newline character appended, both on the console and in the output of the dmesg command. After this commit, no new line character is appended, and the output of the next pr_*() call of the same type may be appended, like in: - Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000 - Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM) + Truncating RAM at 0x0000000040000000-0x00000000c0000000 to -0x0000000070000000Ignoring RAM at 0x0000000200000000-0x0000000240000000 (!CONFIG_HIGHMEM)" Joe Perches says: "No, that is not intentional. The newline handling code inside vprintk_emit is a bit involved and for now I suggest a revert until this has all the same behavior as earlier" Reported-by: Geert Uytterhoeven <[email protected]> Requested-by: Joe Perches <[email protected]> Cc: Andrew Morton <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-09Merge tag 'trace-v4.8-2' of ↵Linus Torvalds1-3/+11
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "Fix tick_stop tracepoint symbols for user export. Luiz Capitulino noticed that the tick_stop tracepoint wasn't being parsed properly by the tracing user space tools. This was due to the TRACE_DEFINE_ENUM() being set to a define, when it should have been set to the enum itself. The define was of the MASK that used the BIT to shift. The BIT was the enum and by adding that, everything gets converted nicely. The MASK is still kept just in case it gets converted to an enum in the future" * tag 'trace-v4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix tick_stop tracepoint symbols for user export
2016-08-09Merge tag 'gcc-plugin-infrastructure-v4.8-rc2' of ↵Linus Torvalds5-26/+56
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc plugin improvements from Kees Cook: "Several fixes/improvements for the gcc plugin infrastructure: - fix a problem with gcc plugins interfering with cc-option tests. - abort more gracefully when gcc plugin headers or compiler support is missing. - improve the gcc plugin rule generation to be more dynamic, pass arguments, and build from subdirectories" * tag 'gcc-plugin-infrastructure-v4.8-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: Add support for plugin subdirectories gcc-plugins: Automate make rule generation gcc-plugins: Add support for passing plugin arguments gcc-plugins: abort builds cleanly when not supported kbuild: no gcc-plugins during cc-option tests
2016-08-09Merge tag 'platform-drivers-x86-v4.8-3' of ↵Linus Torvalds1-2/+2
git://git.infradead.org/users/dvhart/linux-platform-drivers-x86 Pull x86 platform driver update from Darren Hart: "dell-wmi: ignore battery remove/insert event" * tag 'platform-drivers-x86-v4.8-3' of git://git.infradead.org/users/dvhart/linux-platform-drivers-x86: dell-wmi: Ignore WMI event 0xe00e
2016-08-09Merge tag 'drm-fixes-for-4.8-rc2' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds18-64/+95
Pull drm fixes from Dave Airlie: "This contains a bunch of amdgpu fixes, and some i915 regression fixes. It also contains some fixes for an older regression with some EDID changes and some 6bpc panels. Then there are the lockdep, cirrus and rcar-du regression fixes from this window" * tag 'drm-fixes-for-4.8-rc2' of git://people.freedesktop.org/~airlied/linux: drm/cirrus: Fix NULL pointer dereference when registering the fbdev drm/edid: Set 8 bpc color depth for displays with "DFP 1.x compliant TMDS". drm/i915/dp: Revert "drm/i915/dp: fall back to 18 bpp when sink capability is unknown" drm/edid: Add 6 bpc quirk for display AEO model 0. drm: Paper over locking inversion after registration rework drm: rcar-du: Link HDMI encoder with bridge drm/ttm: Wait for a BO to become idle before unbinding it from GTT drm/i915/fbdev: Check for the framebuffer before use drm/amdgpu: update golden setting of polaris10 drm/amdgpu: update golden setting of stoney drm/amdgpu: update golden setting of polaris11 drm/amdgpu: update golden setting of carrizo drm/amdgpu: update golden setting of iceland drm/amd/amdgpu: change pptable output format from ASCII to binary drm/amdgpu/ci: add mullins to default case for smc ucode drm/amdgpu/gmc7: add missing mullins case drm/i915: Never fully mask the the EI up rps interrupt on SNB/IVB drm/i915: Wait up to 3ms for the pcu to ack the cdclk change request on SKL
2016-08-09ipr: Fix sync scsi scanBrian King1-5/+6
Commit b195d5e2bffd ("ipr: Wait to do async scan until scsi host is initialized") fixed async scan for ipr, but broke sync scan for ipr. This fixes sync scan back up. Signed-off-by: Brian King <[email protected]> Reported-and-tested-by: Michael Ellerman <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-09mm: memcontrol: only mark charged pages with PageKmemcgVladimir Davydov3-14/+18
To distinguish non-slab pages charged to kmemcg we mark them PageKmemcg, which sets page->_mapcount to -512. Currently, we set/clear PageKmemcg in __alloc_pages_nodemask()/free_pages_prepare() for any page allocated with __GFP_ACCOUNT, including those that aren't actually charged to any cgroup, i.e. allocated from the root cgroup context. To avoid overhead in case cgroups are not used, we only do that if memcg_kmem_enabled() is true. The latter is set iff there are kmem-enabled memory cgroups (online or offline). The root cgroup is not considered kmem-enabled. As a result, if a page is allocated with __GFP_ACCOUNT for the root cgroup when there are kmem-enabled memory cgroups and is freed after all kmem-enabled memory cgroups were removed, e.g. # no memory cgroups has been created yet, create one mkdir /sys/fs/cgroup/memory/test # run something allocating pages with __GFP_ACCOUNT, e.g. # a program using pipe dmesg | tail # remove the memory cgroup rmdir /sys/fs/cgroup/memory/test we'll get bad page state bug complaining about page->_mapcount != -1: BUG: Bad page state in process swapper/0 pfn:1fd945c page:ffffea007f651700 count:0 mapcount:-511 mapping: (null) index:0x0 flags: 0x1000000000000000() To avoid that, let's mark with PageKmemcg only those pages that are actually charged to and hence pin a non-root memory cgroup. Fixes: 4949148ad433 ("mm: charge/uncharge kmemcg from generic page allocator paths") Reported-and-tested-by: Eric Dumazet <[email protected]> Signed-off-by: Vladimir Davydov <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2016-08-09drivers/perf: arm-pmu: Fix handling of SPI lacking "interrupt-affinity" propertyMarc Zyngier1-3/+4
Patch 19a469a58720 ("drivers/perf: arm-pmu: Handle per-interrupt affinity mask") added support for partitionned PPI setups, but inadvertently broke setups using SPIs without the "interrupt-affinity" property (which is the case for UP platforms). This patch restore the broken functionnality by testing whether the interrupt is percpu or not instead of relying on the using_spi flag that really means "SPI *and* interrupt-affinity property". Acked-by: Mark Rutland <[email protected]> Reported-by: Geert Uytterhoeven <[email protected]> Tested-by: Geert Uytterhoeven <[email protected]> Fixes: 19a469a58720 ("drivers/perf: arm-pmu: Handle per-interrupt affinity mask") Signed-off-by: Marc Zyngier <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2016-08-09drivers/perf: arm-pmu: convert arm_pmu_mutex to spinlockSudeep Holla1-9/+9
arm_pmu_mutex is never held long and we don't want to sleep while the lock is being held as it's executed in the context of hotplug notifiers. So it can be converted to a simple spinlock instead. Without this patch we get the following warning: BUG: sleeping function called from invalid context at kernel/locking/mutex.c:620 in_atomic(): 1, irqs_disabled(): 128, pid: 0, name: swapper/2 no locks held by swapper/2/0. irq event stamp: 381314 hardirqs last enabled at (381313): _raw_spin_unlock_irqrestore+0x7c/0x88 hardirqs last disabled at (381314): cpu_die+0x28/0x48 softirqs last enabled at (381294): _local_bh_enable+0x28/0x50 softirqs last disabled at (381293): irq_enter+0x58/0x78 CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.7.0 #12 Call trace: dump_backtrace+0x0/0x220 show_stack+0x24/0x30 dump_stack+0xb4/0xf0 ___might_sleep+0x1d8/0x1f0 __might_sleep+0x5c/0x98 mutex_lock_nested+0x54/0x400 arm_perf_starting_cpu+0x34/0xb0 cpuhp_invoke_callback+0x88/0x3d8 notify_cpu_starting+0x78/0x98 secondary_start_kernel+0x108/0x1a8 This patch converts the mutex to spinlock to eliminate the above warnings. This constraints pmu->reset to be non-blocking call which is the case with all the ARM PMU backends. Cc: Stephen Boyd <[email protected]> Fixes: 37b502f121ad ("arm/perf: Fix hotplug state machine conversion") Acked-by: Mark Rutland <[email protected]> Signed-off-by: Sudeep Holla <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2016-08-09ceph: initialize pathbase in the !dentry case in encode_caps_cb()Ilya Dryomov1-0/+1
pathbase is the base inode; set it to 0 if we've got no path. Coverity-id: 146348 Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2016-08-09rbd: nuke the 32-bit pool id checkIlya Dryomov1-9/+0
ceph_file_layout::pool_id is now s64. rbd_add_get_pool_id() and ceph_pg_poolid_by_name() both return an int, so it's bogus anyway. Signed-off-by: Ilya Dryomov <[email protected]> Reviewed-by: Alex Elder <[email protected]>
2016-08-09perf probe ppc64le: Fix probe location when using DWARFRavi Bangoria3-21/+49
Powerpc has Global Entry Point and Local Entry Point for functions. LEP catches call from both the GEP and the LEP. Symbol table of ELF contains GEP and Offset from which we can calculate LEP, but debuginfo does not have LEP info. Currently, perf prioritize symbol table over dwarf to probe on LEP for ppc64le. But when user tries to probe with function parameter, we fall back to using dwarf(i.e. GEP) and when function called via LEP, probe will never hit. For example: $ objdump -d vmlinux ... do_sys_open(): c0000000002eb4a0: e8 00 4c 3c addis r2,r12,232 c0000000002eb4a4: 60 00 42 38 addi r2,r2,96 c0000000002eb4a8: a6 02 08 7c mflr r0 c0000000002eb4ac: d0 ff 41 fb std r26,-48(r1) $ sudo ./perf probe do_sys_open $ sudo cat /sys/kernel/debug/tracing/kprobe_events p:probe/do_sys_open _text+3060904 $ sudo ./perf probe 'do_sys_open filename:string' $ sudo cat /sys/kernel/debug/tracing/kprobe_events p:probe/do_sys_open _text+3060896 filename_string=+0(%gpr4):string For second case, perf probed on GEP. So when function will be called via LEP, probe won't hit. $ sudo ./perf record -a -e probe:do_sys_open ls [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.195 MB perf.data ] To resolve this issue, let's not prioritize symbol table, let perf decide what it wants to use. Perf is already converting GEP to LEP when it uses symbol table. When perf uses debuginfo, let it find LEP offset form symbol table. This way we fall back to probe on LEP for all cases. After patch: $ sudo ./perf probe 'do_sys_open filename:string' $ sudo cat /sys/kernel/debug/tracing/kprobe_events p:probe/do_sys_open _text+3060904 filename_string=+0(%gpr4):string $ sudo ./perf record -a -e probe:do_sys_open ls [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.197 MB perf.data (11 samples) ] Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/1470723805-5081-2-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09perf probe: Add function to post process kernel trace eventsRavi Bangoria1-11/+18
Instead of inline code, introduce function to post process kernel probe trace events. Signed-off-by: Ravi Bangoria <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Ananth N Mavinakayanahalli <[email protected]> Cc: Balbir Singh <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Naveen N. Rao <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/1470723805-5081-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09tools: Sync cpufeatures headers with the kernelArnaldo Carvalho de Melo2-0/+4
Due to: 1e61f78baf89 ("x86/cpufeature: Make sure DISABLED/REQUIRED macros are updated") No changes to tools using those headers (tools/arch/x86/lib/mem{set,cpu}_64.S) seems necessary. Detected by the tools build header drift checker: $ make -C tools/perf O=/tmp/build/perf make: Entering directory '/home/acme/git/linux/tools/perf' BUILD: Doing 'make -j4' parallel build GEN /tmp/build/perf/common-cmds.h Warning: tools/arch/x86/include/asm/disabled-features.h differs from kernel Warning: tools/arch/x86/include/asm/required-features.h differs from kernel Warning: tools/arch/x86/include/asm/cpufeatures.h differs from kernel CC /tmp/build/perf/util/probe-finder.o CC /tmp/build/perf/builtin-help.o <SNIP> ^C$ Cc: Adrian Hunter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09toops: Sync tools/include/uapi/linux/bpf.h with the kernelArnaldo Carvalho de Melo1-1/+85
The way we're using kernel headers in tools/ now, with a copy that is made to the same path prefixed by "tools/" plus checking if that copy got stale, i.e. if the kernel counterpart changed, helps in keeping track with new features that may be useful for tools to exploit. For instance, looking at all the changes to bpf.h since it was last copied to tools/include brings this to toolers' attention: Need to investigate this one to check how to run a program via perf, setting up a BPF event, that will take advantage of the way perf already calls clang/LLVM, sets up the event and runs the workload in a single command line, helping in debugging such semi cooperative programs: 96ae52279594 ("bpf: Add bpf_probe_write_user BPF helper to be called in tracers") This one needs further investigation about using the feature it improves in 'perf trace' to do some tcpdumpin' mixed with syscalls, tracepoints, probe points, callgraphs, etc: 555c8a8623a3 ("bpf: avoid stack copy and use skb ctx for event output") Add tracing just packets that are related to some container to that mix: 4a482f34afcc ("cgroup: bpf: Add bpf_skb_in_cgroup_proto") 4ed8ec521ed5 ("cgroup: bpf: Add BPF_MAP_TYPE_CGROUP_ARRAY") Definetely needs to have example programs accessing task_struct from a bpf proggie started from 'perf trace': 606274c5abd8 ("bpf: introduce bpf_get_current_task() helper") Core networking related, XDP: 6ce96ca348a9 ("bpf: add XDP_TX xdp_action for direct forwarding") 6a773a15a1e8 ("bpf: add XDP prog type for early driver filter") 13c5c240f789 ("bpf: add bpf_get_hash_recalc helper") d2485c4242a8 ("bpf: add bpf_skb_change_type helper") 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") Changes detected by the tools build system: $ make -C tools/perf O=/tmp/build/perf install-bin make: Entering directory '/home/acme/git/linux/tools/perf' BUILD: Doing 'make -j4' parallel build Warning: tools/include/uapi/linux/bpf.h differs from kernel INSTALL GTK UI CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o <SNIP> $ Cc: Adrian Hunter <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Brenden Blanco <[email protected]> Cc: Daniel Borkmann <[email protected]> Cc: David Ahern <[email protected]> Cc: David S. Miller <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Martin KaFai Lau <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Sargun Dhillon <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09tools: Sync cpufeatures.h and vmx.h with the kernelArnaldo Carvalho de Melo2-9/+4
There were changes related to the deprecation of the "pcommit" instruction: fd1d961dd681 ("x86/insn: remove pcommit") dfa169bbee00 ("Revert "KVM: x86: add pcommit support"") No need to update anything in the tools, as "pcommit" wasn't being listed on the VMX_EXIT_REASONS in the tools/perf/arch/x86/util/kvm-stat.c file. Just grab fresh copies of these files to silence the file cache coherency detector: $ make -C tools/perf O=/tmp/build/perf install-bin make: Entering directory '/home/acme/git/linux/tools/perf' BUILD: Doing 'make -j4' parallel build Warning: tools/arch/x86/include/asm/cpufeatures.h differs from kernel Warning: tools/arch/x86/include/uapi/asm/vmx.h differs from kernel INSTALL GTK UI <SNIP> # Cc: Adrian Hunter <[email protected]> Cc: Dan Williams <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Ross Zwisler <[email protected]> Cc: Wang Nan <[email protected]> Cc: Xiao Guangrong <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09perf probe: Support signedness castingNaohiro Aota2-4/+21
The 'perf probe' tool detects a variable's type and use the detected type to add a new probe. Then, kprobes prints its variable in hexadecimal format if the variable is unsigned and prints in decimal if it is signed. We sometimes want to see unsigned variable in decimal format (i.e. sector_t or size_t). In that case, we need to investigate the variable's size manually to specify just signedness. This patch add signedness casting support. By specifying "s" or "u" as a type, perf-probe will investigate variable size as usual and use the specified signedness. E.g. without this: $ perf probe -a 'submit_bio bio->bi_iter.bi_sector' Added new event: probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector) You can now use it in all perf tools, such as: perf record -e probe:submit_bio -aR sleep 1 $ cat trace_pipe|head dbench-9692 [003] d..1 971.096633: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d00 dbench-9692 [003] d..1 971.096685: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x1a3d80 dbench-9692 [003] d..1 971.096687: submit_bio: (submit_bio+0x0/0x140) bi_sector=0x3a3d80 ... // need to investigate the variable size $ perf probe -a 'submit_bio bio->bi_iter.bi_sector:s64' Added new event: probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s64) You can now use it in all perf tools, such as: perf record -e probe:submit_bio -aR sleep 1 With this: // just use "s" to cast its signedness $ perf probe -v -a 'submit_bio bio->bi_iter.bi_sector:s' Added new event: probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s) You can now use it in all perf tools, such as: perf record -e probe:submit_bio -aR sleep 1 $ cat trace_pipe|head dbench-9689 [001] d..1 1212.391237: submit_bio: (submit_bio+0x0/0x140) bi_sector=128 dbench-9689 [001] d..1 1212.391252: submit_bio: (submit_bio+0x0/0x140) bi_sector=131072 dbench-9697 [006] d..1 1212.398611: submit_bio: (submit_bio+0x0/0x140) bi_sector=30208 This commit also update perf-probe.txt to describe "types". Most parts are based on existing documentation: Documentation/trace/kprobetrace.txt Committer note: Testing using 'perf trace': # perf probe -a 'submit_bio bio->bi_iter.bi_sector' Added new event: probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector) You can now use it in all perf tools, such as: perf record -e probe:submit_bio -aR sleep 1 # trace --no-syscalls --ev probe:submit_bio 0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=0xc133c0) 3181.861 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffb8) 3181.881 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc0) 3184.488 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x6cffc8) <SNIP> 4717.927 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7a88) 4717.970 probe:submit_bio:(ffffffffac3aee00) bi_sector=0x4dc7880) ^C[root@jouet ~]# Now, using this new feature: [root@jouet ~]# perf probe -a 'submit_bio bio->bi_iter.bi_sector:s' Added new event: probe:submit_bio (on submit_bio with bi_sector=bio->bi_iter.bi_sector:s) You can now use it in all perf tools, such as: perf record -e probe:submit_bio -aR sleep 1 [root@jouet ~]# trace --no-syscalls --ev probe:submit_bio 0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145704) 0.017 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145712) 0.019 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145720) 2.567 probe:submit_bio:(ffffffffac3aee00) bi_sector=7145728) 5631.919 probe:submit_bio:(ffffffffac3aee00) bi_sector=0) 5631.941 probe:submit_bio:(ffffffffac3aee00) bi_sector=8) 5631.945 probe:submit_bio:(ffffffffac3aee00) bi_sector=16) 5631.948 probe:submit_bio:(ffffffffac3aee00) bi_sector=24) ^C# With callchains: # trace --no-syscalls --ev probe:submit_bio/max-stack=10/ 0.000 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662544) submit_bio+0xa8200001 ([kernel.kallsyms]) submit_bh+0xa8200013 ([kernel.kallsyms]) jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms]) kjournald2+0xa82000ca ([kernel.kallsyms]) kthread+0xa82000d8 ([kernel.kallsyms]) ret_from_fork+0xa820001f ([kernel.kallsyms]) 0.023 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662552) submit_bio+0xa8200001 ([kernel.kallsyms]) submit_bh+0xa8200013 ([kernel.kallsyms]) jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms]) kjournald2+0xa82000ca ([kernel.kallsyms]) kthread+0xa82000d8 ([kernel.kallsyms]) ret_from_fork+0xa820001f ([kernel.kallsyms]) 0.027 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662560) submit_bio+0xa8200001 ([kernel.kallsyms]) submit_bh+0xa8200013 ([kernel.kallsyms]) jbd2_journal_commit_transaction+0xa8200691 ([kernel.kallsyms]) kjournald2+0xa82000ca ([kernel.kallsyms]) kthread+0xa82000d8 ([kernel.kallsyms]) ret_from_fork+0xa820001f ([kernel.kallsyms]) 2.593 probe:submit_bio:(ffffffffac3aee00) bi_sector=50662568) submit_bio+0xa8200001 ([kernel.kallsyms]) submit_bh+0xa8200013 ([kernel.kallsyms]) journal_submit_commit_record+0xa82001ac ([kernel.kallsyms]) jbd2_journal_commit_transaction+0xa82012e8 ([kernel.kallsyms]) kjournald2+0xa82000ca ([kernel.kallsyms]) kthread+0xa82000d8 ([kernel.kallsyms]) ret_from_fork+0xa820001f ([kernel.kallsyms]) ^C# Signed-off-by: Naohiro Aota <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Hemant Kumar <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09tracing: Fix tick_stop tracepoint symbols for user exportSteven Rostedt (Red Hat)1-3/+11
The symbols used in the tick_stop tracepoint were not being converted properly into integers in the trace_stop format file. Instead we had this: print fmt: "success=%d dependency=%s", REC->success, __print_symbolic(REC->dependency, { 0, "NONE" }, { (1 << TICK_DEP_BIT_POSIX_TIMER), "POSIX_TIMER" }, { (1 << TICK_DEP_BIT_PERF_EVENTS), "PERF_EVENTS" }, { (1 << TICK_DEP_BIT_SCHED), "SCHED" }, { (1 << TICK_DEP_BIT_CLOCK_UNSTABLE), "CLOCK_UNSTABLE" }) User space tools have no idea how to parse "TICK_DEP_BIT_SCHED" or the other symbols used to do the bit shifting. The reason is that the conversion was done with using the TICK_DEP_MASK_* symbols which are just macros that convert to the BIT shift itself (with the exception of NONE, which was converted properly, because it doesn't use bits, and is defined as zero). The TICK_DEP_BIT_* needs to be denoted by TRACE_DEFINE_ENUM() in order to have this properly converted for user space tools to parse this event. Cc: [email protected] Cc: Frederic Weisbecker <[email protected]> Fixes: e6e6cc22e067 ("nohz: Use enum code for tick stop failure tracing message") Reported-by: Luiz Capitulino <[email protected]> Tested-by: Luiz Capitulino <[email protected]> Signed-off-by: Steven Rostedt <[email protected]>
2016-08-09perf stat: Avoid skew when reading eventsMark Rutland1-8/+23
When we don't have a tracee (i.e. we're attaching to a task or CPU), counters can still be running after our workload finishes, and can still be running as we read their values. As we read events one-by-one, there can be arbitrary skew between values of events, even within a group. This means that ratios within an event group are not reliable. This skew can be seen if measuring a group of identical events, e.g: # perf stat -a -C0 -e '{cycles,cycles}' sleep 1 To avoid this, we must stop groups from counting before we read the values of any constituent events. This patch adds and makes use of a new disable_counters() helper, which disables group leaders (and thus each group as a whole). This mirrors the use of enable_counters() for starting event groups in the absence of a tracee. Closing a group leader splits the group, and without a disabled group leader the newly split events will begin counting. Thus to ensure counts are reliable we must defer closing group leaders until all counts have been read. To do so this patch removes the event closing logic from the read_counters() helper, explicitly closes the events using perf_evlist__close(), which also aids legibility. Signed-off-by: Mark Rutland <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Peter Zijlstra <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09perf probe: Fix module name matchingKonstantin Khlebnikov1-1/+3
If module is "module" then dso->short_name is "[module]". Substring comparing is't enough: "raid10" matches to "[raid1]". This patch also checks terminating zero in module name. Signed-off-by: Konstantin Khlebnikov <[email protected]> Acked-by: Masami Hiramatsu <[email protected]> Link: http://lkml.kernel.org/r/147039975648.715620.12985971832789032159.stgit@buzz Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09perf probe: Adjust map->reloc offset when finding kernel symbol from mapMasami Hiramatsu1-1/+1
Adjust map->reloc offset for the unmapped address when finding alternative symbol address from map, because KASLR can relocate the kernel symbol address. The same adjustment has been done when finding appropriate kernel symbol address from map which was introduced by commit f90acac75713 ("perf probe: Find given address from offline dwarf") Reported-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Masami Hiramatsu <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09perf hists: Trim libtraceevent trace_seq buffersArnaldo Carvalho de Melo1-1/+5
When we use libtraceevent to format trace event fields into printable strings to use in hist entries it is important to trim it from the default 4 KiB it starts with to what is really used, to reduce the memory footprint, so use realloc(seq.buffer, seq.len + 1) when returning the seq.buffer formatted with the fields contents. Reported-and-Tested-by: Wang Nan <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: David Ahern <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Link: http://lkml.kernel.org/n/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09perf script: Add 'bpf-output' field to usage messageBrendan Gregg2-3/+3
This adds the 'bpf-output' field to the perf script usage message, and docs. Signed-off-by: Brendan Gregg <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Wang Nan <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2016-08-09metag: Drop show_mem() from mem_init()James Hogan1-1/+0
The recent commit 599d0c954f91 ("mm, vmscan: move LRU lists to node"), changed memory management code so that show_mem() is no longer safe to call prior to setup_per_cpu_pageset(), as pgdat->per_cpu_nodestats will still be NULL. This causes an oops on metag due to the call to show_mem() from mem_init(): node_page_state_snapshot(...) + 0x48 pgdat_reclaimable(struct pglist_data * pgdat = 0x402517a0) show_free_areas(unsigned int filter = 0) + 0x2cc show_mem(unsigned int filter = 0) + 0x18 mem_init() mm_init() start_kernel() + 0x204 This wasn't a problem before with zone_reclaimable() as zone_pcp_init() was already setting zone->pageset to &boot_pageset, via setup_arch() and paging_init(), which happens before mm_init(): zone_pcp_init(...) free_area_init_core(...) + 0x138 free_area_init_node(int nid = 0, ...) + 0x1a0 free_area_init_nodes(...) + 0x440 paging_init(unsigned long mem_end = 0x4fe00000) + 0x378 setup_arch(char ** cmdline_p = 0x4024e038) + 0x2b8 start_kernel() + 0x54 No other arches appear to call show_mem() during boot, and it doesn't really add much value to the log, so lets just drop it from mem_init(). Signed-off-by: James Hogan <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: [email protected]
2016-08-09virtio/s390: deprecate old transportCornelia Huck3-1/+20
There only ever have been two host implementations of the old s390-virtio (pre-ccw) transport: the experimental kuli userspace, and qemu. As qemu switched its default to ccw with 2.4 (with most users having used ccw well before that) and removed the old transport entirely in 2.6, s390-virtio probably hasn't been in active use for quite some time and is therefore likely to bitrot. Let's start the slow march towards removing the code by deprecating it. Note that this also deprecates the early virtio console code, which has been causing trouble in the guest without being wired up in any relevant hypervisor code. Acked-by: Christian Borntraeger <[email protected]> Reviewed-by: Dong Jia Shi <[email protected]> Reviewed-by: Sascha Silbe <[email protected]> Signed-off-by: Cornelia Huck <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2016-08-09virtio/s390: keep early_put_charsChristian Borntraeger1-1/+1
In case the registration of the hvc tty never happens AND the kernel thinks that hvc0 is the preferred console we should keep the early printk function to avoid a kernel panic due to code being removed. Signed-off-by: Christian Borntraeger <[email protected]> Signed-off-by: Jing Liu <[email protected]> Signed-off-by: Cornelia Huck <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2016-08-09virtio_blk: Fix a slient kernel panicMinfei Huang1-18/+8
We do a lot of memory allocation in function init_vq, and don't handle the allocation failure properly. Then this function will return 0, although initialization fails due to lacking memory. At that moment, kernel will panic in guest machine, if virtio is used to drive disk. To fix this bug, we should take care of allocation failure, and return correct value to let caller know what happen. Tested-by: Chao Fan <[email protected]> Signed-off-by: Minfei Huang <[email protected]> Signed-off-by: Minfei Huang <[email protected]> Reviewed-by: Cornelia Huck <[email protected]> Reviewed-by: Stefan Hajnoczi <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2016-08-09virtio-vsock: fix include guard typoStefan Hajnoczi1-1/+1
Signed-off-by: Stefan Hajnoczi <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2016-08-09vhost/vsock: fix vhost virtio_vsock_pkt use-after-freeStefan Hajnoczi1-1/+5
Stash the packet length in a local variable before handing over ownership of the packet to virtio_transport_recv_pkt() or virtio_transport_free_pkt(). This patch solves the use-after-free since pkt is no longer guaranteed to be alive. Reported-by: Dan Carpenter <[email protected]> Signed-off-by: Stefan Hajnoczi <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2016-08-099p/trans_virtio: use kvfree() for iov_iter_get_pages_alloc()Vegard Nossum1-2/+2
The memory allocated by iov_iter_get_pages_alloc() can be allocated with vmalloc() if kmalloc() failed -- see get_pages_array(). In that case we need to free it with vfree(), so let's use kvfree(). The bug manifests like this: BUG: unable to handle kernel paging request at ffffeb0400072da0 IP: [<ffffffff8139c67b>] kfree+0x4b/0x140 PGD 0 Oops: 0000 [#1] PREEMPT SMP KASAN CPU: 2 PID: 675 Comm: trinity-c2 Not tainted 4.7.0-rc7+ #14 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014 task: ffff8800badef2c0 ti: ffff880069208000 task.ti: ffff880069208000 RIP: 0010:[<ffffffff8139c67b>] [<ffffffff8139c67b>] kfree+0x4b/0x140 RSP: 0000:ffff88006920f3f0 EFLAGS: 00010282 RAX: ffffea0000000000 RBX: ffffc90001cb6000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: 0000000000000246 RDI: ffffc90001cb6000 RBP: ffff88006920f410 R08: 0000000000000000 R09: dffffc0000000000 R10: ffff8800badefa30 R11: 0000056a3d3b0d9f R12: ffff88006920f620 R13: ffffeb0400072d80 R14: ffff8800baa94078 R15: 0000000000000000 FS: 00007fbd2b437700(0000) GS:ffff88011af00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffeb0400072da0 CR3: 000000006926d000 CR4: 00000000000006e0 Stack: 0000000000000001 ffff88006920f620 ffffed001755280f ffff8800baa94078 ffff88006920f6a8 ffffffff8310442b dffffc0000000000 ffff8800badefa30 ffff8800badefa28 ffff88011af1fba0 1ffff1000d241e98 ffff8800ba892150 Call Trace: [<ffffffff8310442b>] p9_virtio_zc_request+0x72b/0xdb0 [<ffffffff830f2116>] p9_client_zc_rpc.constprop.8+0x246/0xb10 [<ffffffff830f5d79>] p9_client_read+0x4c9/0x750 [<ffffffff8175ceac>] v9fs_fid_readpage+0x14c/0x320 [<ffffffff8175d0b6>] v9fs_vfs_readpage+0x36/0x50 [<ffffffff812c6f13>] filemap_fault+0x9a3/0xe60 [<ffffffff81331878>] __do_fault+0x158/0x300 [<ffffffff81339e01>] handle_mm_fault+0x1cf1/0x3c80 [<ffffffff810c0aaa>] __do_page_fault+0x30a/0x8e0 [<ffffffff810c10df>] do_page_fault+0x2f/0x80 [<ffffffff810b5b07>] do_async_page_fault+0x27/0xa0 [<ffffffff83296c48>] async_page_fault+0x28/0x30 Code: 00 80 41 54 53 49 01 fd 48 0f 42 05 b0 39 67 02 48 89 fb 49 01 c5 48 b8 00 00 00 00 00 ea ff ff 49 c1 ed 0c 49 c1 e5 06 49 01 c5 <49> 8b 45 20 48 8d 50 ff a8 01 4c 0f 45 ea 49 8b 55 20 48 8d 42 RIP [<ffffffff8139c67b>] kfree+0x4b/0x140 RSP <ffff88006920f3f0> CR2: ffffeb0400072da0 ---[ end trace f3d59a04bafec038 ]--- Cc: Al Viro <[email protected]> Signed-off-by: Vegard Nossum <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2016-08-09virtio: fix error handling for debug buildsMichael S. Tsirkin1-0/+1
On error, virtqueue_add calls START_USE but not END_USE. Thankfully that's normally empty anyway, but might not be when debugging. Fix it up. Signed-off-by: Michael S. Tsirkin <[email protected]>
2016-08-09virtio: fix memory leak in virtqueue_add()Wei Yongjun1-0/+2
When using the indirect buffers feature, 'desc' is allocated in virtqueue_add() but isn't freed before leaving on a ring full error, causing a memory leak. For example, it seems rather clear that this can trigger with virtio net if mergeable buffers are not used. Cc: [email protected] Signed-off-by: Wei Yongjun <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2016-08-09arm64: Support hard limit of cpu count by nr_cpusKefeng Wang1-4/+4
Enable the hard limit of cpu count by set boot options nr_cpus=x on arm64, and make a minor change about message when total number of cpu exceeds the limit. Reviewed-by: Suzuki K Poulose <[email protected]> Reported-by: Shiyuan Hu <[email protected]> Signed-off-by: Kefeng Wang <[email protected]> Signed-off-by: Will Deacon <[email protected]>
2016-08-09powerpc/pnv/pci: Fix incorrect PE reservation attempt on some 64-bit BARsBenjamin Herrenschmidt1-10/+20
The generic allocation code may sometimes decide to assign a prefetchable 64-bit BAR to the M32 window. In fact it may also decide to allocate a 64-bit non-prefetchable BAR to the M64 one ! So using the resource flags as a test to decide which window was used for PE allocation is just wrong and leads to insane PE numbers. Instead, compare the addresses to figure it out. Signed-off-by: Benjamin Herrenschmidt <[email protected]> [mpe: Rename the function as agreed by Ben & Gavin] Signed-off-by: Michael Ellerman <[email protected]>
2016-08-09powerpc/book3s: Fix MCE console messages for unrecoverable MCE.Mahesh Salgaonkar2-1/+3
When machine check occurs with MSR(RI=0), it means MC interrupt is unrecoverable and kernel goes down to panic path. But the console message still shows it as recovered. This patch fixes the MCE console messages. Fixes: 36df96f8acaf ("powerpc/book3s: Decode and save machine check event.") Signed-off-by: Mahesh Salgaonkar <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-08-09timers: Fix get_next_timer_interrupt() computationChris Metcalf1-1/+4
The tick_nohz_stop_sched_tick() routine is not properly canceling the sched timer when nothing is pending, because get_next_timer_interrupt() is no longer returning KTIME_MAX in that case. This causes periodic interrupts when none are needed. When determining the next interrupt time, we first use __next_timer_interrupt() to get the first expiring timer in the timer wheel. If no timer is found, we return the base clock value plus NEXT_TIMER_MAX_DELTA to indicate there is no timer in the timer wheel. Back in get_next_timer_interrupt(), we set the "expires" value by converting the timer wheel expiry (in ticks) to a nsec value. But we don't want to do this if the timer wheel expiry value indicates no timer; we want to return KTIME_MAX. Prior to commit 500462a9de65 ("timers: Switch to a non-cascading wheel") we checked base->active_timers to see if any timers were active, and if not, we didn't touch the expiry value and so properly returned KTIME_MAX. Now we don't have active_timers. To fix this, we now just check the timer wheel expiry value to see if it is "now + NEXT_TIMER_MAX_DELTA", and if it is, we don't try to compute a new value based on it, but instead simply let the KTIME_MAX value in expires remain. Fixes: 500462a9de65 "timers: Switch to a non-cascading wheel" Signed-off-by: Chris Metcalf <[email protected]> Cc: Frederic Weisbecker <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: John Stultz <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-08-09genirq/msi: Make sure PCI MSIs are activated earlyMarc Zyngier3-0/+15
Bharat Kumar Gogada reported issues with the generic MSI code, where the end-point ended up with garbage in its MSI configuration (both for the vector and the message). It turns out that the two MSI paths in the kernel are doing slightly different things: generic MSI: disable MSI -> allocate MSI -> enable MSI -> setup EP PCI MSI: disable MSI -> allocate MSI -> setup EP -> enable MSI And it turns out that end-points are allowed to latch the content of the MSI configuration registers as soon as MSIs are enabled. In Bharat's case, the end-point ends up using whatever was there already, which is not what you want. In order to make things converge, we introduce a new MSI domain flag (MSI_FLAG_ACTIVATE_EARLY) that is unconditionally set for PCI/MSI. When set, this flag forces the programming of the end-point as soon as the MSIs are allocated. A consequence of this is that we have an extra activate in irq_startup, but that should be without much consequence. tglx: - Several people reported a VMWare regression with PCI/MSI-X passthrough. It turns out that the patch also cures that issue. - We need to have a look at the MSI disable interrupt path, where we write the msg to all zeros without disabling MSI in the PCI device. Is that correct? Fixes: 52f518a3a7c2 "x86/MSI: Use hierarchical irqdomains to manage MSI interrupts" Reported-and-tested-by: Bharat Kumar Gogada <[email protected]> Reported-and-tested-by: Foster Snowhill <[email protected]> Reported-by: Matthias Prager <[email protected]> Reported-by: Jason Taylor <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Thomas Gleixner <[email protected]>
2016-08-09powerpc/pci: Fix endian bug in fixed PHB numberingMichael Ellerman1-2/+5
The recent commit 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties"), added code to read a 64-bit property from the device tree, and if not found read a 32-bit property (reg). There was a bug in the 32-bit case, on big endian machines, due to the use of the 64-bit value to read the 32-bit property. The cast of &prop means we end up writing to the high 32-bit of prop, leaving the low 32-bits containing whatever junk was on the stack. If that junk value was non-zero, and < MAX_PHBS, we would end up using it as the PHB id. This results in users seeing what appear to be random PHB ids. Fix it by reading into a u32 property and then assigning that to the u64 value, letting the CPU do the correct conversions for us. Fixes: 63a72284b159 ("powerpc/pci: Assign fixed PHB number based on device-tree properties") Signed-off-by: Michael Ellerman <[email protected]>
2016-08-09powerpc/eeh: Switch to conventional PCI address output in EEH logGuilherme G. Piccoli1-2/+2
This is a very minor/trivial fix for the output of PCI address on EEH logs. The PCI address on "OF node" field currently is using ":" as a separator for the function, but the usual separator is ".". This patch changes the separator to dot, so the PCI address is printed as usual. Signed-off-by: Guilherme G. Piccoli <[email protected]> Reviewed-by: Gavin Shan <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-08-09cxl: Fix sparse warningsAndrew Donnellan2-2/+2
Make native_irq_wait() static and use NULL rather than 0 to initialise phb->cfg_data in cxl_pci_vphb_add() to remove sparse warnings. Signed-off-by: Andrew Donnellan <[email protected]> Reviewed-by: Matthew R. Ochs <[email protected]> Acked-by: Ian Munsie <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-08-09cxl: Fix NULL dereference in cxl_context_init() on PowerVM guestsAndrew Donnellan3-4/+4
Commit f67a6722d650 ("cxl: Workaround PE=0 hardware limitation in Mellanox CX4") added a "min_pe" field to struct cxl_service_layer_ops, to allow us to work around a Mellanox CX-4 hardware limitation. When allocating the PE number in cxl_context_init(), we read from ctx->afu->adapter->native->sl_ops->min_pe to get the minimum PE number. Unsurprisingly, in a PowerVM guest ctx->afu->adapter->native is NULL, and guests don't have a cxl_service_layer_ops struct anywhere. Move min_pe from struct cxl_service_layer_ops to struct cxl so it's accessible in both native and PowerVM environments. For the Mellanox CX-4, set the min_pe value in set_sl_ops(). Fixes: f67a6722d650 ("cxl: Workaround PE=0 hardware limitation in Mellanox CX4") Reported-by: Frederic Barrat <[email protected]> Signed-off-by: Andrew Donnellan <[email protected]> Acked-by: Ian Munsie <[email protected]> Reviewed-by: Frederic Barrat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-08-09cxl: Use fixed width predefined types in data structure.Philippe Bergheaud1-2/+2
This patch fixes a regression introduced by commit b810253bd934 ("cxl: Add mechanism for delivering AFU driver specific events"). It changes the type u8 to __u8 in the uapi header cxl.h, because the former is a kernel internal type, and may not be defined in userland build environments, in particular when cross-compiling libcxl on x86_64 linux machines (RHEL6.7 and Ubuntu 16.04). This patch also changes the size of the field data_size, and makes it constant, to support 32-bit userland applications running on big-endian ppc64 kernels transparently. mpe: This is an ABI change, however the ABI was only added during the 4.8 merge window so has never been part of a released kernel - therefore we give ourselves permission to change it. Fixes: b810253bd934 ("cxl: Add mechanism for delivering AFU driver specific events") Signed-off-by: Philippe Bergheaud <[email protected]> Reviewed-by: Frederic Barrat <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-08-09powerpc/vdso: Add missing include fileGuenter Roeck1-0/+1
Some powerpc builds fail with the following buld error. In file included from ./arch/powerpc/include/asm/mmu_context.h:11:0, from arch/powerpc/kernel/vdso.c:28: arch/powerpc/include/asm/cputhreads.h: In function 'get_tensr': arch/powerpc/include/asm/cputhreads.h:101:2: error: implicit declaration of function 'cpu_has_feature' Fixes: b92a226e5284 ("powerpc: Move cpu_has_feature() to a separate file") Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Michael Ellerman <[email protected]>
2016-08-09powerpc: Fix unused function warning 'lmb_to_memblock'Alastair D'Silva1-13/+13
This patch fixes the following warning: arch/powerpc/platforms/pseries/hotplug-memory.c:323:29: error: 'lmb_to_memblock' defined but not used [-Werror=unused-function] static struct memory_block *lmb_to_memblock(struct of_drconf_cell *lmb) ^~~~~~~~~~~~~~~ The only consumer of this function is 'dlpar_remove_lmb', which is enabled with CONFIG_MEMORY_HOTREMOVE, so move it into the same ifdef block. Signed-off-by: Michael Ellerman <[email protected]>