Age  Commit message  Author  Files  Lines
2020-04-17irqchip/sifive-plic: Fix maximum priority threshold valueAtish Patra1-1/+1
As per the PLIC specification, maximum priority threshold value is 0x7 not 0xF. Even though it doesn't cause any error in qemu/hifive unleashed, there may be some implementation which checks the upper bound resulting in an illegal access. Fixes: ccbe80bad571 ("irqchip/sifive-plic: Enable/Disable external interrupts upon cpu online/offline") Signed-off-by: Atish Patra <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
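The upper-bound check that the message warns about can be pictured with a minimal sketch. It assumes only what the message states, namely that the largest legal threshold is 0x7; the names DEMO_PLIC_MAX_THRESHOLD and demo_threshold_valid() are hypothetical and are not taken from the driver or from any PLIC implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: the maximum threshold named in the commit message. */
#define DEMO_PLIC_MAX_THRESHOLD 0x7u

/*
 * Hypothetical bound check of the kind a strict implementation might
 * apply to a threshold write. A value of 0xF would be rejected here.
 */
static bool demo_threshold_valid(uint32_t val)
{
	return val <= DEMO_PLIC_MAX_THRESHOLD;
}
```

Under this model the old value 0xF fails the check while 0x7 passes, which is the situation the one-line change is meant to avoid.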
2020-04-17irqchip/ti-sci-inta: Fix processing of masked irqsGrygorii Strashko1-1/+2
The ti_sci_inta_irq_handler() does not take the state (masked/unmasked) of INTA IRQs into account, because it reads the INTA_STATUS_CLEAR_j register to get the INTA IRQ status, and that register provides the raw status value. This causes hard IRQ handlers to be called, or threaded handlers to be scheduled, many times even if the corresponding INTA IRQ is masked. This primarily affects LEVEL interrupt processing and causes unexpected behavior up the system stack, or a crash. Fix it by using the Interrupt Masked Status INTA_STATUSM_j register, which provides the masked INTA IRQ status. Fixes: 9f1463b86c13 ("irqchip/ti-sci-inta: Add support for Interrupt Aggregator driver") Signed-off-by: Grygorii Strashko <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Reviewed-by: Lokesh Vutla <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected]
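The difference between dispatching on raw status and on masked status can be sketched in userspace C. This is a model, not driver code: it assumes the masked status behaves like the raw status ANDed with the set of enabled (unmasked) bits, and the helper names demo_masked_status() and demo_count_dispatch() are hypothetical.

```c
#include <stdint.h>

/*
 * Assumed model of a masked-status register: a bit is reported only
 * if it is pending in the raw status and the interrupt is enabled.
 */
static uint64_t demo_masked_status(uint64_t raw, uint64_t enabled)
{
	return raw & enabled;
}

/* Count how many handlers a dispatch loop would run for a status word. */
static unsigned int demo_count_dispatch(uint64_t status)
{
	unsigned int n = 0;

	while (status) {
		status &= status - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```

With two bits pending and only one of them enabled, a loop over the raw word would run two handlers, while a loop over the masked word runs one. For a level interrupt that stays pending while masked, the raw-status loop would keep firing on every pass.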
2020-04-17irqchip/mbigen: Free msi_desc on device teardownZenghui Yu1-1/+7
Using irq_domain_free_irqs_common() on the irqdomain free path will leave the MSI descriptor unfreed when platform devices get removed. Properly free it by MSI domain free function. Fixes: 9650c60ebfec0 ("irqchip/mbigen: Create irq domain for each mbigen device") Signed-off-by: Zenghui Yu <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2020-04-17arm/xen: make _xen_start_info staticJason Yan1-1/+1
Fix the following sparse warning: arch/arm64/xen/../../arm/xen/enlighten.c:39:19: warning: symbol '_xen_start_info' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Reviewed-by: Stefano Stabellini <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2020-04-16Merge tag 'nfs-for-5.7-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds1-1/+2
Pull NFS client bugfix from Trond Myklebust: "Fix an ABBA spinlock issue in pnfs_update_layout()" * tag 'nfs-for-5.7-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFS: Fix an ABBA spinlock issue in pnfs_update_layout()
2020-04-16virtio/test: fix up after IOTLB changesMichael S. Tsirkin6-4/+18
Allow building vringh without IOTLB (that's the case for userspace builds, will be useful for CAIF/VOD down the road too). Update for API tweaks. Don't include vringh with userspace builds. Cc: Jason Wang <[email protected]> Cc: Eugenio Pérez <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2020-04-16vhost: Create accessors for virtqueues private_dataEugenio Pérez5-32/+61
Signed-off-by: Eugenio Pérez <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2020-04-16vdpasim: Return status in vdpasim_get_statusYueHaibing1-1/+1
vdpasim->status should be acquired under the spin lock. Fixes: 870448c31775 ("vdpasim: vDPA device simulator") Signed-off-by: YueHaibing <[email protected]> Acked-by: Jason Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2020-04-16vdpa: remove unused variables 'ifcvf' and 'ifcvf_lm'YueHaibing2-4/+0
drivers/vdpa/ifcvf/ifcvf_main.c:34:24: warning: variable ‘ifcvf’ set but not used [-Wunused-but-set-variable] drivers/vdpa/ifcvf/ifcvf_base.c:304:31: warning: variable ‘ifcvf_lm’ set but not used [-Wunused-but-set-variable] Reported-by: Hulk Robot <[email protected]> Signed-off-by: YueHaibing <[email protected]> Acked-by: Jason Wang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2020-04-16vhost: remove set but not used variable 'status'Jason Yan1-3/+0
Fix the following gcc warning: drivers/vhost/vdpa.c:299:5: warning: variable 'status' set but not used [-Wunused-but-set-variable] u8 status; ^~~~~~ Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Michael S. Tsirkin <[email protected]>
2020-04-16vhost: vdpa: remove unnecessary null checkGustavo A. R. Silva1-2/+0
container_of is never null, so this null check is unnecessary. Addresses-Coverity-ID: 1492006 ("Logically dead code") Fixes: 20453a45fb06 ("vhost: introduce vDPA-based backend") Signed-off-by: Gustavo A. R. Silva <[email protected]> Link: https://lore.kernel.org/r/20200330235040.GA9997@embeddedor Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
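Why such a check is dead code follows from how container_of works: it subtracts a fixed member offset from the member pointer, so the result is computed by arithmetic rather than looked up, and it cannot come back as NULL for a valid member pointer. A minimal userspace sketch, assuming a simplified container_of without the kernel's type checking; the structures demo_dev and demo_vdpa and the helper to_demo_vdpa() are hypothetical:

```c
#include <stddef.h>

/* Simplified userspace version of the kernel macro (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical structures, named for illustration only. */
struct demo_dev {
	int id;
};

struct demo_vdpa {
	int flags;
	struct demo_dev dev;	/* embedded member, not at offset 0 */
};

/* Recover the enclosing structure from a pointer to its embedded member. */
static struct demo_vdpa *to_demo_vdpa(struct demo_dev *dev)
{
	return container_of(dev, struct demo_vdpa, dev);
}
```

Because the member is embedded at a nonzero offset, even a NULL member pointer would produce a non-NULL (and invalid) result, so testing the return value against NULL tells the caller nothing.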
2020-04-16vdpa-sim: depend on HAS_DMAMichael S. Tsirkin1-1/+1
set_dma_ops isn't available on all architectures: make ARCH=um ... drivers/vdpa/vdpa_sim/vdpa_sim.c: In function 'vdpasim_create': >> drivers/vdpa/vdpa_sim/vdpa_sim.c:324:2: error: implicit declaration of function 'set_dma_ops'; did you mean 'set_groups'? +[-Werror=implicit-function-declaration] set_dma_ops(dev, &vdpasim_dma_ops); ^~~~~~~~~~~ set_groups Disable vdpa-sim on architectures where it isn't. Acked-by: Jason Wang <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2020-04-16Merge tag 'tag-chrome-platform-fixes-for-v5.7-rc2' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux Pull chrome-platform fixes from Benson Leung: "Two small fixes for cros_ec_sensorhub_ring.c, addressing issues introduced in the cros_ec_sensorhub FIFO support commit" * tag 'tag-chrome-platform-fixes-for-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux: platform/chrome: cros_ec_sensorhub: Add missing '\n' in log messages platform/chrome: cros_ec_sensorhub: Off by one in cros_sensorhub_send_sample()
2020-04-16Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netLinus Torvalds108-679/+1067
Pull networking fixes from David Miller: 1) Disable RISCV BPF JIT builds when !MMU, from Björn Töpel. 2) nf_tables leaves dangling pointer after free, fix from Eric Dumazet. 3) Out of boundary write in __xsk_rcv_memcpy(), fix from Li RongQing. 4) Adjust icmp6 message source address selection when routes have a preferred source address set, from Tim Stallard. 5) Be sure to validate HSR protocol version when creating new links, from Taehee Yoo. 6) CAP_NET_ADMIN should be sufficient to manage l2tp tunnels even in non-initial namespaces, from Michael Weiß. 7) Missing release firmware call in mlx5, from Eran Ben Elisha. 8) Fix variable type in macsec_changelink(), caught by KASAN. Fix from Taehee Yoo. 9) Fix pause frame negotiation in marvell phy driver, from Clemens Gruber. 10) Record RX queue early enough in tun packet paths such that XDP programs will see the correct RX queue index, from Gilberto Bertin. 11) Fix double unlock in mptcp, from Florian Westphal. 12) Fix offset overflow in ARM bpf JIT, from Luke Nelson. 13) marvell10g needs to soft reset PHY when coming out of low power mode, from Russell King. 14) Fix MTU setting regression in stmmac for some chip types, from Florian Fainelli. 
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (101 commits) amd-xgbe: Use __napi_schedule() in BH context mISDN: make dmril and dmrim static net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizes net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware mode tipc: fix incorrect increasing of link window Documentation: Fix tcp_challenge_ack_limit default value net: tulip: make early_486_chipsets static dt-bindings: net: ethernet-phy: add desciption for ethernet-phy-id1234.d400 ipv6: remove redundant assignment to variable err net/rds: Use ERR_PTR for rds_message_alloc_sgs() net: mscc: ocelot: fix untagged packet drops when enslaving to vlan aware bridge selftests/bpf: Check for correct program attach/detach in xdp_attach test libbpf: Fix type of old_fd in bpf_xdp_set_link_opts libbpf: Always specify expected_attach_type on program load if supported xsk: Add missing check on user supplied headroom size mac80211: fix channel switch trigger from unknown mesh peer mac80211: fix race in ieee80211_register_hw() net: marvell10g: soft-reset the PHY when coming out of low power net: marvell10g: report firmware version net/cxgb4: Check the return from t4_query_params properly ...
2020-04-16amd-xgbe: Use __napi_schedule() in BH contextSebastian Andrzej Siewior1-1/+1
The driver uses __napi_schedule_irqoff() which is fine as long as it is invoked with disabled interrupts by everybody. Since the commit mentioned below the driver may invoke xgbe_isr_task() in tasklet/softirq context. This may lead to list corruption if another driver uses __napi_schedule_irqoff() in IRQ context. Use __napi_schedule() which is safe to use from IRQ and softirq context. Fixes: 85b85c853401d ("amd-xgbe: Re-issue interrupt if interrupt status not cleared") Signed-off-by: Sebastian Andrzej Siewior <[email protected]> Acked-by: Tom Lendacky <[email protected]> Cc: Tom Lendacky <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-16mISDN: make dmril and dmrim staticJason Yan1-2/+2
Fix the following sparse warning: drivers/isdn/hardware/mISDN/mISDNisar.c:746:12: warning: symbol 'dmril' was not declared. Should it be static? drivers/isdn/hardware/mISDN/mISDNisar.c:749:12: warning: symbol 'dmrim' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-16net: stmmac: dwmac-sunxi: Provide TX and RX fifo sizesFlorian Fainelli1-0/+2
After commit bfcb813203e619a8960a819bf533ad2a108d8105 ("net: dsa: configure the MTU for switch ports") my Lamobo R1 platform which uses an allwinner,sun7i-a20-gmac compatible Ethernet MAC started to fail by rejecting a MTU of 1536. The reason for that is that the DMA capabilities are not readable on this version of the IP, and there is also no 'tx-fifo-depth' property being provided in Device Tree. The property is documented as optional, and is not provided. Chen-Yu indicated that the FIFO sizes are 4KB for TX and 16KB for RX, so provide these values through platform data as an immediate fix until various Device Tree sources get updated accordingly. Fixes: eaf4fac47807 ("net: stmmac: Do not accept invalid MTU values") Suggested-by: Chen-Yu Tsai <[email protected]> Signed-off-by: Florian Fainelli <[email protected]> Acked-by: Chen-Yu Tsai <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-16net: dsa: mt7530: fix tagged frames pass-through in VLAN-unaware modeDENG Qingfang2-6/+19
In VLAN-unaware mode, the Egress Tag (EG_TAG) field in Port VLAN Control register must be set to Consistent to let tagged frames pass through as is, otherwise their tags will be stripped. Fixes: 83163f7dca56 ("net: dsa: mediatek: add VLAN support for MT7530") Signed-off-by: DENG Qingfang <[email protected]> Reviewed-by: Florian Fainelli <[email protected]> Tested-by: René van Dorst <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-04-16dt-bindings: Fix misspellings of "Analog Devices"Geert Uytterhoeven4-6/+6
According to https://www.analog.com/, the company name is spelled "Analog Devices". Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: Rob Herring <[email protected]>
2020-04-16Merge tag 'selinux-pr-20200416' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull SELinux fix from Paul Moore: "One small SELinux fix to ensure we cleanup properly on an error condition" * tag 'selinux-pr-20200416' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: free str on error in str_read()
2020-04-16Merge tag 'ceph-for-5.7-rc2' of git://github.com/ceph/ceph-clientLinus Torvalds4-19/+24
Pull ceph fixes from Ilya Dryomov: - a set of patches for a deadlock on "rbd map" error path - a fix for invalid pointer dereference and uninitialized variable use on asynchronous create and unlink error paths. * tag 'ceph-for-5.7-rc2' of git://github.com/ceph/ceph-client: ceph: fix potential bad pointer deref in async dirops cb's rbd: don't mess with a page vector in rbd_notify_op_lock() rbd: don't test rbd_dev->opts in rbd_dev_image_release() rbd: call rbd_dev_unprobe() after unwatching and flushing notifies rbd: avoid a deadlock on header_rwsem when flushing notifies
2020-04-16smb3: remove overly noisy debug line in signing errorsSteve French1-2/+2
A dump_stack call for signature related errors can be too noisy and not of much value in debugging such problems. Signed-off-by: Steve French <[email protected]> Reviewed-by: Shyam Prasad N <[email protected]>
2020-04-16Merge tag 'trace-v5.7-rc1' of ↵Linus Torvalds1-7/+3
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fix from Steven Rostedt: "This fixes a small race between allocating a snapshot buffer and setting the snapshot trigger. On a slow machine, the trigger can occur before the snapshot is allocated causing a warning to be displayed in the ring buffer, and no snapshot triggering. Reversing the allocation and the enabling of the trigger fixes the problem" * tag 'trace-v5.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tracing: Fix the race between registering 'snapshot' event trigger and triggering 'snapshot' operation
2020-04-16keys: Fix proc_keys_next to increase position indexVasily Averin1-0/+2
If seq_file .next function does not change position index, read after some lseek can generate unexpected output: $ dd if=/proc/keys bs=1 # full usual output 0f6bfdf5 I--Q--- 2 perm 3f010000 1000 1000 user 4af2f79ab8848d0a: 740 1fb91b32 I--Q--- 3 perm 1f3f0000 1000 65534 keyring _uid.1000: 2 27589480 I--Q--- 1 perm 0b0b0000 0 0 user invocation_id: 16 2f33ab67 I--Q--- 152 perm 3f030000 0 0 keyring _ses: 2 33f1d8fa I--Q--- 4 perm 3f030000 1000 1000 keyring _ses: 1 3d427fda I--Q--- 2 perm 3f010000 1000 1000 user 69ec44aec7678e5a: 740 3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1 521+0 records in 521+0 records out 521 bytes copied, 0,00123769 s, 421 kB/s But a read after lseek in middle of last line results in the partial last line and then a repeat of the final line: $ dd if=/proc/keys bs=500 skip=1 dd: /proc/keys: cannot skip to specified offset g _uid_ses.1000: 1 3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1 0+1 records in 0+1 records out 97 bytes copied, 0,000135035 s, 718 kB/s and a read after lseek beyond end of file results in the last line being shown: $ dd if=/proc/keys bs=1000 skip=1 # read after lseek beyond end of file dd: /proc/keys: cannot skip to specified offset 3ead4096 I--Q--- 1 perm 1f3f0000 1000 65534 keyring _uid_ses.1000: 1 0+1 records in 0+1 records out 76 bytes copied, 0,000119981 s, 633 kB/s See https://bugzilla.kernel.org/show_bug.cgi?id=206283 Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code ...") Signed-off-by: Vasily Averin <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Jarkko Sakkinen <[email protected]> Cc: [email protected] Signed-off-by: Linus Torvalds <[email protected]>
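The repeated last line can be reproduced with a small userspace model of the start/next contract. This is a sketch under stated assumptions, not the seq_file API: the table demo_keys and the functions demo_start(), demo_next_buggy() and demo_next_fixed() are hypothetical, and the position index is modelled as a plain array index.

```c
#include <stddef.h>

/* Hypothetical stand-in for the records behind a seq_file. */
static const int demo_keys[] = { 10, 20, 30 };
#define DEMO_NKEYS (sizeof(demo_keys) / sizeof(demo_keys[0]))

/* start: map a position index to a record, or NULL past the end. */
static const int *demo_start(long pos)
{
	if (pos < 0 || (size_t)pos >= DEMO_NKEYS)
		return NULL;
	return &demo_keys[pos];
}

/* Buggy next: leaves *pos untouched when it runs off the end. */
static const int *demo_next_buggy(const int *v, long *pos)
{
	const int *n = v + 1;

	if (n >= demo_keys + DEMO_NKEYS)
		return NULL;		/* *pos still names the last record */
	*pos = n - demo_keys;
	return n;
}

/* Fixed next: always moves *pos forward, even when returning NULL. */
static const int *demo_next_fixed(const int *v, long *pos)
{
	const int *n = v + 1;

	if (n >= demo_keys + DEMO_NKEYS) {
		(*pos)++;		/* step past the end */
		return NULL;
	}
	*pos = n - demo_keys;
	return n;
}
```

After the buggy variant returns NULL, the position still refers to the final record, so a later demo_start() at that position yields the last record again. With the fixed variant the position has moved past the end and demo_start() returns NULL, which matches the behavior the commit restores.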
2020-04-16ahci: Add Intel Comet Lake PCH-U PCI IDKai-Heng Feng1-0/+1
Add Intel Comet Lake PCH-U PCI ID to the list of supported controllers. Set default SATA LPM so the SoC can enter S0ix. Signed-off-by: Kai-Heng Feng <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-04-16xfs: move inode flush to the sync workqueueDarrick J. Wong2-19/+27
Move the inode dirty data flushing to a workqueue so that multiple threads can take advantage of a single thread's flushing work. The ratelimiting technique used in bdd4ee4 was not successful, because threads that skipped the inode flush scan due to ratelimiting would ENOSPC early, which caused occasional (but noticeable) changes in behavior and sporadic fstest regressions. Therefore, make all the writer threads wait on a single inode flush, which eliminates both the stampeding hordes of flushers and the small window in which a write could fail with ENOSPC because it lost the ratelimit race even after another thread freed space. Fixes: c6425702f21e ("xfs: ratelimit inode flush on buffered write ENOSPC") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Brian Foster <[email protected]>
2020-04-16blk-mq: Put driver tag in blk_mq_dispatch_rq_list() when no budgetJohn Garry1-1/+3
If in blk_mq_dispatch_rq_list() we find no budget, then we break out of the dispatch loop, but the request may keep the driver tag, evaluated in 'nxt' in the previous loop iteration. Fix by putting the driver tag for that request. Reviewed-by: Ming Lei <[email protected]> Signed-off-by: John Garry <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-04-16perf intel-pt: Add support for synthesizing callchains for regular eventsAdrian Hunter1-7/+61
Currently, callchains can be synthesized only for synthesized events. Support also synthesizing callchains for regular events. Example: # perf record --kcore --aux-sample -e '{intel_pt//,cycles}' -c 10000 uname Linux [ perf record: Woken up 3 times to write data ] [ perf record: Captured and wrote 0.532 MB perf.data ] # perf script --itrace=Ge | head -20 uname 4864 2419025.358181: 10000 cycles: ffffffffbba56965 apparmor_bprm_committing_creds+0x35 ([kernel.kallsyms]) ffffffffbc400cd5 __indirect_thunk_start+0x5 ([kernel.kallsyms]) ffffffffbba07422 security_bprm_committing_creds+0x22 ([kernel.kallsyms]) ffffffffbb89805d install_exec_creds+0xd ([kernel.kallsyms]) ffffffffbb90d9ac load_elf_binary+0x3ac ([kernel.kallsyms]) uname 4864 2419025.358185: 10000 cycles: ffffffffbba56db0 apparmor_bprm_committed_creds+0x20 ([kernel.kallsyms]) ffffffffbc400cd5 __indirect_thunk_start+0x5 ([kernel.kallsyms]) ffffffffbba07452 security_bprm_committed_creds+0x22 ([kernel.kallsyms]) ffffffffbb89809a install_exec_creds+0x4a ([kernel.kallsyms]) ffffffffbb90d9ac load_elf_binary+0x3ac ([kernel.kallsyms]) uname 4864 2419025.358189: 10000 cycles: ffffffffbb86fdf6 vma_adjust_trans_huge+0x6 ([kernel.kallsyms]) ffffffffbb821660 __vma_adjust+0x160 ([kernel.kallsyms]) ffffffffbb897be7 shift_arg_pages+0x97 ([kernel.kallsyms]) ffffffffbb897ed9 setup_arg_pages+0x1e9 ([kernel.kallsyms]) ffffffffbb90d9f2 load_elf_binary+0x3f2 ([kernel.kallsyms]) Committer testing: # perf record --kcore --aux-sample -e '{intel_pt//,cycles}' -c 10000 uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.233 MB perf.data ] # Then, before this patch: # perf script --itrace=Ge | head -20 uname 28642 168664.856384: 10000 cycles: ffffffff9810aeaa commit_creds+0x2a ([kernel.kallsyms]) uname 28642 168664.856388: 10000 cycles: ffffffff982a24f1 mprotect_fixup+0x151 ([kernel.kallsyms]) uname 28642 168664.856392: 10000 cycles: ffffffff982a385b move_page_tables+0xbcb ([kernel.kallsyms]) 
uname 28642 168664.856396: 10000 cycles: ffffffff982fd4ec __mod_memcg_state+0x1c ([kernel.kallsyms]) uname 28642 168664.856400: 10000 cycles: ffffffff9829fddd do_mmap+0xfd ([kernel.kallsyms]) uname 28642 168664.856404: 10000 cycles: ffffffff9829c879 __vma_adjust+0x479 ([kernel.kallsyms]) uname 28642 168664.856408: 10000 cycles: ffffffff98238e94 __perf_addr_filters_adjust+0x34 ([kernel.kallsyms]) uname 28642 168664.856412: 10000 cycles: ffffffff98a38e0b down_write+0x1b ([kernel.kallsyms]) uname 28642 168664.856416: 10000 cycles: ffffffff983006a0 memcg_kmem_get_cache+0x0 ([kernel.kallsyms]) uname 28642 168664.856421: 10000 cycles: ffffffff98396eaf load_elf_binary+0x92f ([kernel.kallsyms]) uname 28642 168664.856425: 10000 cycles: ffffffff982e0222 kfree+0x62 ([kernel.kallsyms]) uname 28642 168664.856428: 10000 cycles: ffffffff9846dfd4 file_has_perm+0x54 ([kernel.kallsyms]) uname 28642 168664.856433: 10000 cycles: ffffffff98288911 vma_interval_tree_insert+0x51 ([kernel.kallsyms]) uname 28642 168664.856437: 10000 cycles: ffffffff9823e577 perf_event_mmap_output+0x27 ([kernel.kallsyms]) uname 28642 168664.856441: 10000 cycles: ffffffff98a26fa0 xas_load+0x40 ([kernel.kallsyms]) uname 28642 168664.856445: 10000 cycles: ffffffff98004f30 arch_setup_additional_pages+0x0 ([kernel.kallsyms]) uname 28642 168664.856448: 10000 cycles: ffffffff98a297c0 copy_user_generic_unrolled+0xa0 ([kernel.kallsyms]) uname 28642 168664.856452: 10000 cycles: ffffffff9853a87a strnlen_user+0x10a ([kernel.kallsyms]) uname 28642 168664.856456: 10000 cycles: ffffffff986638a7 randomize_page+0x27 ([kernel.kallsyms]) uname 28642 168664.856460: 10000 cycles: ffffffff98a3b645 _raw_spin_lock+0x5 ([kernel.kallsyms]) # And after: # perf script --itrace=Ge | head -20 uname 28642 168664.856384: 10000 cycles: ffffffff9810aeaa commit_creds+0x2a ([kernel.kallsyms]) ffffffff9831fe87 install_exec_creds+0x17 ([kernel.kallsyms]) ffffffff983968d9 load_elf_binary+0x359 ([kernel.kallsyms]) ffffffff98e00c45 
__x86_indirect_thunk_rax+0x5 ([kernel.kallsyms]) ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms]) uname 28642 168664.856388: 10000 cycles: ffffffff982a24f1 mprotect_fixup+0x151 ([kernel.kallsyms]) ffffffff9831fa83 setup_arg_pages+0x123 ([kernel.kallsyms]) ffffffff9839691f load_elf_binary+0x39f ([kernel.kallsyms]) ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms]) ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms]) uname 28642 168664.856392: 10000 cycles: ffffffff982a385b move_page_tables+0xbcb ([kernel.kallsyms]) ffffffff9831f889 shift_arg_pages+0xa9 ([kernel.kallsyms]) ffffffff9831fb4f setup_arg_pages+0x1ef ([kernel.kallsyms]) ffffffff9839691f load_elf_binary+0x39f ([kernel.kallsyms]) ffffffff98e00c45 __x86_indirect_thunk_rax+0x5 ([kernel.kallsyms]) # Signed-off-by: Adrian Hunter <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf evsel: Add support for synthesized sample typeAdrian Hunter1-1/+14
For reporting purposes, an evsel sample can have a callchain synthesized from AUX area data. Add support for keeping track of synthesized sample types. Note, the recorded sample_type cannot be changed because it is needed to continue to parse events. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf evsel: Be consistent when looking which evsel PERF_SAMPLE_ bits are setAdrian Hunter1-1/+1
Using 'type' variable for checking for callchains is equivalent to using evsel__has_callchain(evsel) and is how the other PERF_SAMPLE_ bits are checked in this function, so use it to be consistent. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] [ split from a larger patch ] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf thread-stack: Add thread_stack__sample_late()Adrian Hunter2-0/+60
Add a thread stack function to create a call chain for hardware events where the sample records get created some time after the event occurred. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf auxtrace: Add an option to synthesize callchains for regular eventsAdrian Hunter6-4/+12
Currently, callchains can be synthesized only for synthesized events. Add an itrace option to synthesize callchains for regular events. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf auxtrace: For reporting purposes, un-group AUX area eventAdrian Hunter1-5/+55
An AUX area event must be the group leader when recording traces in sample mode, but that does not produce the expected results from 'perf report' because it expects the leader to provide samples. Rather than teach 'perf report' about AUX area sampling, un-group the AUX area event during processing, making the 2nd event the leader. Example: $ perf record -e '{intel_pt//u,branch-misses:u}' -c 1 uname Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.080 MB perf.data ] Before: $ perf report Samples: 800 of events 'anon group { intel_pt//u, branch-misses:u }', Event count (approx.): 800 Children Self Command Shared Object Symbol 0.00% 47.50% 0.00% 47.50% uname libc-2.28.so [.] _dl_addr 0.00% 16.38% 0.00% 16.38% uname ld-2.28.so [.] __GI___tunables_init 0.00% 54.75% 0.00% 4.75% uname ld-2.28.so [.] dl_main 0.00% 3.12% 0.00% 3.12% uname ld-2.28.so [.] _dl_map_object_from_fd 0.00% 2.38% 0.00% 2.38% uname ld-2.28.so [.] strcmp 0.00% 2.25% 0.00% 2.25% uname ld-2.28.so [.] _dl_check_map_versions 0.00% 2.00% 0.00% 2.00% uname ld-2.28.so [.] _dl_important_hwcaps 0.00% 2.00% 0.00% 2.00% uname ld-2.28.so [.] _dl_map_object_deps 0.00% 51.50% 0.00% 1.50% uname ld-2.28.so [.] _dl_sysdep_start 0.00% 1.25% 0.00% 1.25% uname ld-2.28.so [.] _dl_load_cache_lookup 0.00% 51.12% 0.00% 1.12% uname ld-2.28.so [.] _dl_start 0.00% 50.88% 0.00% 1.12% uname ld-2.28.so [.] do_lookup_x 0.00% 50.62% 0.00% 1.00% uname ld-2.28.so [.] _dl_lookup_symbol_x 0.00% 1.00% 0.00% 1.00% uname ld-2.28.so [.] _dl_map_object 0.00% 1.00% 0.00% 1.00% uname ld-2.28.so [.] _dl_next_ld_env_entry 0.00% 0.88% 0.00% 0.88% uname ld-2.28.so [.] _dl_cache_libcmp 0.00% 0.88% 0.00% 0.88% uname ld-2.28.so [.] _dl_new_object 0.00% 50.88% 0.00% 0.88% uname ld-2.28.so [.] _dl_relocate_object 0.00% 0.62% 0.00% 0.62% uname ld-2.28.so [.] _dl_init_paths 0.00% 0.62% 0.00% 0.62% uname ld-2.28.so [.] _dl_name_match_p 0.00% 0.50% 0.00% 0.50% uname ld-2.28.so [.] 
get_common_indeces.constprop.1 0.00% 0.50% 0.00% 0.50% uname ld-2.28.so [.] memmove 0.00% 0.50% 0.00% 0.50% uname ld-2.28.so [.] memset 0.00% 0.50% 0.00% 0.50% uname ld-2.28.so [.] open_verify.constprop.11 0.00% 0.38% 0.00% 0.38% uname ld-2.28.so [.] _dl_check_all_versions 0.00% 0.38% 0.00% 0.38% uname ld-2.28.so [.] _dl_find_dso_for_object 0.00% 0.38% 0.00% 0.38% uname ld-2.28.so [.] init_tls 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] __tunable_get_val 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] _dl_add_to_namespace_list 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] _dl_determine_tlsoffset 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] _dl_discover_osversion 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] calloc@plt 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] malloc 0.00% 0.25% 0.00% 0.25% uname ld-2.28.so [.] malloc@plt 0.00% 0.25% 0.00% 0.25% uname libc-2.28.so [.] _nl_load_locale_from_archive 0.00% 0.25% 0.00% 0.25% uname [unknown] [k] 0xffffffffa3a00010 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] __libc_scratch_buffer_set_array_size 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] _dl_allocate_tls_storage 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] _dl_catch_exception 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] _dl_setup_hash 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] _dl_sort_maps 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] _dl_sysdep_read_whole_file 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] access 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] calloc 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] mmap64 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] openaux 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] rtld_lock_default_lock_recursive 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] rtld_lock_default_unlock_recursive 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] strchr 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] strlen 0.00% 0.12% 0.00% 0.12% uname ld-2.28.so [.] 0x0000000000001080 0.00% 0.12% 0.00% 0.12% uname libc-2.28.so [.] 
__strchrnul_avx2 0.00% 0.12% 0.00% 0.12% uname libc-2.28.so [.] _nl_normalize_codeset 0.00% 0.12% 0.00% 0.12% uname libc-2.28.so [.] malloc 0.00% 0.12% 0.00% 0.12% uname [unknown] [k] 0xffffffffa3a011f0 0.00% 50.00% 0.00% 0.00% uname ld-2.28.so [.] _dl_start_user 0.00% 50.00% 0.00% 0.00% uname [unknown] [.] 0000000000000000 After: Samples: 800 of event 'branch-misses:u', Event count (approx.): 800 Children Self Command Shared Object Symbol 54.75% 4.75% uname ld-2.28.so [.] dl_main 51.50% 1.50% uname ld-2.28.so [.] _dl_sysdep_start 51.12% 1.12% uname ld-2.28.so [.] _dl_start 50.88% 0.88% uname ld-2.28.so [.] _dl_relocate_object 50.88% 1.12% uname ld-2.28.so [.] do_lookup_x 50.62% 1.00% uname ld-2.28.so [.] _dl_lookup_symbol_x 50.00% 0.00% uname ld-2.28.so [.] _dl_start_user 50.00% 0.00% uname [unknown] [.] 0000000000000000 47.50% 47.50% uname libc-2.28.so [.] _dl_addr 16.38% 16.38% uname ld-2.28.so [.] __GI___tunables_init 3.12% 3.12% uname ld-2.28.so [.] _dl_map_object_from_fd 2.38% 2.38% uname ld-2.28.so [.] strcmp 2.25% 2.25% uname ld-2.28.so [.] _dl_check_map_versions 2.00% 2.00% uname ld-2.28.so [.] _dl_important_hwcaps 2.00% 2.00% uname ld-2.28.so [.] _dl_map_object_deps 1.25% 1.25% uname ld-2.28.so [.] _dl_load_cache_lookup 1.00% 1.00% uname ld-2.28.so [.] _dl_map_object 1.00% 1.00% uname ld-2.28.so [.] _dl_next_ld_env_entry 0.88% 0.88% uname ld-2.28.so [.] _dl_cache_libcmp 0.88% 0.88% uname ld-2.28.so [.] _dl_new_object 0.62% 0.62% uname ld-2.28.so [.] _dl_init_paths 0.62% 0.62% uname ld-2.28.so [.] _dl_name_match_p 0.50% 0.50% uname ld-2.28.so [.] get_common_indeces.constprop.1 0.50% 0.50% uname ld-2.28.so [.] memmove 0.50% 0.50% uname ld-2.28.so [.] memset 0.50% 0.50% uname ld-2.28.so [.] open_verify.constprop.11 0.38% 0.38% uname ld-2.28.so [.] _dl_check_all_versions 0.38% 0.38% uname ld-2.28.so [.] _dl_find_dso_for_object 0.38% 0.38% uname ld-2.28.so [.] init_tls 0.25% 0.25% uname ld-2.28.so [.] __tunable_get_val 0.25% 0.25% uname ld-2.28.so [.] 
_dl_add_to_namespace_list 0.25% 0.25% uname ld-2.28.so [.] _dl_determine_tlsoffset 0.25% 0.25% uname ld-2.28.so [.] _dl_discover_osversion 0.25% 0.25% uname ld-2.28.so [.] calloc@plt 0.25% 0.25% uname ld-2.28.so [.] malloc 0.25% 0.25% uname ld-2.28.so [.] malloc@plt 0.25% 0.25% uname libc-2.28.so [.] _nl_load_locale_from_archive 0.25% 0.25% uname [unknown] [k] 0xffffffffa3a00010 0.12% 0.12% uname ld-2.28.so [.] __libc_scratch_buffer_set_array_size 0.12% 0.12% uname ld-2.28.so [.] _dl_allocate_tls_storage 0.12% 0.12% uname ld-2.28.so [.] _dl_catch_exception 0.12% 0.12% uname ld-2.28.so [.] _dl_setup_hash 0.12% 0.12% uname ld-2.28.so [.] _dl_sort_maps 0.12% 0.12% uname ld-2.28.so [.] _dl_sysdep_read_whole_file 0.12% 0.12% uname ld-2.28.so [.] access 0.12% 0.12% uname ld-2.28.so [.] calloc 0.12% 0.12% uname ld-2.28.so [.] mmap64 0.12% 0.12% uname ld-2.28.so [.] openaux 0.12% 0.12% uname ld-2.28.so [.] rtld_lock_default_lock_recursive 0.12% 0.12% uname ld-2.28.so [.] rtld_lock_default_unlock_recursive 0.12% 0.12% uname ld-2.28.so [.] strchr 0.12% 0.12% uname ld-2.28.so [.] strlen 0.12% 0.12% uname ld-2.28.so [.] 0x0000000000001080 0.12% 0.12% uname libc-2.28.so [.] __strchrnul_avx2 0.12% 0.12% uname libc-2.28.so [.] _nl_normalize_codeset 0.12% 0.12% uname libc-2.28.so [.] malloc 0.12% 0.12% uname [unknown] [k] 0xffffffffa3a011f0 Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf s390-cpumsf: Implement ->evsel_is_auxtrace() callbackAdrian Hunter2-0/+10
Implement ->evsel_is_auxtrace() callback. Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Thomas Richter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf cs-etm: Implement ->evsel_is_auxtrace() callbackAdrian Hunter1-0/+11
Implement ->evsel_is_auxtrace() callback. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Mathieu Poirier <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf arm-spe: Implement ->evsel_is_auxtrace() callbackAdrian Hunter1-0/+9
Implement ->evsel_is_auxtrace() callback. Signed-off-by: Adrian Hunter <[email protected]> Reviewed-by: Leo Yan <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kim Phillips <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf intel-bts: Implement ->evsel_is_auxtrace() callbackAdrian Hunter1-0/+10
Implement ->evsel_is_auxtrace() callback. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf intel-pt: Implement ->evsel_is_auxtrace() callbackAdrian Hunter1-0/+10
Implement ->evsel_is_auxtrace() callback. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf auxtrace: Add ->evsel_is_auxtrace() callbackAdrian Hunter2-0/+21
Add ->evsel_is_auxtrace() callback to identify if a selected event is an AUX area event. Signed-off-by: Adrian Hunter <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kim Phillips <[email protected]> Cc: Mathieu Poirier <[email protected]> Cc: Thomas Richter <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
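The callback pattern used by this series can be sketched as follows. This is a minimal illustration, not the perf code: `struct event`, `struct decoder`, `type_match()` and `event_is_auxtrace()` are hypothetical stand-ins, and matching on a PMU type is only an assumption about how a decoder might recognise its own events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a selected event. */
struct event {
	unsigned int type;
};

/* Hypothetical stand-in for an AUX area decoder with an optional callback. */
struct decoder {
	unsigned int pmu_type;
	bool (*evsel_is_auxtrace)(const struct decoder *d, const struct event *ev);
};

/* Example callback: the event belongs to the decoder if the types match. */
static bool type_match(const struct decoder *d, const struct event *ev)
{
	return ev->type == d->pmu_type;
}

/* Generic wrapper: no decoder or no callback means "not an AUX area event". */
static bool event_is_auxtrace(const struct decoder *d, const struct event *ev)
{
	assert(ev != NULL);
	if (!d || !d->evsel_is_auxtrace)
		return false;
	return d->evsel_is_auxtrace(d, ev);
}
```

Keeping the NULL checks in the wrapper means callers need no special case for sessions without a decoder, and each decoder only supplies its own match rule.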
2020-04-16perf script: Add flamegraph.py scriptAndreas Gerstmayr3-0/+129
This script works in tandem with d3-flame-graph to generate flame graphs from perf. It supports two output formats: JSON and HTML (the default). The HTML format will look for a standalone d3-flame-graph template file in /usr/share/d3-flame-graph/d3-flamegraph-base.html and fill in the collected stacks. Usage: perf record -a -g -F 99 sleep 60 perf script report flamegraph Combined: perf script flamegraph -a -F 99 sleep 60 Committer testing: Tested both with "PYTHON=python3" and with the default, that uses python2-devel: Complete set of instructions: $ mkdir /tmp/build/perf $ make PYTHON=python3 -C tools/perf O=/tmp/build/perf install-bin $ export PATH=~/bin:$PATH $ perf record -a -g -F 99 sleep 60 $ perf script report flamegraph Now go and open the generated flamegraph.html file in a browser. At first this required building with PYTHON=python3, but after I reported this Andreas was kind enough to send a patch making it work with both python and python3. Signed-off-by: Andreas Gerstmayr <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Brendan Gregg <[email protected]> Cc: Martin Spier <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf metrictroup: Split the metricgroup__add_metric functionKajol Jain1-25/+35
Refactor the metricgroup__add_metric function by moving part of it into the function metricgroup__add_metric_param. No logic change. Signed-off-by: Kajol Jain <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Anju T Sudhakar <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jin Yao <[email protected]> Cc: Joe Mario <[email protected]> Cc: Kan Liang <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mamatha Inamdar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf expr: Add expr_scanner_ctx objectJiri Olsa3-7/+13
Add the expr_scanner_ctx object to hold user data for the expr scanner. Currently it holds only start_token; Kajol Jain will use it to hold the 24x7 runtime param. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Anju T Sudhakar <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jin Yao <[email protected]> Cc: Joe Mario <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mamatha Inamdar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf expr: Add expr_ prefix for parse_ctx and parse_idJiri Olsa5-17/+17
Adding expr_ prefix for parse_ctx and parse_id, to straighten out the expr* namespace. There's no functional change. Signed-off-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Anju T Sudhakar <[email protected]> Cc: Benjamin Herrenschmidt <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jin Yao <[email protected]> Cc: Joe Mario <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Kan Liang <[email protected]> Cc: Madhavan Srinivasan <[email protected]> Cc: Mamatha Inamdar <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Michael Petlan <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Paul Mackerras <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Ravi Bangoria <[email protected]> Cc: Sukadev Bhattiprolu <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf synthetic-events: save 4kb from 2 stack framesIan Rogers1-12/+10
Reuse an existing char buffer to avoid two PATH_MAX sized char buffers. Reduces stack frame sizes by 4kb. perf_event__synthesize_mmap_events before 'sub $0x45b8,%rsp' after 'sub $0x35b8,%rsp'. perf_event__get_comm_ids before 'sub $0x2028,%rsp' after 'sub $0x1028,%rsp'. The performance impact of this change is negligible. Signed-off-by: Ian Rogers <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrey Zhizhikin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
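The 4kb figure can be checked directly from the quoted 'sub ...,%rsp' immediates. The helper below is only an arithmetic sketch; `frame_saving()` is a hypothetical name, not something from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes saved when a stack frame shrinks from 'before' to 'after'. */
static size_t frame_saving(size_t before, size_t after)
{
	assert(before >= after);
	return before - after;
}
```

For both functions the difference is 0x1000, i.e. 4096 bytes, which is the size of one PATH_MAX buffer on Linux: 0x45b8 - 0x35b8 for perf_event__synthesize_mmap_events and 0x2028 - 0x1028 for perf_event__get_comm_ids.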
2020-04-16tools api fs: Make xxx__mountpoint() more scalableStephane Eranian2-0/+29
The xxx_mountpoint() interface provided by fs.c finds mount points for common pseudo filesystems. The first time xxx_mountpoint() is invoked, it scans the mount table (/proc/mounts) looking for a match. If found, it is cached. The price to scan /proc/mounts is paid once if the mount is found. When the mount point is not found, subsequent calls to xxx_mountpoint() scan /proc/mounts over and over again. There is no caching. This causes a scaling issue in perf record with hugetlbfs__mountpoint(). The function is called for each process found in synthesize__mmap_events(). If the machine has thousands of processes and /proc/mounts has many entries, this could cause major overhead in perf record. We have observed multi-second slowdowns on some configurations. As an example on a laptop: Before: $ sudo umount /dev/hugepages $ strace -e trace=openat -o /tmp/tt perf record -a ls $ fgrep mounts /tmp/tt 285 After: $ sudo umount /dev/hugepages $ strace -e trace=openat -o /tmp/tt perf record -a ls $ fgrep mounts /tmp/tt 1 One could argue that not caching when the mount point is not found is intentional. That way subsequent calls may discover a mount point if the sysadmin mounts the filesystem. But the same argument could be made against caching the mount point. It could be unmounted, causing errors. It all depends on the intent of the interface. This patch assumes it is expected to scan /proc/mounts once. The patch documents the caching behavior in the fs.h header file. An alternative would be to just fix perf record. That would solve the problem with hugetlbfs__mountpoint(), but there could be similar issues (possibly down the line) with other xxx_mountpoint() calls in perf or other tools. 
Signed-off-by: Stephane Eranian <[email protected]> Reviewed-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrey Zhizhikin <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Ian Rogers <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
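The caching behaviour described in this commit, where a failed lookup is remembered as well as a successful one, can be sketched as below. This is an illustration under assumptions, not the fs.c code: `struct mount_cache`, `cached_mountpoint()`, the `scan_fn` type and the `scans` counter are all hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-filesystem cache entry. */
struct mount_cache {
	const char *path; /* cached mount point, NULL if none was found */
	bool checked;     /* true once the mount table has been scanned */
	int scans;        /* number of scans performed, for illustration only */
};

/* Hypothetical scanner: returns the mount point, or NULL if not mounted. */
typedef const char *(*scan_fn)(void);

/* Scan at most once, caching the result even when nothing was found. */
static const char *cached_mountpoint(struct mount_cache *c, scan_fn scan)
{
	assert(c != NULL && scan != NULL);
	if (!c->checked) {
		c->path = scan();
		c->checked = true;
		c->scans++;
	}
	return c->path;
}

/* Scanner that models an unmounted filesystem. */
static const char *scan_not_found(void)
{
	return NULL;
}
```

A separate `checked` flag is needed because a NULL path alone cannot distinguish "not scanned yet" from "scanned and not found"; with it, repeated calls for an unmounted filesystem trigger one scan rather than one per call.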
2020-04-16perf bench: Add event synthesis benchmarkIan Rogers5-2/+117
Event synthesis may occur at the start or end (tail) of a perf command. In system-wide mode it can scan every process in /proc, which may add seconds of latency before event recording. Add a new benchmark that times how long event synthesis takes with and without data synthesis. An example execution looks like: $ perf bench internals synthesize # Running 'internals/synthesize' benchmark: Average synthesis took: 168.253800 usec Average data synthesis took: 208.104700 usec Signed-off-by: Ian Rogers <[email protected]> Acked-by: Jiri Olsa <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andrey Zhizhikin <[email protected]> Cc: Kan Liang <[email protected]> Cc: Kefeng Wang <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Petr Mladek <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16perf script: Simplify auxiliary event printing functionsAdrian Hunter1-238/+66
This simplifies the print functions for the following perf script options: --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events Example: # perf record --switch-events -a -e cycles -c 10000 sleep 1 Before: # perf script --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events > out-before.txt After: # perf script --show-task-events --show-namespace-events --show-cgroup-events --show-mmap-events --show-switch-events --show-lost-events --show-bpf-events > out-after.txt # diff -s out-before.txt out-after.txt Files out-before.txt and out-after.txt are identical Signed-off-by: Adrian Hunter <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16doc/admin-guide: update kernel.rst with CAP_PERFMON informationAlexey Budankov1-5/+11
Update the kernel.rst documentation file with the information related to usage of CAP_PERFMON capability to secure performance monitoring and observability operations in system. Signed-off-by: Alexey Budankov <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: James Morris <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-04-16doc/admin-guide: Update perf-security.rst with CAP_PERFMON informationAlexey Budankov1-25/+61
Update the perf-security.rst documentation file with information on using the CAP_PERFMON capability to secure performance monitoring and observability operations in the system. Committer notes: While testing 'perf top' under cap_perfmon I noticed that it needs some more capability and Alexey pointed out cap_ipc_lock, as needed by this kernel chunk: kernel/events/core.c: 6101 if ((locked > lock_limit) && perf_is_paranoid() && !capable(CAP_IPC_LOCK)) { ret = -EPERM; goto unlock; } So I added it to the documentation, and also mentioned that if the libcap version doesn't yet support 'cap_perfmon', its numeric value can be used instead, i.e. if: # setcap "cap_perfmon,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf Fails, try: # setcap "38,cap_ipc_lock,cap_sys_ptrace,cap_syslog=ep" perf I also added a paragraph stating that an unpatched libcap will fail the check for CAP_PERFMON, as it checks the cap number against a maximum to see if it is valid, which makes perf use the 'cycles:u' event as the default, even though a cap_perfmon capable perf binary can get kernel samples. To work around that, just use, e.g.: # perf top -e cycles # perf record -e cycles And it will sample kernel and user modes. Signed-off-by: Alexey Budankov <[email protected]> Tested-by: Arnaldo Carvalho de Melo <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: James Morris <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
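The condition in the quoted kernel chunk can be restated as a small truth-table sketch. `check_mlock_budget()` and its boolean parameters are hypothetical; they stand in for the kernel's perf_is_paranoid() and capable(CAP_IPC_LOCK) results and are not the kernel implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Returns -EPERM only when all three conditions hold: the locked-memory
 * limit is exceeded, the paranoid setting is in force, and the caller
 * lacks CAP_IPC_LOCK. Otherwise returns 0.
 */
static int check_mlock_budget(unsigned long locked, unsigned long lock_limit,
			      bool paranoid, bool has_cap_ipc_lock)
{
	if (locked > lock_limit && paranoid && !has_cap_ipc_lock)
		return -EPERM;
	return 0;
}
```

This is why cap_ipc_lock appears in the setcap line: once the limit is exceeded under a paranoid setting, holding CAP_IPC_LOCK is the only remaining way past the check.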
2020-04-16drivers/oprofile: Open access for CAP_PERFMON privileged processAlexey Budankov1-1/+1
Open access to monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to the monitoring remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <[email protected]> Acked-by: James Morris <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Igor Lubashev <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Serge Hallyn <[email protected]> Cc: Song Liu <[email protected]> Cc: Stephane Eranian <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
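The access rule described here, CAP_PERFMON as the preferred capability with CAP_SYS_ADMIN still accepted for backward compatibility, can be sketched over a capability bitmask. The helper names and the 64-bit mask representation are assumptions for illustration; the value 38 for CAP_PERFMON is the one quoted in the perf-security.rst commit above, and 21 is the Linux value of CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers, prefixed to avoid clashing with system headers. */
#define SKETCH_CAP_SYS_ADMIN 21
#define SKETCH_CAP_PERFMON   38

/* True if bit 'cap' is set in the (hypothetical) effective capability mask. */
static bool has_cap(uint64_t effective, int cap)
{
	assert(cap >= 0 && cap < 64);
	return ((effective >> cap) & 1u) != 0;
}

/* Monitoring is allowed with CAP_PERFMON, or with CAP_SYS_ADMIN as a fallback. */
static bool may_monitor(uint64_t effective)
{
	return has_cap(effective, SKETCH_CAP_PERFMON) ||
	       has_cap(effective, SKETCH_CAP_SYS_ADMIN);
}
```

A process holding only CAP_PERFMON passes the check without carrying the rest of the CAP_SYS_ADMIN privileges, which is the least-privilege point the commit message makes.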