aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2019-11-23mm/hmm: make full use of walk_page_range()Ralph Campbell1-64/+56
hmm_range_fault() calls find_vma() and walk_page_range() in a loop. This is unnecessary duplication since walk_page_range() calls find_vma() in a loop already. Simplify hmm_range_fault() by defining a walk_test() callback function to filter unhandled vmas. This also fixes a bug where hmm_range_fault() was not checking start >= vma->vm_start before checking vma->vm_flags so hmm_range_fault() could return an error based on the wrong vma for the requested range. It also fixes a bug when the vma has no read access and the caller did not request a fault, there shouldn't be any error return code. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ralph Campbell <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23xen/gntdev: use mmu_interval_notifier_insertJason Gunthorpe2-138/+49
gntdev simply wants to monitor a specific VMA for any notifier events, this can be done straightforwardly using mmu_interval_notifier_insert() over the VMA's VA range. The notifier should be attached until the original VMA is destroyed. It is unclear if any of this is even sane, but at least a lot of duplicate code is removed. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Boris Ostrovsky <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23mm/hmm: remove hmm_mirror and relatedJason Gunthorpe4-540/+34
The only two users of this are now converted to use mmu_interval_notifier, delete all the code and update hmm.rst. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jérôme Glisse <[email protected]> Tested-by: Ralph Campbell <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23drm/amdgpu: Use mmu_interval_notifier instead of hmm_mirrorJason Gunthorpe5-237/+94
Convert the collision-retry lock around hmm_range_fault to use the one now provided by the mmu_interval notifier. Although this driver does not seem to use the collision retry lock that hmm provides correctly, it can still be converted over to use the mmu_interval_notifier api instead of hmm_mirror without too much trouble. This also deletes another place where a driver is associating additional data (struct amdgpu_mn) with a mmu_struct. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Philip Yang <[email protected]> Tested-by: Philip Yang <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23drm/amdgpu: Use mmu_interval_insert instead of hmm_mirrorJason Gunthorpe6-281/+77
Remove the interval tree in the driver and rely on the tree maintained by the mmu_notifier for delivering mmu_notifier invalidation callbacks. For some reason amdgpu has a very complicated arrangement where it tries to prevent duplicate entries in the interval_tree, this is not necessary, each amdgpu_bo can be its own stand alone entry. interval_tree already allows duplicates and overlaps in the tree. Also, there is no need to remove entries upon a release callback, the mmu_interval API safely allows objects to remain registered beyond the lifetime of the mm. The driver only has to stop touching the pages during release. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Philip Yang <[email protected]> Tested-by: Philip Yang <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23drm/amdgpu: Call find_vma under mmap_semJason Gunthorpe1-16/+21
find_vma() must be called under the mmap_sem, reorganize this code to do the vma check after entering the lock. Further, fix the unlocked use of struct task_struct's mm, instead use the mm from hmm_mirror which has an active mm_grab. Also the mm_grab must be converted to a mm_get before acquiring mmap_sem or calling find_vma(). Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers") Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads") Link: https://lore.kernel.org/r/[email protected] Acked-by: Christian König <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Reviewed-by: Philip Yang <[email protected]> Tested-by: Philip Yang <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23nouveau: use mmu_interval_notifier instead of hmm_mirrorJason Gunthorpe1-80/+99
Remove the hmm_mirror object and use the mmu_interval_notifier API instead for the range, and use the normal mmu_notifier API for the general invalidation callback. While here re-organize the pagefault path so the locking pattern is clear. nouveau is the only driver that uses a temporary range object and instead forwards nearly every invalidation range directly to the HW. While this is not how the mmu_interval_notifier was intended to be used, the overheads on the pagefaulting path are similar to the existing hmm_mirror version. Particularly since the interval tree will be small. Link: https://lore.kernel.org/r/[email protected] Tested-by: Ralph Campbell <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23nouveau: use mmu_notifier directly for invalidate_range_startJason Gunthorpe1-32/+63
There is no reason to get the invalidate_range_start() callback via an indirection through hmm_mirror, just register a normal notifier directly. Link: https://lore.kernel.org/r/[email protected] Tested-by: Ralph Campbell <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23drm/radeon: use mmu_interval_notifier_insertJason Gunthorpe2-176/+51
The new API is an exact match for the needs of radeon. For some reason radeon tries to remove overlapping ranges from the interval tree, but interval trees (and mmu_interval_notifier_insert()) support overlapping ranges directly. Simply delete all this code. Since this driver is missing a invalidate_range_end callback, but still calls get_user_pages(), it cannot be correct against all races. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Christian König <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23RDMA/hfi1: Use mmu_interval_notifier_insert for user_exp_rcvJason Gunthorpe4-93/+60
This converts one of the two users of mmu_notifiers to use the new API. The conversion is fairly straightforward, however the existing use of notifiers here seems to be racey. Link: https://lore.kernel.org/r/[email protected] Tested-by: Dennis Dalessandro <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23RDMA/odp: Use mmu_interval_notifier_insert()Jason Gunthorpe7-352/+82
Replace the internal interval tree based mmu notifier with the new common mmu_interval_notifier_insert() API. This removes a lot of code and fixes a deadlock that can be triggered in ODP: zap_page_range() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) unmap_single_vma() [..] __split_huge_page_pmd() mmu_notifier_invalidate_range_start() [..] ib_umem_notifier_invalidate_range_start() down_read(&per_mm->umem_rwsem) // DEADLOCK mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) mmu_notifier_invalidate_range_end() up_read(&per_mm->umem_rwsem) The umem_rwsem is held across the range_start/end as the ODP algorithm for invalidate_range_end cannot tolerate changes to the interval tree. However, due to the nested invalidation regions the second down_read() can deadlock if there are competing writers. The new core code provides an alternative scheme to solve this problem. Fixes: ca748c39ea3f ("RDMA/umem: Get rid of per_mm->notifier_count") Link: https://lore.kernel.org/r/[email protected] Tested-by: Artemy Kovalyov <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23mm/hmm: define the pre-processor related parts of hmm.h even if disabledJason Gunthorpe2-13/+47
Only the function calls are stubbed out with static inlines that always fail. This is the standard way to write a header for an optional component and makes it easier for drivers that only optionally need HMM_MIRROR. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jérôme Glisse <[email protected]> Tested-by: Ralph Campbell <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23mm/hmm: allow hmm_range to be used with a mmu_interval_notifier or hmm_mirrorJason Gunthorpe2-6/+24
hmm_mirror's handling of ranges does not use a sequence count which results in this bug: CPU0 CPU1 hmm_range_wait_until_valid(range) valid == true hmm_range_fault(range) hmm_invalidate_range_start() range->valid = false hmm_invalidate_range_end() range->valid = true hmm_range_valid(range) valid == true Where the hmm_range_valid() should not have succeeded. Adding the required sequence count would make it nearly identical to the new mmu_interval_notifier. Instead replace the hmm_mirror stuff with mmu_interval_notifier. Co-existence of the two APIs is the first step. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jérôme Glisse <[email protected]> Tested-by: Philip Yang <[email protected]> Tested-by: Ralph Campbell <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23mm/mmu_notifier: add an interval tree notifierJason Gunthorpe3-26/+620
Of the 13 users of mmu_notifiers, 8 of them use only invalidate_range_start/end() and immediately intersect the mmu_notifier_range with some kind of internal list of VAs. 4 use an interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list of some kind (scif_dma, vhost, gntdev, hmm) And the remaining 5 either don't use invalidate_range_start() or do some special thing with it. It turns out that building a correct scheme with an interval tree is pretty complicated, particularly if the use case is synchronizing against another thread doing get_user_pages(). Many of these implementations have various subtle and difficult to fix races. This approach puts the interval tree as common code at the top of the mmu notifier call tree and implements a shareable locking scheme. It includes: - An interval tree tracking VA ranges, with per-range callbacks - A read/write locking scheme for the interval tree that avoids sleeping in the notifier path (for OOM killer) - A sequence counter based collision-retry locking scheme to tell device page fault that a VA range is being concurrently invalidated. This is based on various ideas: - hmm accumulates invalidated VA ranges and releases them when all invalidates are done, via active_invalidate_ranges count. This approach avoids having to intersect the interval tree twice (as umem_odp does) at the potential cost of a longer device page fault. - kvm/umem_odp use a sequence counter to drive the collision retry, via invalidate_seq - a deferred work todo list on unlock scheme like RTNL, via deferred_list. This makes adding/removing interval tree members more deterministic - seqlock, except this version makes the seqlock idea multi-holder on the write side by protecting it with active_invalidate_ranges and a spinlock To minimize MM overhead when only the interval tree is being used, the entire SRCU and hlist overheads are dropped using some simple branches. Similarly the interval tree overhead is dropped when in hlist mode. The overhead from the mandatory spinlock is broadly the same as most of existing users which already had a lock (or two) of some sort on the invalidation path. Link: https://lore.kernel.org/r/[email protected] Acked-by: Christian König <[email protected]> Tested-by: Philip Yang <[email protected]> Tested-by: Ralph Campbell <[email protected]> Reviewed-by: John Hubbard <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2019-11-23MIPS: SGI-IP27: Enable ethernet phy on second Origin 200 moduleThomas Bogendoerfer1-0/+22
PROM only enables ethernet PHY on first Origin 200 module, so we must do it ourselves for the second module. Signed-off-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Paul Burton <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paul Burton <[email protected]> Cc: James Hogan <[email protected]> Cc: Lee Jones <[email protected]> Cc: David S. Miller <[email protected]> Cc: Srinivas Kandagatla <[email protected]> Cc: Alessandro Zummo <[email protected]> Cc: Alexandre Belloni <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]
2019-11-23MIPS: PCI: Fix fake subdevice ID for IOC3Thomas Bogendoerfer1-1/+1
Generation of fake subdevice ID had vendor and device ID swapped. Signed-off-by: Thomas Bogendoerfer <[email protected]> Signed-off-by: Paul Burton <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: Jonathan Corbet <[email protected]> Cc: Ralf Baechle <[email protected]> Cc: Paul Burton <[email protected]> Cc: James Hogan <[email protected]> Cc: Lee Jones <[email protected]> Cc: David S. Miller <[email protected]> Cc: Srinivas Kandagatla <[email protected]> Cc: Alessandro Zummo <[email protected]> Cc: Alexandre Belloni <[email protected]> Cc: Greg Kroah-Hartman <[email protected]> Cc: Jiri Slaby <[email protected]> Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected] Cc: [email protected]
2019-11-23Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds3-24/+28
Pull last minute virtio bugfixes from Michael Tsirkin: "Minor bugfixes all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: virtio_balloon: fix shrinker count virtio_balloon: fix shrinker scan number of pages virtio_console: allocate inbufs in add_port() only if it is needed virtio_ring: fix return code on DMA mapping fails
2019-11-23net: use rhashtable_lookup() instead of rhashtable_lookup_fast()Taehee Yoo4-6/+6
rhashtable_lookup_fast() internally calls rcu_read_lock() then, calls rhashtable_lookup(). So if rcu_read_lock() is already held, rhashtable_lookup() is enough. Signed-off-by: Taehee Yoo <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23Merge tag 'wireless-drivers-next-2019-11-22' of ↵Jakub Kicinski96-562/+2252
git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next Kalle Valo says: ==================== wireless-drivers-next patches for v5.5 Last set of patches for v5.5. Major features here 802.11ax support for qtnfmac and airtime fairness support to mt76. And naturally smaller fixes and improvements all over. Major changes: qtnfmac * add 802.11ax support in AP mode * enable offload bridging support iwlwifi * support TX/RX antennas reporting mt76 * mt7615 smart carrier sense support * aggregation statistics via debugfs * airtime fairness (ATF) support * mt76x0 OF mac address support ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23Merge branch 'nfc-convert-from-txt-to-rst'Jakub Kicinski2-36/+39
Robert Schwebel says: ==================== here is v2 of the series converting the NFC documentation from txt to rst. Thanks to Jonathan and Dave for the input. Changes since (implicit) v1: * replace code-block by more compact :: syntax * really add the rst file to the index ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23docs: networking: nfc: change to rst formatRobert Schwebel2-0/+1
Now that the sphinx syntax has been fixed, change the document from txt to rst and add it to the index. Signed-off-by: Robert Schwebel <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23docs: networking: nfc: fix code block syntaxRobert Schwebel1-8/+8
Silence this warning: Documentation/networking/nfc.rst:113: WARNING: Definition list ends without a blank line; unexpected unindent. Signed-off-by: Robert Schwebel <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23docs: networking: nfc: fix bullet list syntaxRobert Schwebel1-1/+1
Fix this warning: Documentation/networking/nfc.rst:87: WARNING: Bullet list ends without a blank line; unexpected unindent. Signed-off-by: Robert Schwebel <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23docs: networking: nfc: change block diagram to sphinx syntaxRobert Schwebel1-24/+25
Change the block diagram to match the sphinx syntax. This will make it possible to switch this file to rst in the future. Signed-off-by: Robert Schwebel <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23docs: networking: nfc: change headlines to sphinx syntaxRobert Schwebel1-3/+4
The headlines in this file do are not in the standard kernel docu- mentation headline format. Change it, so this file can be switched to rst in the future. Signed-off-by: Robert Schwebel <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23net: phy: initialise phydev speed and duplex sanelyRussell King1-2/+2
When a phydev is created, the speed and duplex are set to zero and -1 respectively, rather than using the predefined SPEED_UNKNOWN and DUPLEX_UNKNOWN constants. There is a window at initialisation time where we may report link down using the 0/-1 values. Tidy this up and use the predefined constants, so debug doesn't complain with: "Unsupported (update phy-core.c)/Unsupported (update phy-core.c)" when the speed and duplex settings are printed. Signed-off-by: Russell King <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23net: phy: remove phy_ethtool_sset()Russell King3-62/+2
There are no users of phy_ethtool_sset() in the kernel anymore, and as of commit 3c1bcc8614db ("net: ethernet: Convert phydev advertize and supported from u32 to link mode"), the implementation is slightly buggy - it doesn't correctly check the masked advertising mask as it used to. Remove it, and update the phy documentation to refer to its replacement function. Signed-off-by: Russell King <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23Revert "bpf: Emit audit messages upon successful prog load and unload"Jakub Kicinski4-38/+1
This commit reverts commit 91e6015b082b ("bpf: Emit audit messages upon successful prog load and unload") and its follow up commit 7599a896f2e4 ("audit: Move audit_log_task declaration under CONFIG_AUDITSYSCALL") as requested by Paul Moore. The change needs close review on linux-audit, tests etc. Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-23kvm: nVMX: Relax guest IA32_FEATURE_CONTROL constraintsJim Mattson1-1/+3
Commit 37e4c997dadf ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL") broke the KVM_SET_MSRS ABI by instituting new constraints on the data values that kvm would accept for the guest MSR, IA32_FEATURE_CONTROL. Perhaps these constraints should have been opt-in via a new KVM capability, but they were applied indiscriminately, breaking at least one existing hypervisor. Relax the constraints to allow either or both of FEATURE_CONTROL_VMXON_ENABLED_OUTSIDE_SMX and FEATURE_CONTROL_VMXON_ENABLED_INSIDE_SMX to be set when nVMX is enabled. This change is sufficient to fix the aforementioned breakage. Fixes: 37e4c997dadf ("KVM: VMX: validate individual bits of guest MSR_IA32_FEATURE_CONTROL") Signed-off-by: Jim Mattson <[email protected]> Reviewed-by: Liran Alon <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-23KVM: x86: Grab KVM's srcu lock when setting nested stateSean Christopherson1-0/+3
Acquire kvm->srcu for the duration of ->set_nested_state() to fix a bug where nVMX derefences ->memslots without holding ->srcu or ->slots_lock. The other half of nested migration, ->get_nested_state(), does not need to acquire ->srcu as it is a purely a dump of internal KVM (and CPU) state to userspace. Detected as an RCU lockdep splat that is 100% reproducible by running KVM's state_test selftest with CONFIG_PROVE_LOCKING=y. Note that the failing function, kvm_is_visible_gfn(), is only checking the validity of a gfn, it's not actually accessing guest memory (which is more or less unsupported during vmx_set_nested_state() due to incorrect MMU state), i.e. vmx_set_nested_state() itself isn't fundamentally broken. In any case, setting nested state isn't a fast path so there's no reason to go out of our way to avoid taking ->srcu. ============================= WARNING: suspicious RCU usage 5.4.0-rc7+ #94 Not tainted ----------------------------- include/linux/kvm_host.h:626 suspicious rcu_dereference_check() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 1 lock held by evmcs_test/10939: #0: ffff88826ffcb800 (&vcpu->mutex){+.+.}, at: kvm_vcpu_ioctl+0x85/0x630 [kvm] stack backtrace: CPU: 1 PID: 10939 Comm: evmcs_test Not tainted 5.4.0-rc7+ #94 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 Call Trace: dump_stack+0x68/0x9b kvm_is_visible_gfn+0x179/0x180 [kvm] mmu_check_root+0x11/0x30 [kvm] fast_cr3_switch+0x40/0x120 [kvm] kvm_mmu_new_cr3+0x34/0x60 [kvm] nested_vmx_load_cr3+0xbd/0x1f0 [kvm_intel] nested_vmx_enter_non_root_mode+0xab8/0x1d60 [kvm_intel] vmx_set_nested_state+0x256/0x340 [kvm_intel] kvm_arch_vcpu_ioctl+0x491/0x11a0 [kvm] kvm_vcpu_ioctl+0xde/0x630 [kvm] do_vfs_ioctl+0xa2/0x6c0 ksys_ioctl+0x66/0x70 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x54/0x200 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7f59a2b95f47 Fixes: 8fcc4b5923af5 ("kvm: nVMX: Introduce KVM_CAP_NESTED_STATE") Cc: [email protected] Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-23KVM: x86: Open code shared_msr_update() in its only callerSean Christopherson1-20/+9
Fold shared_msr_update() into its sole user to eliminate its pointless bounds check, its godawful printk, its misleading comment (it's called under a global lock), and its woefully inaccurate name. Signed-off-by: Sean Christopherson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-23KVM: Fix jump label out_free_* in kvm_init()Miaohe Lin1-4/+3
The jump label out_free_1 and out_free_2 deal with the same stuff, so git rid of one and rename the label out_free_0a to retain the label name order. Signed-off-by: Miaohe Lin <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-23KVM: x86: Remove a spurious export of a static functionSean Christopherson1-1/+0
A recent change inadvertently exported a static function, which results in modpost throwing a warning. Fix it. Fixes: cbbaa2727aa3 ("KVM: x86: fix presentation of TSX feature in ARCH_CAPABILITIES") Signed-off-by: Sean Christopherson <[email protected]> Cc: [email protected] Reviewed-by: Jim Mattson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2019-11-23Merge tag 'perf-core-for-mingo-5.5-20191122' of ↵Ingo Molnar46-200/+1190
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo: perf report: Jin Yao: - Allow entering the annotation view (symbol source/assembly + overhead/cycles/etc column) from the 'perf report --total-cycles' interface. E.g.: # perf record --all-cpus --branch-any --all-kernel ^C[ perf record: Woken up 5 times to write data ] # # perf evlist -v cycles: size: 120, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|CPU|PERIOD|BRANCH_STACK, read_format: ID, disabled: 1, inherit: 1, exclude_user: 1, mmap: 1, comm: 1, freq: 1, task: 1, precise_ip: 3, sample_id_all: 1, exclude_guest: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1, branch_sample_type: ANY # # perf report --total-cycles # # Samples: 78762 of event 'cycles' Sampled Sampled Avg Avg Cycles% Cycles Cycles% Cycles [Program Block Range] Shared Object 1.72% 95.8K 0.00% 254 [msr.h:105 -> msr.h:166] [kernel.vmlinux] 1.56% 107.6K 0.00% 618 [compiler.h:199 -> common.c:301] [kernel.vmlinux] 0.83% 46.3K 0.00% 409 [entry_64.S:153 -> entry_64.S:175] [kernel.vmlinux] 0.83% 46.1K 0.00% 83 [jump_label.h:41 -> tsc.c:230] [kernel.vmlinux] 0.64% 36.9K 0.01% 1.4K [hda_intel.c:904 -> hda_intel.c:916] [snd_hda_intel] 0.57% 30.2K 0.00% 282 [file.c:710 -> file.c:730] [kernel.vmlinux] 0.48% 25.8K 0.00% 82 [spinlock.c:158 -> spinlock.c:160] [kernel.vmlinux] 0.45% 23.7K 0.00% 369 [tick-broadcast.c:585 -> tick-broadcast.c:586] [kernel.vmlinux] 0.44% 24.4K 0.00% 73 [msr.h:236 -> tsc.c:1088] [kernel.vmlinux] 0.43% 22.7K 0.00% 144 [cpuidle.c:229 -> cpuidle.c:232] [kernel.vmlinux] Then press 'A' or Enter on one of those lines, just like with 'perf top', say the top one: [msr.h:105 -> msr.h:166], then this shows up: Samples: 78K of event 'cycles', 4000 Hz, Event count (approx.): 78762 native_write_msr /lib/modules/5.4.0-rc8/build/vmlinux [Percent: local period] Percent│ IPC Cycle (Average IPC: 0.02, IPC Coverage: 50.0%) │ │ Disassembly of section .text: │ │ ffffffff8106c480 <native_write_msr>: │ __wrmsr(): │ return EAX_EDX_VAL(val, low, high); │ } │ │ static inline void notrace __wrmsr(unsigned int msr, u32 low, u32 high) │ { │ asm volatile("1: wrmsr\n" 49.16 │0.02 mov %edi,%ecx │0.02 mov %esi,%eax │0.02 wrmsr │ arch_static_branch(): │ #include <linux/stringify.h> │ #include <linux/types.h> │ │ static __always_inline bool arch_static_branch(struct static_key *key, bool branch) │ { │ asm_volatile_goto("1:" 0.79 │0.02 nop │ native_write_msr(): │ { │ __wrmsr(msr, low, high); │ │ if (msr_tracepoint_active(__tracepoint_write_msr)) │ do_trace_write_msr(msr, ((u64)high << 32 | low), 0); │ } 50.05 │0.02 254 ← retq │ do_trace_write_msr(msr, ((u64)high << 32 | low), 0); │ shl $0x20,%rdx │ mov %esi,%esi │ or %rdx,%rsi │ xor %edx,%edx │ → jmpq do_trace_write_msr We need to improve this to show the source code line numbers in the annotation view, so one can go from that program block to the annotation view and see those source code line numbers straight away. auxtrace/Intel PT: Adrian Hunter: - Add support for AUX area sampling, requires new functionality that will land in 5.5, its already in tip. This includes kernel capability querying so that it fails gracefully with older kernels, duimping aux area samples in 'perf report -D' and 'perf script'. perf.data: Alexey Budankov: - Fix decompression of PERF_RECORD_COMPRESSED records. core: Arnaldo Carvalho de Melo: - Use the 'dcacheline' cmp routine to find the right DSOs taking into account the 'maj', 'min', 'ino' and 'ino_generation', that got moved from 'struct map' to 'struct dso', where it belongs. This further reduces the size of 'struct map', there is still more work to do to maybe get it to max one cacheline. libtraceevent: Hewenliang: - Fix memory leakage in copy_filter_type(). Sudip Mukherjee: - Fix header installation. perf parse: Ian Rogers : - Fix potential memory leak when handling tracepoint errors, found using LLVM's libFuzzer. perf probe: Colin Ian King: - Fix spelling mistake "addrees" -> "address". Signed-off-by: Arnaldo Carvalho de Melo <[email protected]> Signed-off-by: Ingo Molnar <[email protected]>
2019-11-22dell-smm-hwmon: Add documentationGiovanni Mascellani2-0/+165
Part of the documentation is taken from the README of the userspace utils (https://github.com/vitorafsr/i8kutils). The license is GPL-2+ and the author Massimo Dal Zotto is already credited as author of the module. Therefore there should be no copyright problem. I also added a paragraph with specific information on the experimental support for automatic BIOS fan control. Signed-off-by: Giovanni Mascellani <[email protected]> Link: https://lore.kernel.org/r/[email protected] [groeck: Fixed some of the documentation warnings] Signed-off-by: Guenter Roeck <[email protected]>
2019-11-22hwmon: (dell-smm) Add support for disabling automatic BIOS fan controlGiovanni Mascellani1-10/+105
This patch exports standard hwmon pwmX_enable sysfs attribute for enabling or disabling automatic fan control by BIOS. Standard value "1" is for disabling automatic BIOS fan control and value "2" for enabling. By default BIOS auto mode is enabled by laptop firmware. When BIOS auto mode is enabled, custom fan speed value (set via hwmon pwmX sysfs attribute) is overwritten by SMM in few seconds and therefore any custom settings are without effect. So this is reason why implementing option for disabling BIOS auto mode is needed. So finally this patch allows kernel to set and control fan speed on laptops, but it can be dangerous (like setting speed of other fans). The SMM commands to enable or disable automatic fan control are not documented and are not the same on all Dell laptops. Therefore a whitelist is used to send the correct codes only on laptopts for which they are known. This patch was originally developed by Pali Rohár; later Giovanni Mascellani implemented the whitelist. Signed-off-by: Giovanni Mascellani <[email protected]> Co-Developed-by: Pali Rohár <[email protected]> Signed-off-by: Pali Rohár <[email protected]> Link: https://lore.kernel.org/r/[email protected] [groeck: Fixed checkpatch warnings] Signed-off-by: Guenter Roeck <[email protected]>
2019-11-22Merge branch 'next/nommu' into for-nextPaul Walmsley54-375/+947
Conflicts: arch/riscv/boot/Makefile arch/riscv/include/asm/sbi.h
2019-11-22Merge branch 'next/misc' into for-nextPaul Walmsley18-42/+63
2019-11-22Merge branch 'next/tlb-opt' into for-nextPaul Walmsley1-2/+23
2019-11-22Merge branch 'next/isa-string' into for-nextPaul Walmsley1-42/+3
2019-11-22Merge branch 'next/seccomp' into for-nextPaul Walmsley6-4/+70
2019-11-22Merge branch 'sfc-ARFS-expiry-improvements'Jakub Kicinski6-41/+94
Edward Cree says: ==================== A series of changes to how we check filters for expiry, manage how much of that work to do & when, etc. Prompted by some pathological behaviour under heavy load, which was Reported-by: David Ahern <[email protected]> ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-22sfc: do ARFS expiry work occasionally even without NAPI pollEdward Cree3-8/+12
If there's no traffic on a channel, its ARFS expiry work will never get scheduled by efx_poll() as that isn't being run. So make efx_filter_rfs_expire() reschedule itself to run after 30 seconds. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-22sfc: add statistics for ARFSEdward Cree3-0/+12
Report the number of successful and failed insertions, and also the current count of filters, to aid in tuning e.g. rps_flow_cnt. Signed-off-by: Edward Cree <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-22sfc: suppress MCDI errors from ARFSEdward Cree2-6/+30
In high connection count usage, the NIC's filter table may be filled with sufficiently many ARFS filters that further insertions fail. As this does not represent a correctness issue, do not log the resulting MCDI errors. Add a debug-level message under the (by default disabled) rx_status category instead; and take the opportunity to do a little extra expiry work. Since there are now multiple workitems able to call __efx_filter_rfs_expire on a given channel, it is possible for them to race and thus pass quotas which, combined, exceed rfs_filter_count. Thus, don't WARN_ON if we loop all the way around the table with quota left over. Signed-off-by: Edward Cree <[email protected]> Tested-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-22sfc: change ARFS expiry mechanismEdward Cree4-32/+45
The old rfs_filters_added method for determining the quota could potentially allow the NIC to become filled with old filters, which never get tested for expiry. Instead, explicitly make expiry check work depend on the number of filters installed, and don't count checking slots without filters in as doing work. This guarantees that each filter will be checked for expiry at least once every thirty seconds (assuming the channel to which it belongs is NAPI polling actively) regardless of fill level. Signed-off-by: Edward Cree <[email protected]> Tested-by: David Ahern <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-22Merge branch '100GbE' of ↵Jakub Kicinski17-539/+1022
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue Jeff Kirsher says: ==================== This series contains updates to the ice driver only. Bruce updates the driver to store the number of functions the device has so that it won't have to compute it when setting safe mode capabilities. Adds a check to adjust the reporting of capabilities for devices with more than 4 ports, which differ for devices with less than 4 ports. Brett adds a helper function to determine if the VF is allowed to do VLAN operations based on the host's VF configuration. Also adds a new function that initializes VLAN stripping (enabled/disabled) for the VF based on the device supported capabilities. Adds a check if the vector index is valid with the respect to the number of transmit and receive queues configured when we set coalesce settings for DCB. Adds a check if the promisc_mask contains ICE_PROMISC_VLAN_RX or ICE_PROMISC_VLAN_TX so that VLAN 0 promiscuous rules to be removed. Add a helper macro for a commonly used de-reference of a pointer to &pf->dev->pdev. Jesse fixes an issue where if an invalid virtchnl request from the VF, the driver would return uninitialized data to the VF from the PF stack, so ensure the stack variable is initialized earlier. Add helpers to the virtchnl interface make the reporting of strings consistent and help reduce stack space. Implements VF statistics gathering via the kernel ndo_get_vf_stats(). Akeem ensures we disable the state flag for each VF when its resources are returned to the device. Tony does additional cleanup in the driver to ensure the when we allocate and free memory within the same function, we should not be using devm_* variants; use regular alloc and free functions. Henry implements code to query and set the number of channels on the primary VSI for a PF via ethtool. Jake cleans up needless NULL checks in ice_sched_cleanup_all(). Kevin updates the firmware API version to align with current NVM images. v2: Added "Fixes:" tag to patch 5 commit description and added the use of netif_is_rxfh_configured() in patch 13 to see if RSS has been configured by the user, if so do not overwrite that configuration. ==================== Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-22Merge branch 'for-linus' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input Pull input fix from Dmitry Torokhov: "Just a single revert as RMI mode should not have been enabled for this model [yet?]" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Revert "Input: synaptics - enable RMI mode for X1 Extreme 2nd Generation"
2019-11-22net: inet_is_local_reserved_port() should return bool not intMaciej Żenczykowski1-4/+4
Cc: Eric Dumazet <[email protected]> Signed-off-by: Maciej Żenczykowski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2019-11-22tracing: Enable syscall optimization for MIPSHassan Naveed1-0/+1
Since MIPS architecture has a sparse syscall array, select the HAVE_SPARSE_SYSCALL_NR to save space. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Hassan Naveed <[email protected]> Reviewed-by: Paul Burton <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>