aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-05-08Merge tag 'perf-core-for-mingo-5.8-20200506' of ↵Thomas Gleixner135-1517/+2699
git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf updates from Arnaldo: perf/core improvements and fixes: perf record: - Introduce --switch-output-event to use arbitrary events to be setup and read from a side band thread and, when they take place a signal be sent to the main 'perf record' thread, reusing the --switch-output code to take perf.data snapshots from the --overwrite ring buffer, e.g.: # perf record --overwrite -e sched:* \ --switch-output-event syscalls:*connect* \ workload will take perf.data.YYYYMMDDHHMMSS snapshots up to around the connect syscalls. Stephane Eranian: - Add --num-synthesize-threads option to control degree of parallelism of the synthesize_mmap() code which is scanning /proc/PID/task/PID/maps and can be time consuming. This mimics pre-existing behaviour in 'perf top'. Intel PT: Adrian Hunter: - Add support for synthesizing branch stacks for regular events (cycles, instructions, etc) from Intel PT data. perf bench: Ian Rogers: - Add a multi-threaded synthesize benchmark. - Add kallsyms parsing benchmark. Tommi Rantala: - Fix div-by-zero if runtime is zero. perf synthetic events: - Remove use of sscanf from /proc reading when parsing pre-existing threads to generate synthetic PERF_RECORD_{FORK,MMAP,COMM,etc} events. tools api: - Add a lightweight buffered reading API. libsymbols: - Parse kallsyms using new lightweight buffered reading io API. perf parse-events: - Fix memory leaks found on parse_events. perf mem2node: - Avoid double free related to realloc(). perf stat: Jin Yao: - Zero all the 'ena' and 'run' array slot stats for interval mode. - Improve runtime stat for interval mode Kajol Jain: - Enable Hz/hz printing for --metric-only option - Enhance JSON/metric infrastructure to handle "?". perf tests: Kajol Jain: - Added test for runtime param in metric expression. Tommi Rantala: - Fix data path in the session topology test. perf vendor events power9: Kajol Jain: - Add hv_24x7 socket/chip level metric events Coresight: Leo Yan: - Move definition of 'traceid_list' global variable from header file. Mike Leach: - Update to build with latest opencsd version. perf pmu: Shaokun Zhang: - Fix function name in comment, its get_cpuid_str(), not get_cpustr() Stephane Eranian: - Add perf_pmu__find_by_type() helper perf script: Stephane Eranian: - Remove extraneous newline in perf_sample__fprintf_regs(). Ian Rogers: - Avoid NULL dereference on symbol. tools feature: Stephane Eranian: - Add support for detecting libpfm4. perf symbol: Thomas Richter: - Fix kernel symbol address display in TUI verbose mode. perf cgroup: Tommi Rantala: - Avoid needless closing of unopened fd libperf: He Zhe: - Add NULL pointer check for cpu_map iteration and NULL assignment for all_cpus. Ian Rogers: - Fix a refcount leak in evlist method. Arnaldo Carvalho de Melo: - Rename the code in tools/perf/util, i.e. perf tooling specific, that operates on 'struct evsel' to evsel__, leaving the perf_evsel__ namespace for the routines in tools/lib/perf/ that operate on 'struct perf_evsel__'. tools/perf specific libraries: Konstantin Khlebnikov: - Fix reading new topology attribute "core_cpus" - Simplify checking if SMT is active. perf flamegraph: Arnaldo Carvalho de Melo: - Use /bin/bash for report and record scripts, just like all other such scripts, fixing a package dependency bug in a Linaro OpenEmbedded build checker. perf evlist: Jagadeesh Pagadala: - Remove duplicate headers. Miscelaneous: Zou Wei: - Remove unneeded semicolon in libtraceevent, 'perf c2c' and others. - Fix warning assignment of 0/1 to bool variable in 'perf report' Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-05-08gfs2: Fix error exit in do_xmoteBob Peterson1-1/+1
Before this patch, if an error was detected from glock function go_sync by function do_xmote, it would return. But the function had temporarily unlocked the gl_lockref spin_lock, and it never re-locked it. When the caller of do_xmote tried to unlock it again, it was already unlocked, which resulted in a corrupted spin_lock value. This patch makes sure the gl_lockref spin_lock is re-locked after it is unlocked. Thanks to Wu Bo <[email protected]> for reporting this problem. Signed-off-by: Bob Peterson <[email protected]> Signed-off-by: Andreas Gruenbacher <[email protected]>
2020-05-08KVM: SVM: Disable AVIC before setting V_IRQSuravee Suthikulpanit1-1/+12
The commit 64b5bd270426 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1") introduced a WARN_ON, which checks if AVIC is enabled when trying to set V_IRQ in the VMCB for enabling irq window. The following warning is triggered because the requesting vcpu (to deactivate AVIC) does not get to process APICv update request for itself until the next #vmexit. WARNING: CPU: 0 PID: 118232 at arch/x86/kvm/svm/svm.c:1372 enable_irq_window+0x6a/0xa0 [kvm_amd] RIP: 0010:enable_irq_window+0x6a/0xa0 [kvm_amd] Call Trace: kvm_arch_vcpu_ioctl_run+0x6e3/0x1b50 [kvm] ? kvm_vm_ioctl_irq_line+0x27/0x40 [kvm] ? _copy_to_user+0x26/0x30 ? kvm_vm_ioctl+0xb3e/0xd90 [kvm] ? set_next_entity+0x78/0xc0 kvm_vcpu_ioctl+0x236/0x610 [kvm] ksys_ioctl+0x8a/0xc0 __x64_sys_ioctl+0x1a/0x20 do_syscall_64+0x58/0x210 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes by sending APICV update request to all other vcpus, and immediately update APIC for itself. Signed-off-by: Suravee Suthikulpanit <[email protected]> Link: https://lkml.org/lkml/2020/5/2/167 Fixes: 64b5bd270426 ("KVM: nSVM: ignore L1 interrupt window while running L2 with V_INTR_MASKING=1") Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-05-08KVM: Introduce kvm_make_all_cpus_request_except()Suravee Suthikulpanit4-5/+16
This allows making request to all other vcpus except the one specified in the parameter. Signed-off-by: Suravee Suthikulpanit <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-05-08KVM: VMX: pass correct DR6 for GD userspace exitPaolo Bonzini2-2/+24
When KVM_EXIT_DEBUG is raised for the disabled-breakpoints case (DR7.GD), DR6 was incorrectly copied from the value in the VM. Instead, DR6.BD should be set in order to catch this case. On AMD this does not need any special code because the processor triggers a #DB exception that is intercepted. However, the testcase would fail without the previous patch because both DR6.BS and DR6.BD would be set. Signed-off-by: Paolo Bonzini <[email protected]>
2020-05-08KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6Paolo Bonzini5-30/+32
There are two issues with KVM_EXIT_DEBUG on AMD, whose root cause is the different handling of DR6 on intercepted #DB exceptions on Intel and AMD. On Intel, #DB exceptions transmit the DR6 value via the exit qualification field of the VMCS, and the exit qualification only contains the description of the precise event that caused a vmexit. On AMD, instead the DR6 field of the VMCB is filled in as if the #DB exception was to be injected into the guest. This has two effects when guest debugging is in use: * the guest DR6 is clobbered * the kvm_run->debug.arch.dr6 field can accumulate more debug events, rather than just the last one that happened (the testcase in the next patch covers this issue). This patch fixes both issues by emulating, so to speak, the Intel behavior on AMD processors. The important observation is that (after the previous patches) the VMCB value of DR6 is only ever observable from the guest is KVM_DEBUGREG_WONT_EXIT is set. Therefore we can actually set vmcb->save.dr6 to any value we want as long as KVM_DEBUGREG_WONT_EXIT is clear, which it will be if guest debugging is enabled. Therefore it is possible to enter the guest with an all-zero DR6, reconstruct the #DB payload from the DR6 we get at exit time, and let kvm_deliver_exception_payload move the newly set bits into vcpu->arch.dr6. Some extra bits may be included in the payload if KVM_DEBUGREG_WONT_EXIT is set, but this is harmless. This may not be the most optimized way to deal with this, but it is simple and, being confined within SVM code, it gets rid of the set_dr6 callback and kvm_update_dr6. Signed-off-by: Paolo Bonzini <[email protected]>
2020-05-08KVM: SVM: keep DR6 synchronized with vcpu->arch.dr6Paolo Bonzini5-28/+21
kvm_x86_ops.set_dr6 is only ever called with vcpu->arch.dr6 as the second argument. Ensure that the VMCB value is synchronized to vcpu->arch.dr6 on #DB (both "normal" and nested) and nested vmentry, so that the current value of DR6 is always available in vcpu->arch.dr6. The get_dr6 callback can just access vcpu->arch.dr6 and becomes redundant. Signed-off-by: Paolo Bonzini <[email protected]>
2020-05-08iwlwifi: pcie: handle QuZ configs with killer NICs as wellLuca Coelho1-0/+4
The killer devices were left out of the checks that convert Qu-B0 to QuZ configurations. Add them. Cc: [email protected] # v5.3+ Fixes: 5a8c31aa6357 ("iwlwifi: pcie: fix recognition of QuZ devices") Signed-off-by: Luca Coelho <[email protected]> Tested-by: You-Sheng Yang <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/iwlwifi.20200424121518.b715acfbe211.I273a098064a22577e4fca767910fd9cf0013f5cb@changeid
2020-05-08mmc: block: Fix request completion in the CQE timeout pathAdrian Hunter1-2/+1
First, it should be noted that the CQE timeout (60 seconds) is substantial so a CQE request that times out is really stuck, and the race between timeout and completion is extremely unlikely. Nevertheless this patch fixes an issue with it. Commit ad73d6feadbd7b ("mmc: complete requests from ->timeout") preserved the existing functionality, to complete the request. However that had only been necessary because the block layer timeout handler had been marking the request to prevent it from being completed normally. That restriction was removed at the same time, the result being that a request that has gone will have been completed anyway. That is, the completion was unnecessary. At the time, the unnecessary completion was harmless because the block layer would ignore it, although that changed in kernel v5.0. Note for stable, this patch will not apply cleanly without patch "mmc: core: Fix recursive locking issue in CQE recovery path" Signed-off-by: Adrian Hunter <[email protected]> Fixes: ad73d6feadbd7b ("mmc: complete requests from ->timeout") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]>
2020-05-08mmc: core: Fix recursive locking issue in CQE recovery pathSarthak Garg1-9/+4
Consider the following stack trace -001|raw_spin_lock_irqsave -002|mmc_blk_cqe_complete_rq -003|__blk_mq_complete_request(inline) -003|blk_mq_complete_request(rq) -004|mmc_cqe_timed_out(inline) -004|mmc_mq_timed_out mmc_mq_timed_out acquires the queue_lock for the first time. The mmc_blk_cqe_complete_rq function also tries to acquire the same queue lock resulting in recursive locking where the task is spinning for the same lock which it has already acquired leading to watchdog bark. Fix this issue with the lock only for the required critical section. Cc: <[email protected]> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support") Suggested-by: Sahitya Tummala <[email protected]> Signed-off-by: Sarthak Garg <[email protected]> Acked-by: Adrian Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]>
2020-05-08mmc: core: Check request type before completing the requestVeerabhadrarao Badiganti1-1/+2
In the request completion path with CQE, request type is being checked after the request is getting completed. This is resulting in returning the wrong request type and leading to the IO hang issue. ASYNC request type is getting returned for DCMD type requests. Because of this mismatch, mq->cqe_busy flag is never getting cleared and the driver is not invoking blk_mq_hw_run_queue. So requests are not getting dispatched to the LLD from the block layer. All these eventually leading to IO hang issues. So, get the request type before completing the request. Cc: <[email protected]> Fixes: 1e8e55b67030 ("mmc: block: Add CQE support") Signed-off-by: Veerabhadrarao Badiganti <[email protected]> Acked-by: Adrian Hunter <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Ulf Hansson <[email protected]>
2020-05-08Merge tag 'drm-misc-fixes-2020-05-07' of ↵Dave Airlie6-4/+14
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes A few minor fixes for an ordering issue in virtio, an (old) gcc warning in sun4i, a probe issue in ingenic-drm and a regression in the HDCP support. Signed-off-by: Dave Airlie <[email protected]> From: Maxime Ripard <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-08Merge tag 'amd-drm-fixes-5.7-2020-05-06' of ↵Dave Airlie6-27/+43
git://people.freedesktop.org/~agd5f/linux into drm-fixes amd-drm-fixes-5.7-2020-05-06: amdgpu: - Runtime PM fixes - DC fix for PPC - Misc DC fixes Signed-off-by: Dave Airlie <[email protected]> From: Alex Deucher <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-05-07Merge branch 'for-v5.7' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security subsystem fix from James Morris: "Fix the default value of fs_context_parse_param hook" * 'for-v5.7' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: security: Fix the default value of fs_context_parse_param hook
2020-05-07mm: limit boost_watermark on small zonesHenry Willard1-0/+8
Commit 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") adds a boost_watermark() function which increases the min watermark in a zone by at least pageblock_nr_pages or the number of pages in a page block. On Arm64, with 64K pages and 512M huge pages, this is 8192 pages or 512M. It does this regardless of the number of managed pages managed in the zone or the likelihood of success. This can put the zone immediately under water in terms of allocating pages from the zone, and can cause a small machine to fail immediately due to OoM. Unlike set_recommended_min_free_kbytes(), which substantially increases min_free_kbytes and is tied to THP, boost_watermark() can be called even if THP is not active. The problem is most likely to appear on architectures such as Arm64 where pageblock_nr_pages is very large. It is desirable to run the kdump capture kernel in as small a space as possible to avoid wasting memory. In some architectures, such as Arm64, there are restrictions on where the capture kernel can run, and therefore, the space available. A capture kernel running in 768M can fail due to OoM immediately after boost_watermark() sets the min in zone DMA32, where most of the memory is, to 512M. It fails even though there is over 500M of free memory. With boost_watermark() suppressed, the capture kernel can run successfully in 448M. This patch limits boost_watermark() to boosting a zone's min watermark only when there are enough pages that the boost will produce positive results. In this case that is estimated to be four times as many pages as pageblock_nr_pages. Mel said: : There is no harm in marking it stable. Clearly it does not happen very : often but it's not impossible. 32-bit x86 is a lot less common now : which would previously have been vulnerable to triggering this easily. : ppc64 has a larger base page size but typically only has one zone. : arm64 is likely the most vulnerable, particularly when CMA is : configured with a small movable zone. Fixes: 1c30844d2dfe ("mm: reclaim small amounts of memory when an external fragmentation event occurs") Signed-off-by: Henry Willard <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: David Hildenbrand <[email protected]> Acked-by: Mel Gorman <[email protected]> Cc: Vlastimil Babka <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07ubsan: disable UBSAN_ALIGNMENT under COMPILE_TESTKees Cook2-10/+6
The documentation for UBSAN_ALIGNMENT already mentions that it should not be used on all*config builds (and for efficient-unaligned-access architectures), so just refactor the Kconfig to correctly implement this so randconfigs will stop creating insane images that freak out objtool under CONFIG_UBSAN_TRAP (due to the false positives producing functions that never return, etc). Link: http://lkml.kernel.org/r/202005011433.C42EA3E2D@keescook Fixes: 0887a7ebc977 ("ubsan: add trap instrumentation option") Signed-off-by: Kees Cook <[email protected]> Reported-by: Randy Dunlap <[email protected]> Link: https://lore.kernel.org/linux-next/202004231224.D6B3B650@keescook/ Signed-off-by: Andrew Morton <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Stephen Rothwell <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Andrey Ryabinin <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07mm/vmscan: remove unnecessary argument description of isolate_lru_pages()Qiwu Chen1-1/+0
Since commit a9e7c39fa9fd9 ("mm/vmscan.c: remove 7th argument of isolate_lru_pages()"), the explanation of 'mode' argument has been unnecessary. Let's remove it. Signed-off-by: Qiwu Chen <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07epoll: atomically remove wait entry on wake upRoman Penyaev1-19/+24
This patch does two things: - fixes a lost wakeup introduced by commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") - improves performance for events delivery. The description of the problem is the following: if N (>1) threads are waiting on ep->wq for new events and M (>1) events come, it is quite likely that >1 wakeups hit the same wait queue entry, because there is quite a big window between __add_wait_queue_exclusive() and the following __remove_wait_queue() calls in ep_poll() function. This can lead to lost wakeups, because thread, which was woken up, can handle not all the events in ->rdllist. (in better words the problem is described here: https://lkml.org/lkml/2019/10/7/905) The idea of the current patch is to use init_wait() instead of init_waitqueue_entry(). Internally init_wait() sets autoremove_wake_function as a callback, which removes the wait entry atomically (under the wq locks) from the list, thus the next coming wakeup hits the next wait entry in the wait queue, thus preventing lost wakeups. Problem is very well reproduced by the epoll60 test case [1]. Wait entry removal on wakeup has also performance benefits, because there is no need to take a ep->lock and remove wait entry from the queue after the successful wakeup. Here is the timing output of the epoll60 test case: With explicit wakeup from ep_scan_ready_list() (the state of the code prior 339ddb53d373): real 0m6.970s user 0m49.786s sys 0m0.113s After this patch: real 0m5.220s user 0m36.879s sys 0m0.019s The other testcase is the stress-epoll [2], where one thread consumes all the events and other threads produce many events: With explicit wakeup from ep_scan_ready_list() (the state of the code prior 339ddb53d373): threads events/ms run-time ms 8 5427 1474 16 6163 2596 32 6824 4689 64 7060 9064 128 6991 18309 After this patch: threads events/ms run-time ms 8 5598 1429 16 7073 2262 32 7502 4265 64 7640 8376 128 7634 16767 (number of "events/ms" represents event bandwidth, thus higher is better; number of "run-time ms" represents overall time spent doing the benchmark, thus lower is better) [1] tools/testing/selftests/filesystems/epoll/epoll_wakeup_test.c [2] https://github.com/rouming/test-tools/blob/master/stress-epoll.c Signed-off-by: Roman Penyaev <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Jason Baron <[email protected]> Cc: Khazhismel Kumykov <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Heiher <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07kselftests: introduce new epoll60 testcase for catching lost wakeupsRoman Penyaev1-0/+146
This test case catches lost wake up introduced by commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") The test is simple: we have 10 threads and 10 event fds. Each thread can harvest only 1 event. 1 producer fires all 10 events at once and waits that all 10 events will be observed by 10 threads. In case of lost wakeup epoll_wait() will timeout and 0 will be returned. Test case catches two sort of problems: forgotten wakeup on event, which hits the ->ovflist list, this problem was fixed by: 5a2513239750 ("eventpoll: fix missing wakeup for ovflist in ep_poll_callback") the other problem is when several sequential events hit the same waiting thread, thus other waiters get no wakeups. Problem is fixed in the following patch. Signed-off-by: Roman Penyaev <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Cc: Khazhismel Kumykov <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Heiher <[email protected]> Cc: Jason Baron <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07percpu: make pcpu_alloc() aware of current gfp contextFilipe Manana1-4/+10
Since 5.7-rc1, on btrfs we have a percpu counter initialization for which we always pass a GFP_KERNEL gfp_t argument (this happens since commit 2992df73268f78 ("btrfs: Implement DREW lock")). That is safe in some contextes but not on others where allowing fs reclaim could lead to a deadlock because we are either holding some btrfs lock needed for a transaction commit or holding a btrfs transaction handle open. Because of that we surround the call to the function that initializes the percpu counter with a NOFS context using memalloc_nofs_save() (this is done at btrfs_init_fs_root()). However it turns out that this is not enough to prevent a possible deadlock because percpu_alloc() determines if it is in an atomic context by looking exclusively at the gfp flags passed to it (GFP_KERNEL in this case) and it is not aware that a NOFS context is set. Because percpu_alloc() thinks it is in a non atomic context it locks the pcpu_alloc_mutex. This can result in a btrfs deadlock when pcpu_balance_workfn() is running, has acquired that mutex and is waiting for reclaim, while the btrfs task that called percpu_counter_init() (and therefore percpu_alloc()) is holding either the btrfs commit_root semaphore or a transaction handle (done fs/btrfs/backref.c: iterate_extent_inodes()), which prevents reclaim from finishing as an attempt to commit the current btrfs transaction will deadlock. Lockdep reports this issue with the following trace: ====================================================== WARNING: possible circular locking dependency detected 5.6.0-rc7-btrfs-next-77 #1 Not tainted ------------------------------------------------------ kswapd0/91 is trying to acquire lock: ffff8938a3b3fdc8 (&delayed_node->mutex){+.+.}, at: __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] but task is already holding lock: ffffffffb4f0dbc0 (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #4 (fs_reclaim){+.+.}: fs_reclaim_acquire.part.0+0x25/0x30 __kmalloc+0x5f/0x3a0 pcpu_create_chunk+0x19/0x230 pcpu_balance_workfn+0x56a/0x680 process_one_work+0x235/0x5f0 worker_thread+0x50/0x3b0 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 -> #3 (pcpu_alloc_mutex){+.+.}: __mutex_lock+0xa9/0xaf0 pcpu_alloc+0x480/0x7c0 __percpu_counter_init+0x50/0xd0 btrfs_drew_lock_init+0x22/0x70 [btrfs] btrfs_get_fs_root+0x29c/0x5c0 [btrfs] resolve_indirect_refs+0x120/0xa30 [btrfs] find_parent_nodes+0x50b/0xf30 [btrfs] btrfs_find_all_leafs+0x60/0xb0 [btrfs] iterate_extent_inodes+0x139/0x2f0 [btrfs] iterate_inodes_from_logical+0xa1/0xe0 [btrfs] btrfs_ioctl_logical_to_ino+0xb4/0x190 [btrfs] btrfs_ioctl+0x165a/0x3130 [btrfs] ksys_ioctl+0x87/0xc0 __x64_sys_ioctl+0x16/0x20 do_syscall_64+0x5c/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #2 (&fs_info->commit_root_sem){++++}: down_write+0x38/0x70 btrfs_cache_block_group+0x2ec/0x500 [btrfs] find_free_extent+0xc6a/0x1600 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x5a0 [btrfs] btrfs_cow_block+0x106/0x240 [btrfs] commit_cowonly_roots+0x55/0x310 [btrfs] btrfs_commit_transaction+0x509/0xb20 [btrfs] sync_filesystem+0x74/0x90 generic_shutdown_super+0x22/0x100 kill_anon_super+0x14/0x30 btrfs_kill_super+0x12/0x20 [btrfs] deactivate_locked_super+0x31/0x70 cleanup_mnt+0x100/0x160 task_work_run+0x93/0xc0 exit_to_usermode_loop+0xf9/0x100 do_syscall_64+0x20d/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #1 (&space_info->groups_sem){++++}: down_read+0x3c/0x140 find_free_extent+0xef6/0x1600 [btrfs] btrfs_reserve_extent+0x9b/0x180 [btrfs] btrfs_alloc_tree_block+0xc1/0x350 [btrfs] alloc_tree_block_no_bg_flush+0x4a/0x60 [btrfs] __btrfs_cow_block+0x122/0x5a0 [btrfs] btrfs_cow_block+0x106/0x240 [btrfs] btrfs_search_slot+0x50c/0xd60 [btrfs] btrfs_lookup_inode+0x3a/0xc0 [btrfs] __btrfs_update_delayed_inode+0x90/0x280 [btrfs] __btrfs_commit_inode_delayed_items+0x81f/0x870 [btrfs] __btrfs_run_delayed_items+0x8e/0x180 [btrfs] btrfs_commit_transaction+0x31b/0xb20 [btrfs] iterate_supers+0x87/0xf0 ksys_sync+0x60/0xb0 __ia32_sys_sync+0xa/0x10 do_syscall_64+0x5c/0x260 entry_SYSCALL_64_after_hwframe+0x49/0xbe -> #0 (&delayed_node->mutex){+.+.}: __lock_acquire+0xef0/0x1c80 lock_acquire+0xa2/0x1d0 __mutex_lock+0xa9/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] btrfs_evict_inode+0x40d/0x560 [btrfs] evict+0xd9/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x124/0x1a0 do_shrink_slab+0x176/0x440 shrink_slab+0x23a/0x2c0 shrink_node+0x188/0x6e0 balance_pgdat+0x31d/0x7f0 kswapd+0x238/0x550 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 other info that might help us debug this: Chain exists of: &delayed_node->mutex --> pcpu_alloc_mutex --> fs_reclaim Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(fs_reclaim); lock(pcpu_alloc_mutex); lock(fs_reclaim); lock(&delayed_node->mutex); *** DEADLOCK *** 3 locks held by kswapd0/91: #0: (fs_reclaim){+.+.}, at: __fs_reclaim_acquire+0x5/0x30 #1: (shrinker_rwsem){++++}, at: shrink_slab+0x12f/0x2c0 #2: (&type->s_umount_key#43){++++}, at: trylock_super+0x16/0x50 stack backtrace: CPU: 1 PID: 91 Comm: kswapd0 Not tainted 5.6.0-rc7-btrfs-next-77 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014 Call Trace: dump_stack+0x8f/0xd0 check_noncircular+0x170/0x190 __lock_acquire+0xef0/0x1c80 lock_acquire+0xa2/0x1d0 __mutex_lock+0xa9/0xaf0 __btrfs_release_delayed_node.part.0+0x3f/0x320 [btrfs] btrfs_evict_inode+0x40d/0x560 [btrfs] evict+0xd9/0x1c0 dispose_list+0x48/0x70 prune_icache_sb+0x54/0x80 super_cache_scan+0x124/0x1a0 do_shrink_slab+0x176/0x440 shrink_slab+0x23a/0x2c0 shrink_node+0x188/0x6e0 balance_pgdat+0x31d/0x7f0 kswapd+0x238/0x550 kthread+0x120/0x140 ret_from_fork+0x3a/0x50 This could be fixed by making btrfs pass GFP_NOFS instead of GFP_KERNEL to percpu_counter_init() in contextes where it is not reclaim safe, however that type of approach is discouraged since memalloc_[nofs|noio]_save() were introduced. Therefore this change makes pcpu_alloc() look up into an existing nofs/noio context before deciding whether it is in an atomic context or not. Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrew Morton <[email protected]> Acked-by: Tejun Heo <[email protected]> Acked-by: Dennis Zhou <[email protected]> Cc: Tejun Heo <[email protected]> Cc: Christoph Lameter <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07mm/slub: fix incorrect interpretation of s->offsetWaiman Long1-15/+30
In a couple of places in the slub memory allocator, the code uses "s->offset" as a check to see if the free pointer is put right after the object. That check is no longer true with commit 3202fa62fb43 ("slub: relocate freelist pointer to middle of object"). As a result, echoing "1" into the validate sysfs file, e.g. of dentry, may cause a bunch of "Freepointer corrupt" error reports like the following to appear with the system in panic afterwards. ============================================================================= BUG dentry(666:pmcd.service) (Tainted: G B): Freepointer corrupt ----------------------------------------------------------------------------- To fix it, use the check "s->offset == s->inuse" in the new helper function freeptr_outside_object() instead. Also add another helper function get_info_end() to return the end of info block (inuse + free pointer if not overlapping with object). Fixes: 3202fa62fb43 ("slub: relocate freelist pointer to middle of object") Signed-off-by: Waiman Long <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Matthew Wilcox (Oracle) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Rafael Aquini <[email protected]> Cc: Christoph Lameter <[email protected]> Cc: Vitaly Nikolenko <[email protected]> Cc: Silvio Cesare <[email protected]> Cc: Pekka Enberg <[email protected]> Cc: David Rientjes <[email protected]> Cc: Joonsoo Kim <[email protected]> Cc: Markus Elfring <[email protected]> Cc: Changbin Du <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07scripts/gdb: repair rb_first() and rb_last()Aymeric Agon-Rambosson1-2/+2
The current implementations of the rb_first() and rb_last() gdb functions have a variable that references itself in its instanciation, which causes the function to throw an error if a specific condition on the argument is met. The original author rather intended to reference the argument and made a typo. Referring the argument instead makes the function work as intended. Signed-off-by: Aymeric Agon-Rambosson <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Stephen Boyd <[email protected]> Cc: Jan Kiszka <[email protected]> Cc: Kieran Bingham <[email protected]> Cc: Douglas Anderson <[email protected]> Cc: Nikolay Borisov <[email protected]> Cc: Jackie Liu <[email protected]> Cc: Jason Wessel <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07eventpoll: fix missing wakeup for ovflist in ep_poll_callbackKhazhismel Kumykov1-9/+9
In the event that we add to ovflist, before commit 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") we would be woken up by ep_scan_ready_list, and did no wakeup in ep_poll_callback. With that wakeup removed, if we add to ovflist here, we may never wake up. Rather than adding back the ep_scan_ready_list wakeup - which was resulting in unnecessary wakeups, trigger a wake-up in ep_poll_callback. We noticed that one of our workloads was missing wakeups starting with 339ddb53d373 and upon manual inspection, this wakeup seemed missing to me. With this patch added, we no longer see missing wakeups. I haven't yet tried to make a small reproducer, but the existing kselftests in filesystem/epoll passed for me with this patch. [[email protected]: use if/elif instead of goto + cleanup suggested by Roman] Link: http://lkml.kernel.org/r/[email protected] Fixes: 339ddb53d373 ("fs/epoll: remove unnecessary wakeups of nested epoll") Signed-off-by: Khazhismel Kumykov <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Roman Penyaev <[email protected]> Cc: Alexander Viro <[email protected]> Cc: Roman Penyaev <[email protected]> Cc: Heiher <[email protected]> Cc: Jason Baron <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07arch/x86/kvm/svm/sev.c: change flag passed to GUP fast in sev_pin_memory()Janakarajan Natarajan1-1/+1
When trying to lock read-only pages, sev_pin_memory() fails because FOLL_WRITE is used as the flag for get_user_pages_fast(). Commit 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'") updated the get_user_pages_fast() call sites to use flags, but incorrectly updated the call in sev_pin_memory(). As the original coding of this call was correct, revert the change made by that commit. Fixes: 73b0140bf0fe ("mm/gup: change GUP fast to use flags rather than a write 'bool'") Signed-off-by: Janakarajan Natarajan <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Cc: Paolo Bonzini <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Vitaly Kuznetsov <[email protected]> Cc: Wanpeng Li <[email protected]> Cc: Jim Mattson <[email protected]> Cc: Joerg Roedel <[email protected]> Cc: Thomas Gleixner <[email protected]> Cc: Ingo Molnar <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: "H . Peter Anvin" <[email protected]> Cc: Mike Marshall <[email protected]> Cc: Brijesh Singh <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07scripts/decodecode: fix trapping instruction formattingIvan Delalande1-1/+1
If the trapping instruction contains a ':', for a memory access through segment registers for example, the sed substitution will insert the '*' marker in the middle of the instruction instead of the line address: 2b: 65 48 0f c7 0f cmpxchg16b %gs:*(%rdi) <-- trapping instruction I started to think I had forgotten some quirk of the assembly syntax before noticing that it was actually coming from the script. Fix it to add the address marker at the right place for these instructions: 28: 49 8b 06 mov (%r14),%rax 2b:* 65 48 0f c7 0f cmpxchg16b %gs:(%rdi) <-- trapping instruction 30: 0f 94 c0 sete %al Fixes: 18ff44b189e2 ("scripts/decodecode: make faulting insn ptr more robust") Signed-off-by: Ivan Delalande <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Link: http://lkml.kernel.org/r/20200419223653.GA31248@visor Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07kernel/kcov.c: fix typos in kcov_remote_start documentationMaciej Grochowski1-2/+2
Signed-off-by: Maciej Grochowski <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Andrey Konovalov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07mm/page_alloc: fix watchdog soft lockups during set_zone_contiguous()David Hildenbrand1-0/+1
Without CONFIG_PREEMPT, it can happen that we get soft lockups detected, e.g., while booting up. watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.6.0-next-20200331+ #4 Hardware name: Red Hat KVM, BIOS 1.11.1-4.module+el8.1.0+4066+0f1aadab 04/01/2014 RIP: __pageblock_pfn_to_page+0x134/0x1c0 Call Trace: set_zone_contiguous+0x56/0x70 page_alloc_init_late+0x166/0x176 kernel_init_freeable+0xfa/0x255 kernel_init+0xa/0x106 ret_from_fork+0x35/0x40 The issue becomes visible when having a lot of memory (e.g., 4TB) assigned to a single NUMA node - a system that can easily be created using QEMU. Inside VMs on a hypervisor with quite some memory overcommit, this is fairly easy to trigger. Signed-off-by: David Hildenbrand <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Reviewed-by: Pavel Tatashin <[email protected]> Reviewed-by: Pankaj Gupta <[email protected]> Reviewed-by: Baoquan He <[email protected]> Reviewed-by: Shile Zhang <[email protected]> Acked-by: Michal Hocko <[email protected]> Cc: Kirill Tkhai <[email protected]> Cc: Shile Zhang <[email protected]> Cc: Pavel Tatashin <[email protected]> Cc: Daniel Jordan <[email protected]> Cc: Michal Hocko <[email protected]> Cc: Alexander Duyck <[email protected]> Cc: Baoquan He <[email protected]> Cc: Oscar Salvador <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07mm, memcg: fix error return value of mem_cgroup_css_alloc()Yafang Shao1-6/+9
When I run my memcg testcase which creates lots of memcgs, I found there're unexpected out of memory logs while there're still enough available free memory. The error log is mkdir: cannot create directory 'foo.65533': Cannot allocate memory The reason is when we try to create more than MEM_CGROUP_ID_MAX memcgs, an -ENOMEM errno will be set by mem_cgroup_css_alloc(), but the right errno should be -ENOSPC "No space left on device", which is an appropriate errno for userspace's failed mkdir. As the errno really misled me, we should make it right. After this patch, the error log will be mkdir: cannot create directory 'foo.65533': No space left on device [[email protected]: s/EBUSY/ENOSPC/, per Michal] [[email protected]: s/EBUSY/ENOSPC/, per Michal] Fixes: 73f576c04b94 ("mm: memcontrol: fix cgroup creation failure after many small jobs") Suggested-by: Matthew Wilcox <[email protected]> Signed-off-by: Yafang Shao <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: Michal Hocko <[email protected]> Acked-by: Johannes Weiner <[email protected]> Cc: Vladimir Davydov <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()Oleg Nesterov1-8/+26
Commit cc731525f26a ("signal: Remove kernel interal si_code magic") changed the value of SI_FROMUSER(SI_MESGQ), this means that mq_notify() no longer works if the sender doesn't have rights to send a signal. Change __do_notify() to use do_send_sig_info() instead of kill_pid_info() to avoid check_kill_permission(). This needs the additional notify.sigev_signo != 0 check, shouldn't we change do_mq_notify() to deny sigev_signo == 0 ? Test-case: #include <signal.h> #include <mqueue.h> #include <unistd.h> #include <sys/wait.h> #include <assert.h> static int notified; static void sigh(int sig) { notified = 1; } int main(void) { signal(SIGIO, sigh); int fd = mq_open("/mq", O_RDWR|O_CREAT, 0666, NULL); assert(fd >= 0); struct sigevent se = { .sigev_notify = SIGEV_SIGNAL, .sigev_signo = SIGIO, }; assert(mq_notify(fd, &se) == 0); if (!fork()) { assert(setuid(1) == 0); mq_send(fd, "",1,0); return 0; } wait(NULL); mq_unlink("/mq"); assert(notified); return 0; } [[email protected]: 1) Add self_exec_id evaluation so that the implementation matches do_notify_parent 2) use PIDTYPE_TGID everywhere] Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic") Reported-by: Yoji <[email protected]> Signed-off-by: Oleg Nesterov <[email protected]> Signed-off-by: Manfred Spraul <[email protected]> Signed-off-by: Andrew Morton <[email protected]> Acked-by: "Eric W. Biederman" <[email protected]> Cc: Davidlohr Bueso <[email protected]> Cc: Markus Elfring <[email protected]> Cc: <[email protected]> Cc: <[email protected]> Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Linus Torvalds <[email protected]>
2020-05-07evm: Fix RCU list related warningsMadhuparna Bhowmik3-4/+11
This patch fixes the following warning and few other instances of traversal of evm_config_xattrnames list: [ 32.848432] ============================= [ 32.848707] WARNING: suspicious RCU usage [ 32.848966] 5.7.0-rc1-00006-ga8d5875ce5f0b #1 Not tainted [ 32.849308] ----------------------------- [ 32.849567] security/integrity/evm/evm_main.c:231 RCU-list traversed in non-reader section!! Since entries are only added to the list and never deleted, use list_for_each_entry_lockless() instead of list_for_each_entry_rcu for traversing the list. Also, add a relevant comment in evm_secfs.c to indicate this fact. Reported-by: kernel test robot <[email protected]> Suggested-by: Paul E. McKenney <[email protected]> Signed-off-by: Madhuparna Bhowmik <[email protected]> Acked-by: Paul E. McKenney <[email protected]> (RCU viewpoint) Signed-off-by: Mimi Zohar <[email protected]>
2020-05-07ima: Fix return value of ima_write_policy()Roberto Sassu1-2/+1
This patch fixes the return value of ima_write_policy() when a new policy is directly passed to IMA and the current policy requires appraisal of the file containing the policy. Currently, if appraisal is not in ENFORCE mode, ima_write_policy() returns 0 and leads user space applications to an endless loop. Fix this issue by denying the operation regardless of the appraisal mode. Cc: [email protected] # 4.10.x Fixes: 19f8a84713edc ("ima: measure and appraise the IMA policy itself") Signed-off-by: Roberto Sassu <[email protected]> Reviewed-by: Krzysztof Struczynski <[email protected]> Signed-off-by: Mimi Zohar <[email protected]>
2020-05-07evm: Check also if *tfm is an error pointer in init_desc()Roberto Sassu1-1/+1
This patch avoids a kernel panic due to accessing an error pointer set by crypto_alloc_shash(). It occurs especially when there are many files that require an unsupported algorithm, as it would increase the likelihood of the following race condition: Task A: *tfm = crypto_alloc_shash() <= error pointer Task B: if (*tfm == NULL) <= *tfm is not NULL, use it Task B: rc = crypto_shash_init(desc) <= panic Task A: *tfm = NULL This patch uses the IS_ERR_OR_NULL macro to determine whether or not a new crypto context must be created. Cc: [email protected] Fixes: d46eb3699502b ("evm: crypto hash replaced by shash") Co-developed-by: Krzysztof Struczynski <[email protected]> Signed-off-by: Krzysztof Struczynski <[email protected]> Signed-off-by: Roberto Sassu <[email protected]> Signed-off-by: Mimi Zohar <[email protected]>
2020-05-07ima: Set file->f_mode instead of file->f_flags in ima_calc_file_hash()Roberto Sassu1-6/+6
Commit a408e4a86b36 ("ima: open a new file instance if no read permissions") tries to create a new file descriptor to calculate a file digest if the file has not been opened with O_RDONLY flag. However, if a new file descriptor cannot be obtained, it sets the FMODE_READ flag to file->f_flags instead of file->f_mode. This patch fixes this issue by replacing f_flags with f_mode as it was before that commit. Cc: [email protected] # 4.20.x Fixes: a408e4a86b36 ("ima: open a new file instance if no read permissions") Signed-off-by: Roberto Sassu <[email protected]> Reviewed-by: Goldwyn Rodrigues <[email protected]> Signed-off-by: Mimi Zohar <[email protected]>
2020-05-07net: fix a potential recursive NETDEV_FEAT_CHANGECong Wang1-1/+3
syzbot managed to trigger a recursive NETDEV_FEAT_CHANGE event between bonding master and slave. I managed to find a reproducer for this: ip li set bond0 up ifenslave bond0 eth0 brctl addbr br0 ethtool -K eth0 lro off brctl addif br0 bond0 ip li set br0 up When a NETDEV_FEAT_CHANGE event is triggered on a bonding slave, it captures this and calls bond_compute_features() to fixup its master's and other slaves' features. However, when syncing with its lower devices by netdev_sync_lower_features() this event is triggered again on slaves when the LRO feature fails to change, so it goes back and forth recursively until the kernel stack is exhausted. Commit 17b85d29e82c intentionally lets __netdev_update_features() return -1 for such a failure case, so we have to just rely on the existing check inside netdev_sync_lower_features() and skip NETDEV_FEAT_CHANGE event only for this specific failure case. Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack") Reported-by: [email protected] Reported-by: [email protected] Cc: Jarod Wilson <[email protected]> Cc: Nikolay Aleksandrov <[email protected]> Cc: Josh Poimboeuf <[email protected]> Cc: Jann Horn <[email protected]> Reviewed-by: Jay Vosburgh <[email protected]> Signed-off-by: Cong Wang <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-07mptcp: set correct vfs info for subflowsPaolo Abeni1-0/+10
When a subflow is created via mptcp_subflow_create_socket(), a new 'struct socket' is allocated, with a new i_ino value. When inspecting TCP sockets via the procfs and or the diag interface, the above ones are not related to the process owning the MPTCP master socket, even if they are a logical part of it ('ss -p' shows an empty process field) Additionally, subflows created by the path manager get the uid/gid from the running workqueue. Subflows are part of the owning MPTCP master socket, let's adjust the vfs info to reflect this. After this patch, 'ss' correctly displays subflows as belonging to the msk socket creator. Fixes: 2303f994b3e1 ("mptcp: Associate MPTCP context with TCP socket") Signed-off-by: Paolo Abeni <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-07net: microchip: encx24j600: add missed kthread_stopChuhong Yuan1-1/+4
This driver calls kthread_run() in probe, but forgets to call kthread_stop() in probe failure and remove. Add the missed kthread_stop() to fix it. Signed-off-by: Chuhong Yuan <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-07Revert "ipv6: add mtu lock check in __ip6_rt_update_pmtu"Maciej Żenczykowski1-2/+4
This reverts commit 19bda36c4299ce3d7e5bce10bebe01764a655a6d: | ipv6: add mtu lock check in __ip6_rt_update_pmtu | | Prior to this patch, ipv6 didn't do mtu lock check in ip6_update_pmtu. | It leaded to that mtu lock doesn't really work when receiving the pkt | of ICMPV6_PKT_TOOBIG. | | This patch is to add mtu lock check in __ip6_rt_update_pmtu just as ipv4 | did in __ip_rt_update_pmtu. The above reasoning is incorrect. IPv6 *requires* icmp based pmtu to work. There's already a comment to this effect elsewhere in the kernel: $ git grep -p -B1 -A3 'RTAX_MTU lock' net/ipv6/route.c=4813= static int rt6_mtu_change_route(struct fib6_info *f6i, void *p_arg) ... /* In IPv6 pmtu discovery is not optional, so that RTAX_MTU lock cannot disable it. We still use this lock to block changes caused by addrconf/ndisc. */ This reverts to the pre-4.9 behaviour. Cc: Eric Dumazet <[email protected]> Cc: Willem de Bruijn <[email protected]> Cc: Xin Long <[email protected]> Cc: Hannes Frederic Sowa <[email protected]> Signed-off-by: Maciej Żenczykowski <[email protected]> Fixes: 19bda36c4299 ("ipv6: add mtu lock check in __ip6_rt_update_pmtu") Signed-off-by: David S. Miller <[email protected]>
2020-05-07net: bareudp: avoid uninitialized variable warningArnd Bergmann2-16/+4
clang points out that building without IPv6 would lead to returning an uninitialized variable if a packet with family!=AF_INET is passed into bareudp_udp_encap_recv(): drivers/net/bareudp.c:139:6: error: variable 'err' is used uninitialized whenever 'if' condition is false [-Werror,-Wsometimes-uninitialized] if (family == AF_INET) ^~~~~~~~~~~~~~~~~ drivers/net/bareudp.c:146:15: note: uninitialized use occurs here if (unlikely(err)) { ^~~ include/linux/compiler.h:78:42: note: expanded from macro 'unlikely' # define unlikely(x) __builtin_expect(!!(x), 0) ^ drivers/net/bareudp.c:139:2: note: remove the 'if' if its condition is always true if (family == AF_INET) ^~~~~~~~~~~~~~~~~~~~~~ This cannot happen in practice, so change the condition in a way that gcc sees the IPv4 case as unconditionally true here. For consistency, change all the similar constructs in this file the same way, using "if(IS_ENABLED())" instead of #if IS_ENABLED()". Fixes: 571912c69f0e ("net: UDP tunnel encapsulation module for tunnelling different protocols like MPLS, IP, NSH etc.") Signed-off-by: Arnd Bergmann <[email protected]> Reviewed-by: Nathan Chancellor <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-07Merge tag 'trace-v5.7-rc3' of ↵Linus Torvalds8-42/+114
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing fixes from Steven Rostedt: - Fix bootconfig causing kernels to fail with CONFIG_BLK_DEV_RAM enabled - Fix allocation leaks in bootconfig tool - Fix a double initialization of a variable - Fix API bootconfig usage from kprobe boot time events - Reject NULL location for kprobes - Fix crash caused by preempt delay module not cleaning up kthread correctly - Add vmalloc_sync_mappings() to prevent x86_64 page faults from recursively faulting from tracing page faults - Fix comment in gpu/trace kerneldoc header - Fix documentation of how to create a trace event class - Make the local tracing_snapshot_instance_cond() function static * tag 'trace-v5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: tools/bootconfig: Fix resource leak in apply_xbc() tracing: Make tracing_snapshot_instance_cond() static tracing: Fix doc mistakes in trace sample gpu/trace: Minor comment updates for gpu_mem_total tracepoint tracing: Add a vmalloc_sync_mappings() for safe measure tracing: Wait for preempt irq delay thread to finish tracing/kprobes: Reject new event if loc is NULL tracing/boottime: Fix kprobe event API usage tracing/kprobes: Fix a double initialization typo bootconfig: Fix to remove bootconfig data from initrd while boot
2020-05-07Merge tag 'linux-kselftest-5.7-rc5' of ↵Linus Torvalds2-3/+58
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fixes from Shuah Khan: "ftrace test fixes and a fix to kvm Makefile for relocatable native/cross builds and installs" * tag 'linux-kselftest-5.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests: fix kvm relocatable native/cross builds and installs selftests/ftrace: Make XFAIL green color ftrace/selftest: make unresolved cases cause failure if --fail-unresolved set ftrace/selftests: workaround cgroup RT scheduling issues
2020-05-07io_uring: don't use 'fd' for openat/openat2/statxJens Axboe1-25/+7
We currently make some guesses as when to open this fd, but in reality we have no business (or need) to do so at all. In fact, it makes certain things fail, like O_PATH. Remove the fd lookup from these opcodes, we're just passing the 'fd' to generic helpers anyway. With that, we can also remove the special casing of fd values in io_req_needs_file(), and the 'fd_non_neg' check that we have. And we can ensure that we only read sqe->fd once. This fixes O_PATH usage with openat/openat2, and ditto statx path side oddities. Cc: [email protected]: # v5.6 Reported-by: Max Kellermann <[email protected]> Signed-off-by: Jens Axboe <[email protected]>
2020-05-07ALSA: rawmidi: Fix racy buffer resize under concurrent accessesTakashi Iwai2-4/+28
The rawmidi core allows user to resize the runtime buffer via ioctl, and this may lead to UAF when performed during concurrent reads or writes: the read/write functions unlock the runtime lock temporarily during copying form/to user-space, and that's the race window. This patch fixes the hole by introducing a reference counter for the runtime buffer read/write access and returns -EBUSY error when the resize is performed concurrently against read/write. Note that the ref count field is a simple integer instead of refcount_t here, since the all contexts accessing the buffer is basically protected with a spinlock, hence we need no expensive atomic ops. Also, note that this busy check is needed only against read / write functions, and not in receive/transmit callbacks; the race can happen only at the spinlock hole mentioned in the above, while the whole function is protected for receive / transmit callbacks. Reported-by: butt3rflyh4ck <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/CAFcO6XMWpUVK_yzzCpp8_XP7+=oUpQvuBeCbMffEDkpe8jWrfg@mail.gmail.com Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2020-05-07net: hisilicon: Make CONFIG_HNS invisibleGeert Uytterhoeven1-1/+1
The HNS config symbol enables the framework support for the Hisilicon Network Subsystem. It is already selected by all of its users, so there is no reason to make it visible. Signed-off-by: Geert Uytterhoeven <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-07usb: hso: correct debug messageOliver Neukum1-1/+1
If you do not find the OUT endpoint, you should say so, rather than copy the error message for the IN endpoint. Presumably a copy and paste error. Signed-off-by: Oliver Neukum <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-05-07tools/bootconfig: Fix resource leak in apply_xbc()Yunfeng Ye1-3/+6
Fix the @data and @fd allocations that are leaked in the error path of apply_xbc(). Link: http://lkml.kernel.org/r/[email protected] Acked-by: Masami Hiramatsu <[email protected]> Signed-off-by: Yunfeng Ye <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-05-07tracing: Make tracing_snapshot_instance_cond() staticZou Wei1-1/+2
Fix the following sparse warning: kernel/trace/trace.c:950:6: warning: symbol 'tracing_snapshot_instance_cond' was not declared. Should it be static? Link: http://lkml.kernel.org/r/[email protected] Reported-by: Hulk Robot <[email protected]> Signed-off-by: Zou Wei <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-05-07tracing: Fix doc mistakes in trace sampleWei Yang1-1/+1
As the example below shows, DECLARE_EVENT_CLASS() is used instead of DEFINE_EVENT_CLASS(). Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Wei Yang <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-05-07gpu/trace: Minor comment updates for gpu_mem_total tracepointYiwei Zhang1-1/+1
This change updates the improper comment for the 'size' attribute in the tracepoint definition. Most gfx drivers pre-fault in physical pages instead of making virtual allocations. So we drop the 'Virtual' keyword here and leave this to the implementations. Link: http://lkml.kernel.org/r/[email protected] Signed-off-by: Yiwei Zhang <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-05-07tracing: Add a vmalloc_sync_mappings() for safe measureSteven Rostedt (VMware)1-0/+13
x86_64 lazily maps in the vmalloc pages, and the way this works with per_cpu areas can be complex, to say the least. Mappings may happen at boot up, and if nothing synchronizes the page tables, those page mappings may not be synced till they are used. This causes issues for anything that might touch one of those mappings in the path of the page fault handler. When one of those unmapped mappings is touched in the page fault handler, it will cause another page fault, which in turn will cause a page fault, and leave us in a loop of page faults. Commit 763802b53a42 ("x86/mm: split vmalloc_sync_all()") split vmalloc_sync_all() into vmalloc_sync_unmappings() and vmalloc_sync_mappings(), as on system exit, it did not need to do a full sync on x86_64 (although it still needed to be done on x86_32). By chance, the vmalloc_sync_all() would synchronize the page mappings done at boot up and prevent the per cpu area from being a problem for tracing in the page fault handler. But when that synchronization in the exit of a task became a nop, it caused the problem to appear. Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code") Reported-by: "Tzvetomir Stoyanov (VMware)" <[email protected]> Suggested-by: Joerg Roedel <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>
2020-05-07tracing: Wait for preempt irq delay thread to finishSteven Rostedt (VMware)1-6/+24
Running on a slower machine, it is possible that the preempt delay kernel thread may still be executing if the module was immediately removed after added, and this can cause the kernel to crash as the kernel thread might be executing after its code has been removed. There's no reason that the caller of the code shouldn't just wait for the delay thread to finish, as the thread can also be created by a trigger in the sysfs code, which also has the same issues. Link: http://lore.kernel.org/r/[email protected] Cc: [email protected] Fixes: 793937236d1ee ("lib: Add module for testing preemptoff/irqsoff latency tracers") Reported-by: Xiao Yang <[email protected]> Reviewed-by: Xiao Yang <[email protected]> Reviewed-by: Joel Fernandes <[email protected]> Signed-off-by: Steven Rostedt (VMware) <[email protected]>