aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-02-12Btrfs: fix race between shrinking truncate and fiemapFilipe Manana1-0/+8
When there is a fiemap executing in parallel with a shrinking truncate we can end up in a situation where we have extent maps for which we no longer have corresponding file extent items. This is generally harmless and at the moment the only consequences are missing file extent items representing holes after we expand the file size again after the truncate operation removed the prealloc extent items, and stale information for future fiemap calls (reporting extents that no longer exist or may have been reallocated to other files for example). Consider the following example: 1) Our inode has a size of 128KiB, one 128KiB extent at file offset 0 and a 1MiB prealloc extent at file offset 128KiB; 2) Task A starts doing a shrinking truncate of our inode to reduce it to a size of 64KiB. Before it searches the subvolume tree for file extent items to delete, it drops all the extent maps in the range from 64KiB to (u64)-1 by calling btrfs_drop_extent_cache(); 3) Task B starts doing a fiemap against our inode. When looking up for the inode's extent maps in the range from 128KiB to (u64)-1, it doesn't find any in the inode's extent map tree, since they were removed by task A. Because it didn't find any in the extent map tree, it scans the inode's subvolume tree for file extent items, and it finds the 1MiB prealloc extent at file offset 128KiB, then it creates an extent map based on that file extent item and adds it to inode's extent map tree (this ends up being done by btrfs_get_extent() <- btrfs_get_extent_fiemap() <- get_extent_skip_holes()); 4) Task A then drops the prealloc extent at file offset 128KiB and shrinks the 128KiB extent file offset 0 to a length of 64KiB. The truncation operation finishes and we end up with an extent map representing a 1MiB prealloc extent at file offset 128KiB, despite we don't have any more that extent; After this the two types of problems we have are: 1) Future calls to fiemap always report that a 1MiB prealloc extent exists at file offset 128KiB. This is stale information, no longer correct; 2) If the size of the file is increased, by a truncate operation that increases the file size or by a write into a file offset > 64KiB for example, we end up not inserting file extent items to represent holes for any range between 128KiB and 128KiB + 1MiB, since the hole expansion function, btrfs_cont_expand() will skip hole insertion for any range for which an extent map exists that represents a prealloc extent. This causes fsck to complain about missing file extent items when not using the NO_HOLES feature. The second issue could be often triggered by test case generic/561 from fstests, which runs fsstress and duperemove in parallel, and duperemove does frequent fiemap calls. Essentially the problems happens because fiemap does not acquire the inode's lock while truncate does, and fiemap locks the file range in the inode's iotree while truncate does not. So fix the issue by making btrfs_truncate_inode_items() lock the file range from the new file size to (u64)-1, so that it serializes with fiemap. CC: [email protected] # 4.4+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-02-12btrfs: log message when rw remount is attempted with unclean tree-logDavid Sterba1-0/+2
A remount to a read-write filesystem is not safe when there's tree-log to be replayed. Files that could be opened until now might be affected by the changes in the tree-log. A regular mount is needed to replay the log so the filesystem presents the consistent view with the pending changes included. CC: [email protected] # 4.4+ Reviewed-by: Anand Jain <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-02-12btrfs: print message when tree-log replay startsDavid Sterba1-0/+1
There's no logged information about tree-log replay although this is something that points to previous unclean unmount. Other filesystems report that as well. Suggested-by: Chris Murphy <[email protected]> CC: [email protected] # 4.4+ Reviewed-by: Anand Jain <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-02-12Btrfs: fix race between using extent maps and merging themFilipe Manana1-0/+11
We have a few cases where we allow an extent map that is in an extent map tree to be merged with other extents in the tree. Such cases include the unpinning of an extent after the respective ordered extent completed or after logging an extent during a fast fsync. This can lead to subtle and dangerous problems because when doing the merge some other task might be using the same extent map and as consequence see an inconsistent state of the extent map - for example sees the new length but has seen the old start offset. With luck this triggers a BUG_ON(), and not some silent bug, such as the following one in __do_readpage(): $ cat -n fs/btrfs/extent_io.c 3061 static int __do_readpage(struct extent_io_tree *tree, 3062 struct page *page, (...) 3127 em = __get_extent_map(inode, page, pg_offset, cur, 3128 end - cur + 1, get_extent, em_cached); 3129 if (IS_ERR_OR_NULL(em)) { 3130 SetPageError(page); 3131 unlock_extent(tree, cur, end); 3132 break; 3133 } 3134 extent_offset = cur - em->start; 3135 BUG_ON(extent_map_end(em) <= cur); (...) Consider the following example scenario, where we end up hitting the BUG_ON() in __do_readpage(). We have an inode with a size of 8KiB and 2 extent maps: extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on disk by a previous transaction extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet persisted but writeback started for it already. The extent map is pinned since there's writeback and an ordered extent in progress, so it can not be merged with extent map A yet The following sequence of steps leads to the BUG_ON(): 1) The ordered extent for extent B completes, the respective page gets its writeback bit cleared and the extent map is unpinned, at that point it is not yet merged with extent map A because it's in the list of modified extents; 2) Due to memory pressure, or some other reason, the MM subsystem releases the page corresponding to extent B - btrfs_releasepage() is called and returns 1, meaning the page can be released as it's not dirty, not under writeback anymore and the extent range is not locked in the inode's iotree. However the extent map is not released, either because we are not in a context that allows memory allocations to block or because the inode's size is smaller than 16MiB - in this case our inode has a size of 8KiB; 3) Task B needs to read extent B and ends up __do_readpage() through the btrfs_readpage() callback. At __do_readpage() it gets a reference to extent map B; 4) Task A, doing a fast fsync, calls clear_em_loggin() against extent map B while holding the write lock on the inode's extent map tree - this results in try_merge_map() being called and since it's possible to merge extent map B with extent map A now (the extent map B was removed from the list of modified extents), the merging begins - it sets extent map B's start offset to 0 (was 4KiB), but before it increments the map's length to 8KiB (4kb + 4KiB), task A is at: BUG_ON(extent_map_end(em) <= cur); The call to extent_map_end() sees the extent map has a start of 0 and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so the BUG_ON() is triggered. So it's dangerous to modify an extent map that is in the tree, because some other task might have got a reference to it before and still using it, and needs to see a consistent map while using it. Generally this is very rare since most paths that lookup and use extent maps also have the file range locked in the inode's iotree. The fsync path is pretty much the only exception where we don't do it to avoid serialization with concurrent reads. Fix this by not allowing an extent map do be merged if if it's being used by tasks other then the one attempting to merge the extent map (when the reference count of the extent map is greater than 2). Reported-by: ryusuke1925 <[email protected]> Reported-by: Koki Mitani <[email protected]> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211 CC: [email protected] # 4.4+ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-02-12btrfs: ref-verify: fix memory leaksWenwen Wang1-0/+5
In btrfs_ref_tree_mod(), 'ref' and 'ra' are allocated through kzalloc() and kmalloc(), respectively. In the following code, if an error occurs, the execution will be redirected to 'out' or 'out_unlock' and the function will be exited. However, on some of the paths, 'ref' and 'ra' are not deallocated, leading to memory leaks. For example, if 'action' is BTRFS_ADD_DELAYED_EXTENT, add_block_entry() will be invoked. If the return value indicates an error, the execution will be redirected to 'out'. But, 'ref' is not deallocated on this path, causing a memory leak. To fix the above issues, deallocate both 'ref' and 'ra' before exiting from the function when an error is encountered. CC: [email protected] # 4.15+ Signed-off-by: Wenwen Wang <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2020-02-12tools headers kvm: Sync linux/kvm.h with the kernel sourcesArnaldo Carvalho de Melo1-0/+5
To pick up the changes from: 7de3f1423ff9 ("KVM: s390: Add new reset vcpu API") So far we're ignoring those arch specific ioctls, we need to revisit this at some time to have arch specific tables, etc: $ grep S390 tools/perf/trace/beauty/kvm_ioctl.sh egrep -v " ((ARM|PPC|S390)_|[GS]ET_(DEBUGREGS|PIT2|XSAVE|TSC_KHZ)|CREATE_SPAPR_TCE_64)" | \ $ This addresses these tools/perf build warnings: Warning: Kernel ABI header at 'tools/arch/arm/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm/include/uapi/asm/kvm.h' diff -u tools/arch/arm/include/uapi/asm/kvm.h arch/arm/include/uapi/asm/kvm.h Cc: Adrian Hunter <[email protected]> Cc: Christian Borntraeger <[email protected]> Cc: Janosch Frank <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-02-12tools headers kvm: Sync kvm headers with the kernel sourcesArnaldo Carvalho de Melo1-2/+10
To pick up the changes from: 290a6bb06de9 ("arm64: KVM: Add UAPI notes for swapped registers") No tools changes are caused by this. This addresses these tools/perf build warnings: Cc: Adrian Hunter <[email protected]> Cc: Andrew Jones <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Marc Zyngier <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-02-12tools arch x86: Sync asm/cpufeatures.h with the kernel sourcesArnaldo Carvalho de Melo1-0/+2
To pick up the changes from: 85c17291e2eb ("x86/cpufeatures: Add flag to track whether MSR IA32_FEAT_CTL is configured") f444a5ff95dc ("x86/cpufeatures: Add support for fast short REP; MOVSB") These don't cause any changes in tooling, just silences this perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h' diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h Cc: Adrian Hunter <[email protected]> Cc: Borislav Petkov <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Sean Christopherson <[email protected]> Cc: Tony Luck <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-02-12tools headers x86: Sync disabled-features.hArnaldo Carvalho de Melo1-7/+1
To silence the following tools/perf build warning: Warning: Kernel ABI header at 'tools/arch/x86/include/asm/disabled-features.h' differs from latest version at 'arch/x86/include/asm/disabled-features.h' diff -u tools/arch/x86/include/asm/disabled-features.h arch/x86/include/asm/disabled-features.h Picking up the changes in: 45fc24e89b7c ("x86/mpx: remove MPX from arch/x86") that didn't entail any functionality change in the tooling side. Cc: Adrian Hunter <[email protected]> Cc: Dave Hansen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-02-12drm/i915: Mark the removal of the i915_request from the sched.linkChris Wilson1-0/+2
Keep the rq->fence.flags consistent with the status of the rq->sched.link, and clear the associated bits when decoupling the link on retirement (as we may wish to inspect those flags independent of other state). Fixes: c3f1ed90e6ff ("drm/i915/gt: Allow temporary suspension of inflight requests") References: https://gitlab.freedesktop.org/drm/intel/issues/997 Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit b4a9a149f91ea345da76bcfe3f8a39715ac346a6) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12drm/i915/execlists: Reclaim the hanging virtual requestChris Wilson2-0/+185
If we encounter a hang on a virtual engine, as we process the hang the request may already have been moved back to the virtual engine (we are processing the hang on the physical engine). We need to reclaim the request from the virtual engine so that the locking is consistent and local to the real engine on which we will hold the request for error state capturing. v2: Pull the reclamation into execlists_hold() and assert that cannot be called from outside of the reset (i.e. with the tasklet disabled). v3: Added selftest v4: Drop the reference owned by the virtual engine Fixes: ad18ba7b5eeb ("drm/i915/execlists: Offline error capture") Testcase: igt/gem_exec_balancer/hang Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 989df3a7bd2abe566521e61d1aebf603eb013b7f) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12drm/i915/execlists: Take a reference while capturing the guilty requestChris Wilson2-10/+32
Thanks to preempt-to-busy, we leave the request on the HW as we submit the preemption request. This means that the request may complete at any moment as we process HW events, and in particular the request may be retired as we are planning to capture it for a preemption timeout. Be more careful while obtaining the request to capture after a preemption timeout, and check to see if it completed before we were able to put it on the on-hold list. If we do see it did complete just before we capture the request, proclaim the preemption-timeout a false positive and pardon the reset as we should hit an arbitration point momentarily and so be able to process the preemption. Note that even after we move the request to be on hold it may be retired (as the reset to stop the HW comes after), so we do require to hold our own reference as we work on the request for capture (and all of the peeking at state within the request needs to be carefully protected). Fixes: c3f1ed90e6ff ("drm/i915/gt: Allow temporary suspension of inflight requests") Closes: https://gitlab.freedesktop.org/drm/intel/issues/997 Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 4ba5c086a1d8e38d6927967ae1a3271a6ab7a927) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12drm/i915/execlists: Offline error captureChris Wilson1-2/+120
Currently, we skip error capture upon forced preemption. We apply forced preemption when there is a higher priority request that should be running but is being blocked, and we skip inline error capture so that the preemption request is not further delayed by a user controlled capture -- extending the denial of service. However, preemption reset is also used for heartbeats and regular GPU hangs. By skipping the error capture, we remove the ability to debug GPU hangs. In order to capture the error without delaying the preemption request further, we can do an out-of-line capture by removing the guilty request from the execution queue and scheduling a worker to dump that request. When removing a request, we need to remove the entire context and all descendants from the execution queue, so that they do not jump past. Closes: https://gitlab.freedesktop.org/drm/intel/issues/738 Fixes: 3a7a92aba8fb ("drm/i915/execlists: Force preemption") Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 748317386afb235e11616098d2c7772e49776b58) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12drm/i915/gt: Allow temporary suspension of inflight requestsChris Wilson5-6/+321
In order to support out-of-line error capture, we need to remove the active request from HW and put it to one side while a worker compresses and stores all the details associated with that request. (As that compression may take an arbitrary user-controlled amount of time, we want to let the engine continue running on other workloads while the hanging request is dumped.) Not only do we need to remove the active request, but we also have to remove its context and all requests that were dependent on it (both in flight, queued and future submission). Finally once the capture is complete, we need to be able to resubmit the request and its dependents and allow them to execute. v2: Replace stack recursion with a simple list. v3: Check all the parents, not just the first, when searching for a stuck ancestor! References: https://gitlab.freedesktop.org/drm/intel/issues/738 Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 32ff621fd74496f0c33644125fb69ff175859b1f) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12drm/i915: Keep track of request among the scheduling listsChris Wilson4-18/+38
If we keep track of when the i915_request.sched.link is on the HW runlist, or in the priority queue we can simplify our interactions with the request (such as during rescheduling). This also simplifies the next patch where we introduce a new in-between list, for requests that are ready but neither on the run list or in the queue. v2: Update i915_sched_node.link explanation for current usage where it is a link on both the queue and on the runlists. Signed-off-by: Chris Wilson <[email protected]> Cc: Mika Kuoppala <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 672c368f9398042b629740cc9816e8e939eff2db) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12Merge tag 'gvt-fixes-2020-02-12' of https://github.com/intel/gvt-linux into ↵Jani Nikula2-2/+6
drm-intel-next-fixes gvt-fixes-2020-02-12 - fix possible high-order allocation fail for late load (Igor) - fix one missed lock for ppgtt mm LRU list (Igor) Signed-off-by: Jani Nikula <[email protected]> From: Zhenyu Wang <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2020-02-12tools include UAPI: Sync sound/asound.h copyArnaldo Carvalho de Melo1-23/+132
Picking the changes from: 46b770f720bd ("ALSA: uapi: Fix sparse warning") a103a3989993 ("ALSA: control: Fix incompatible protocol error") bd3eb4e87eb3 ("ALSA: ctl: bump protocol version up to v2.1.0") ff16351e3f30 ("ALSA: ctl: remove dimen member from elem_info structure") 542283566679 ("ALSA: ctl: remove unused macro for timestamping of elem_value") 7fd7d6c50451 ("ALSA: uapi: Fix typos and header inclusion in asound.h") 1cfaef961703 ("ALSA: bump uapi version numbers") 80fe7430c708 ("ALSA: add new 32-bit layout for snd_pcm_mmap_status/control") 07094ae6f952 ("ALSA: Avoid using timespec for struct snd_timer_tread") d9e5582c4bb2 ("ALSA: Avoid using timespec for struct snd_rawmidi_status") 3ddee7f88aaf ("ALSA: Avoid using timespec for struct snd_pcm_status") a4e7dd35b9da ("ALSA: Avoid using timespec for struct snd_ctl_elem_value") a07804cc7472 ("ALSA: Avoid using timespec for struct snd_timer_status") Which entails no changes in the tooling side. To silence this perf tools build warning: Warning: Kernel ABI header at 'tools/include/uapi/sound/asound.h' differs from latest version at 'include/uapi/sound/asound.h' diff -u tools/include/uapi/sound/asound.h include/uapi/sound/asound.h Cc: Adrian Hunter <[email protected]> Cc: Arnd Bergmann <[email protected]> Cc: Baolin Wang <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Ranjani Sridharan <[email protected]> Cc: Takashi Iwai <[email protected]> Cc: Takashi Sakamoto <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-02-12tools headers UAPI: Sync asm-generic/mman-common.h with the kernelArnaldo Carvalho de Melo1-0/+2
To pick the changes from: d41938d2cbee ("mm: Reserve asm-generic prot flags 0x10 and 0x20 for arch use") No changes in tooling, just a rebuild as files needed got touched. This addresses the following perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/asm-generic/mman-common.h' differs from latest version at 'include/uapi/asm-generic/mman-common.h' diff -u tools/include/uapi/asm-generic/mman-common.h include/uapi/asm-generic/mman-common.h Cc: Adrian Hunter <[email protected]> Cc: Dave Martin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Will Deacon <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-02-12perf tools: Add arm64 version of get_cpuid()John Garry1-15/+48
Add an arm64 version of get_cpuid(), which is used for various annotation and headers - for example, I now get the CPUID in "perf report --header", as shown in this snippet: # hostname : ubuntu # os release : 5.5.0-rc1-dirty # perf version : 5.5.rc1.gbf8a13dc9851 # arch : aarch64 # nrcpus online : 96 # nrcpus avail : 96 # cpuid : 0x00000000480fd010 Since much of the code to read the MIDR is already in get_cpuid_str(), factor out this code. Tester notes: I tested this patch on my new ARM64 Kunpeng 920 server. [root@node1 zsk]# ./perf --version perf version 5.6.rc1.g2cdb955b7252 Both perf list and perf stat can work. Signed-off-by: John Garry <[email protected]> Tested-by: Shaokun Zhang <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Namhyung Kim <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Will Deacon <[email protected]> Cc: [email protected] Cc: [email protected] Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-02-12tools headers UAPI: Sync drm/i915_drm.h with the kernel sourcesArnaldo Carvalho de Melo1-0/+32
To pick the change in: cc662126b413 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET") That don't result in any changes in tooling, just silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/drm/i915_drm.h' differs from latest version at 'include/uapi/drm/i915_drm.h' diff -u tools/include/uapi/drm/i915_drm.h include/uapi/drm/i915_drm.h Cc: Abdiel Janulgue <[email protected]> Cc: Adrian Hunter <[email protected]> Cc: Chris Wilson <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-02-12tools headers uapi: Sync linux/fscrypt.h with the kernel sourcesArnaldo Carvalho de Melo1-1/+13
To pick the changes from: e933adde6f97 ("fscrypt: include <linux/ioctl.h> in UAPI header") 93edd392cad7 ("fscrypt: support passing a keyring key to FS_IOC_ADD_ENCRYPTION_KEY") That don't trigger any changes in tooling. This silences this perf build warning: Warning: Kernel ABI header at 'tools/include/uapi/linux/fscrypt.h' differs from latest version at 'include/uapi/linux/fscrypt.h' diff -u tools/include/uapi/linux/fscrypt.h include/uapi/linux/fscrypt.h Cc: Adrian Hunter <[email protected]> Cc: Eric Biggers <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Namhyung Kim <[email protected]> Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-02-12KVM: x86: Deliver exception payload on KVM_GET_VCPU_EVENTSOliver Upton1-13/+16
KVM allows the deferral of exception payloads when a vCPU is in guest mode to allow the L1 hypervisor to intercept certain events (#PF, #DB) before register state has been modified. However, this behavior is incompatible with the KVM_{GET,SET}_VCPU_EVENTS ABI, as userspace expects register state to have been immediately modified. Userspace may opt-in for the payload deferral behavior with the KVM_CAP_EXCEPTION_PAYLOAD per-VM capability. As such, kvm_multiple_exception() will immediately manipulate guest registers if the capability hasn't been requested. Since the deferral is only necessary if a userspace ioctl were to be serviced at the same as a payload bearing exception is recognized, this behavior can be relaxed. Instead, opportunistically defer the payload from kvm_multiple_exception() and deliver the payload before completing a KVM_GET_VCPU_EVENTS ioctl. Signed-off-by: Oliver Upton <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-12KVM: nVMX: Handle pending #DB when injecting INIT VM-exitOliver Upton1-0/+28
SDM 27.3.4 states that the 'pending debug exceptions' VMCS field will be populated if a VM-exit caused by an INIT signal takes priority over a debug-trap. Emulate this behavior when synthesizing an INIT signal VM-exit into L1. Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") Signed-off-by: Oliver Upton <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-12KVM: x86: Mask off reserved bit from #DB exception payloadOliver Upton1-0/+8
KVM defines the #DB payload as compatible with the 'pending debug exceptions' field under VMX, not DR6. Mask off bit 12 when applying the payload to DR6, as it is reserved on DR6 but not the 'pending debug exceptions' field. Fixes: f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery") Signed-off-by: Oliver Upton <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-12drm/i915/gem: Tighten checks and acquiring the mmap objectChris Wilson2-30/+21
Make sure we hold the rcu lock as we acquire the rcu protected reference of the object when looking it up from the associated mmap vma. Closes: https://gitlab.freedesktop.org/drm/intel/issues/1083 Fixes: cc662126b413 ("drm/i915: Introduce DRM_I915_GEM_MMAP_OFFSET") Signed-off-by: Chris Wilson <[email protected]> Cc: Abdiel Janulgue <[email protected]> Cc: Matthew Auld <[email protected]> Reviewed-by: Matthew Auld <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 280d14a69da2e71f43408537c008f2775d5e5360) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12drm/i915: Fix preallocated barrier list appendJosé Roberto de Souza1-9/+10
Only the first and the last nodes were being added to ref->preallocated_barriers. Renaming variables to make it more easy to read. Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin") Cc: Chris Wilson <[email protected]> Cc: Maarten Lankhorst <[email protected]> Signed-off-by: José Roberto de Souza <[email protected]> Reviewed-by: Chris Wilson <[email protected]> Signed-off-by: Chris Wilson <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit d4c3c0b8221a72107eaf35c80c40716b81ca463e) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12drm/i915/gt: Acquire ce->active before ce->pin_count/ce->pin_mutexChris Wilson2-21/+31
Similar to commit ac0e331a628b ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release") we have the same race of trying to pin the context underneath a mutex while allowing the decrement to be atomic outside of that mutex. This leads to the problem where two threads may simultaneously try to pin the context and the second not notice that they needed to repin the context. <2> [198.669621] kernel BUG at drivers/gpu/drm/i915/gt/intel_timeline.c:387! <4> [198.669703] invalid opcode: 0000 [#1] PREEMPT SMP PTI <4> [198.669712] CPU: 0 PID: 1246 Comm: gem_exec_create Tainted: G U W 5.5.0-rc6-CI-CI_DRM_7755+ #1 <4> [198.669723] Hardware name: /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017 <4> [198.669776] RIP: 0010:timeline_advance+0x7b/0xe0 [i915] <4> [198.669785] Code: 00 48 c7 c2 10 f1 46 a0 48 c7 c7 70 1b 32 a0 e8 bb dd e7 e0 bf 01 00 00 00 e8 d1 af e7 e0 31 f6 bf 09 00 00 00 e8 35 ef d8 e0 <0f> 0b 48 c7 c1 48 fa 49 a0 ba 84 01 00 00 48 c7 c6 10 f1 46 a0 48 <4> [198.669803] RSP: 0018:ffffc900004c3a38 EFLAGS: 00010296 <4> [198.669810] RAX: ffff888270b35140 RBX: ffff88826f32ee00 RCX: 0000000000000006 <4> [198.669818] RDX: 00000000000017c5 RSI: 0000000000000000 RDI: 0000000000000009 <4> [198.669826] RBP: ffffc900004c3a64 R08: 0000000000000000 R09: 0000000000000000 <4> [198.669834] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88826f9b5980 <4> [198.669841] R13: 0000000000000cc0 R14: ffffc900004c3dc0 R15: ffff888253610068 <4> [198.669849] FS: 00007f63e663fe40(0000) GS:ffff888276c00000(0000) knlGS:0000000000000000 <4> [198.669857] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 <4> [198.669864] CR2: 00007f171f8e39a8 CR3: 000000026b1f6005 CR4: 00000000003606f0 <4> [198.669872] Call Trace: <4> [198.669924] intel_timeline_get_seqno+0x12/0x40 [i915] <4> [198.669977] __i915_request_create+0x76/0x5a0 [i915] <4> [198.670024] i915_request_create+0x86/0x1c0 [i915] <4> [198.670068] i915_gem_do_execbuffer+0xbf2/0x2500 [i915] <4> [198.670082] ? __lock_acquire+0x460/0x15d0 <4> [198.670128] i915_gem_execbuffer2_ioctl+0x11f/0x470 [i915] <4> [198.670171] ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915] <4> [198.670181] drm_ioctl_kernel+0xa7/0xf0 <4> [198.670188] drm_ioctl+0x2e1/0x390 <4> [198.670233] ? i915_gem_execbuffer_ioctl+0x300/0x300 [i915] Fixes: 841350223816 ("drm/i915/gt: Drop mutex serialisation between context pin/unpin") References: ac0e331a628b ("drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_release") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit e5429340bfa2dc43a07c3329e0c30cdae4cc0b35) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12drm/i915: Tighten atomicity of i915_active_acquire vs i915_active_releaseChris Wilson1-7/+9
As we use a mutex to serialise the first acquire (as it may be a lengthy operation), but only an atomic decrement for the release, we have to be careful in case a second thread races and completes both acquire/release as the first finishes its acquire. Thread A Thread B i915_active_acquire i915_active_acquire atomic_read() == 0 atomic_read() == 0 mutex_lock() mutex_lock() atomic_read() == 0 ref->active(); atomic_inc() mutex_unlock() atomic_read() == 1 i915_active_release atomic_dec_and_test() -> 0 ref->retire() atomic_inc() -> 1 mutex_unlock() So thread A has acquired the ref->active_count but since the ref was still active at the time, it did not initialise it. By switching the check inside the mutex to an atomic increment only if already active, we close the race. Fixes: c9ad602feabe ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree") Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit ac0e331a628b5ded087eab09fad2ffb082ac61ba) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12drm/i915: Stub out i915_gpu_coredump_putChris Wilson1-0/+4
i915_gpu_coreddump_put is currently only defined if CONFIG_DRM_I915_CAPTURE_ERROR is enabled, provide a stub otherwise. Reported-by: Mike Lothian <[email protected]> Fixes: 742379c0c400 ("drm/i915: Start chopping up the GPU error capture") Signed-off-by: Chris Wilson <[email protected]> Cc: Mike Lothian <[email protected]> Cc: Andi Shyti <[email protected]> Reviewed-by: Andi Shyti <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 7e36505d0cf82f2920f2fd22ebb14a8b540396a3) Signed-off-by: Jani Nikula <[email protected]>
2020-02-12KVM: Disable preemption in kvm_get_running_vcpu()Marc Zyngier2-15/+13
Accessing a per-cpu variable only makes sense when preemption is disabled (and the kernel does check this when the right debug options are switched on). For kvm_get_running_vcpu(), it is fine to return the value after re-enabling preemption, as the preempt notifiers will make sure that this is kept consistent across task migration (the comment above the function hints at it, but lacks the crucial preemption management). While we're at it, move the comment from the ARM code, which explains why the whole thing works. Fixes: 7495e22bb165 ("KVM: Move running VCPU from ARM to common code"). Cc: Paolo Bonzini <[email protected]> Reported-by: Zenghui Yu <[email protected]> Tested-by: Zenghui Yu <[email protected]> Reviewed-by: Peter Xu <[email protected]> Signed-off-by: Marc Zyngier <[email protected]> Link: https://lore.kernel.org/r/[email protected] Message-id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-12KVM: x86: do not reset microcode version on INIT or RESETPaolo Bonzini2-2/+2
Do not initialize the microcode version at RESET or INIT, only on vCPU creation. Microcode updates are not lost during INIT, and exact behavior across a warm RESET is not specified by the architecture. Since we do not support a microcode update directly from the hypervisor, but only as a result of userspace setting the microcode version MSR, it's simpler for userspace if we do nothing in KVM and let userspace emulate behavior for RESET as it sees fit. Userspace can tie the fix to the availability of MSR_IA32_UCODE_REV in the list of emulated MSRs. Reported-by: Alex Williamson <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2020-02-12ALSA: hda/realtek - Fix silent output on MSI-GL73Takashi Iwai1-0/+1
MSI-GL73 laptop with ALC1220 codec requires a similar workaround for Clevo laptops to enforce the DAC/mixer connection path. Set up a quirk entry for that. BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204159 Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2020-02-12ALSA: hda/realtek - Add more codec supported Headset ButtonKailang Yang1-0/+3
Add supported Headset Button for ALC215/ALC285/ALC289. Signed-off-by: Kailang Yang <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2020-02-11Merge branch 'Bug-fixes-for-ENA-Ethernet-driver'David S. Miller5-44/+115
Sameeh Jubran says: ==================== Bug fixes for ENA Ethernet driver Difference from V1: * Started using netdev_rss_key_fill() * Dropped superflous changes that are not related to bug fixes as requested by Jakub ==================== Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: ena-com.c: prevent NULL pointer dereferenceArthur Kiyanovski1-0/+5
comp_ctx can be NULL in a very rare case when an admin command is executed during the execution of ena_remove(). The bug scenario is as follows: * ena_destroy_device() sets the comp_ctx to be NULL * An admin command is executed before executing unregister_netdev(), this can still happen because our device can still receive callbacks from the netdev infrastructure such as ethtool commands. * When attempting to access the comp_ctx, the bug occurs since it's set to NULL Fix: Added a check that comp_ctx is not NULL Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <[email protected]> Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: ethtool: use correct value for crc32 hashSameeh Jubran1-2/+2
Up till kernel 4.11 there was no enum defined for crc32 hash in ethtool, thus the xor enum was used for supporting crc32. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: make ena rxfh support ETH_RSS_HASH_NO_CHANGEArthur Kiyanovski3-0/+16
As the name suggests ETH_RSS_HASH_NO_CHANGE is received upon changing the key or indirection table using ethtool while keeping the same hash function. Also add a function for retrieving the current hash function from the ena-com layer. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <[email protected]> Signed-off-by: Saeed Bshara <[email protected]> Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: fix corruption of dev_idx_to_host_tblArthur Kiyanovski1-28/+0
The function ena_com_ind_tbl_convert_from_device() has an overflow bug as explained below. Either way, this function is not needed at all since we don't retrieve the indirection table from the device at any point which means that this conversion is not needed. The bug: The for loop iterates over all io_sq_queues, when passing the actual number of used queues the io_sq_queues[i].idx equals 0 since they are uninitialized which results in the following code to be executed till the end of the loop: dev_idx_to_host_tbl[0] = i; This results dev_idx_to_host_tbl[0] in being equal to ENA_TOTAL_NUM_QUEUES - 1. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <[email protected]> Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: fix incorrectly saving queue numbers when setting RSS indirection ↵Arthur Kiyanovski2-1/+25
table The indirection table has the indices of the Rx queues. When we store it during set indirection operation, we convert the indices to our internal representation of the indices. Our internal representation of the indices is: even indices for Tx and uneven indices for Rx, where every Tx/Rx pair are in a consecutive order starting from 0. For example if the driver has 3 queues (3 for Tx and 3 for Rx) then the indices are as follows: 0 1 2 3 4 5 Tx Rx Tx Rx Tx Rx The BUG: The issue is that when we satisfy a get request for the indirection table, we don't convert the indices back to the original representation. The FIX: Simply apply the inverse function for the indices of the indirection table after we set it. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <[email protected]> Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: rss: store hash function as values and not bitsArthur Kiyanovski1-1/+5
The device receives, stores and retrieves the hash function value as bits and not as their enum value. The bug: * In ena_com_set_hash_function() we set cmd.u.flow_hash_func.selected_func to the bit value of rss->hash_func. (1 << rss->hash_func) * In ena_com_get_hash_function() we retrieve the hash function and store it's bit value in rss->hash_func. (Now the bit value of rss->hash_func is stored in rss->hash_func instead of it's enum value) The fix: This commit fixes the issue by converting the retrieved hash function values from the device to the matching enum value of the set bit using ffs(). ffs() finds the first set bit's index in a word. Since the function returns 1 for the LSB's index, we need to subtract 1 from the returned value (note that BIT(0) is 1). Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <[email protected]> Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: rss: fix failure to get indirection tableSameeh Jubran1-0/+14
On old hardware, getting / setting the hash function is not supported while gettting / setting the indirection table is. This commit enables us to still show the indirection table on older hardwares by setting the hash function and key to NULL. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: rss: do not allocate key when not supportedSameeh Jubran1-3/+21
Currently we allocate the key whether the device supports setting the key or not. This commit adds a check to the allocation function and handles the error accordingly. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: fix incorrect default RSS keyArthur Kiyanovski2-0/+16
Bug description: When running "ethtool -x <if_name>" the key shows up as all zeros. When we use "ethtool -X <if_name> hfunc toeplitz hkey <some:random:key>" to set the key and then try to retrieve it using "ethtool -x <if_name>" then we return the correct key because we return the one we saved. Bug cause: We don't fetch the key from the device but instead return the key that we have saved internally which is by default set to zero upon allocation. Fix: This commit fixes the issue by initializing the key to a random value using netdev_rss_key_fill(). Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <[email protected]> Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: add missing ethtool TX timestamping indicationArthur Kiyanovski1-0/+1
Current implementation of the driver calls skb_tx_timestamp()to add a software tx timestamp to the skb, however the software-transmit capability is not reported in ethtool -T. This commit updates the ethtool structure to report the software-transmit capability in ethtool -T using the standard ethtool_op_get_ts_info(). This function reports all software timestamping capabilities (tx and rx), as well as setting phc_index = -1. phc_index is the index of the PTP hardware clock device that will be used for hardware timestamps. Since we don't have such a device in ENA, using the default -1 value is the correct setting. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Ezequiel Lara Gomez <[email protected]> Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: fix uses of round_jiffies()Arthur Kiyanovski1-3/+3
>From the documentation of round_jiffies(): "Rounds a time delta in the future (in jiffies) up or down to (approximately) full seconds. This is useful for timers for which the exact time they fire does not matter too much, as long as they fire approximately every X seconds. By rounding these timers to whole seconds, all such timers will fire at the same time, rather than at various times spread out. The goal of this is to have the CPU wake up less, which saves power." There are 2 parts to this patch: ================================ Part 1: ------- In our case we need timer_service to be called approximately every X=1 seconds, and the exact time does not matter, so using round_jiffies() is the right way to go. Therefore we add round_jiffies() to the mod_timer() in ena_timer_service(). Part 2: ------- round_jiffies() is used in check_for_missing_keep_alive() when getting the jiffies of the expiration of the keep_alive timeout. Here it is actually a mistake to use round_jiffies() because we want the exact time when keep_alive should expire and not an approximate rounded time, which can cause early, false positive, timeouts. Therefore we remove round_jiffies() in the calculation of keep_alive_expired() in check_for_missing_keep_alive(). Fixes: 82ef30f13be0 ("net: ena: add hardware hints capability to the driver") Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net: ena: fix potential crash when rxfh key is NULLArthur Kiyanovski1-8/+9
When ethtool -X is called without an hkey, ena_com_fill_hash_function() is called with key=NULL, which is passed to memcpy causing a crash. This commit fixes this issue by checking key is not NULL. Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Sameeh Jubran <[email protected]> Signed-off-by: Arthur Kiyanovski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11net/smc: fix leak of kernel memory to user spaceEric Dumazet1-3/+2
As nlmsg_put() does not clear the memory that is reserved, it this the caller responsability to make sure all of this memory will be written, in order to not reveal prior content. While we are at it, we can provide the socket cookie even if clsock is not set. syzbot reported : BUG: KMSAN: uninit-value in __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline] BUG: KMSAN: uninit-value in __fswab32 include/uapi/linux/swab.h:59 [inline] BUG: KMSAN: uninit-value in __swab32p include/uapi/linux/swab.h:179 [inline] BUG: KMSAN: uninit-value in __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline] BUG: KMSAN: uninit-value in get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline] BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline] BUG: KMSAN: uninit-value in ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline] BUG: KMSAN: uninit-value in bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252 CPU: 1 PID: 5262 Comm: syz-executor.5 Not tainted 5.5.0-rc5-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1c9/0x220 lib/dump_stack.c:118 kmsan_report+0xf7/0x1e0 mm/kmsan/kmsan_report.c:118 __msan_warning+0x58/0xa0 mm/kmsan/kmsan_instr.c:215 __arch_swab32 arch/x86/include/uapi/asm/swab.h:10 [inline] __fswab32 include/uapi/linux/swab.h:59 [inline] __swab32p include/uapi/linux/swab.h:179 [inline] __be32_to_cpup include/uapi/linux/byteorder/little_endian.h:82 [inline] get_unaligned_be32 include/linux/unaligned/access_ok.h:30 [inline] ____bpf_skb_load_helper_32 net/core/filter.c:240 [inline] ____bpf_skb_load_helper_32_no_cache net/core/filter.c:255 [inline] bpf_skb_load_helper_32_no_cache+0x14a/0x390 net/core/filter.c:252 Uninit was created at: kmsan_save_stack_with_flags mm/kmsan/kmsan.c:144 [inline] kmsan_internal_poison_shadow+0x66/0xd0 mm/kmsan/kmsan.c:127 kmsan_kmalloc_large+0x73/0xc0 mm/kmsan/kmsan_hooks.c:128 kmalloc_large_node_hook mm/slub.c:1406 [inline] kmalloc_large_node+0x282/0x2c0 mm/slub.c:3841 __kmalloc_node_track_caller+0x44b/0x1200 mm/slub.c:4368 __kmalloc_reserve net/core/skbuff.c:141 [inline] __alloc_skb+0x2fd/0xac0 net/core/skbuff.c:209 alloc_skb include/linux/skbuff.h:1049 [inline] netlink_dump+0x44b/0x1ab0 net/netlink/af_netlink.c:2224 __netlink_dump_start+0xbb2/0xcf0 net/netlink/af_netlink.c:2352 netlink_dump_start include/linux/netlink.h:233 [inline] smc_diag_handler_dump+0x2ba/0x300 net/smc/smc_diag.c:242 sock_diag_rcv_msg+0x211/0x610 net/core/sock_diag.c:256 netlink_rcv_skb+0x451/0x650 net/netlink/af_netlink.c:2477 sock_diag_rcv+0x63/0x80 net/core/sock_diag.c:275 netlink_unicast_kernel net/netlink/af_netlink.c:1302 [inline] netlink_unicast+0xf9e/0x1100 net/netlink/af_netlink.c:1328 netlink_sendmsg+0x1248/0x14d0 net/netlink/af_netlink.c:1917 sock_sendmsg_nosec net/socket.c:639 [inline] sock_sendmsg net/socket.c:659 [inline] kernel_sendmsg+0x433/0x440 net/socket.c:679 sock_no_sendpage+0x235/0x300 net/core/sock.c:2740 kernel_sendpage net/socket.c:3776 [inline] sock_sendpage+0x1e1/0x2c0 net/socket.c:937 pipe_to_sendpage+0x38c/0x4c0 fs/splice.c:458 splice_from_pipe_feed fs/splice.c:512 [inline] __splice_from_pipe+0x539/0xed0 fs/splice.c:636 splice_from_pipe fs/splice.c:671 [inline] generic_splice_sendpage+0x1d5/0x2d0 fs/splice.c:844 do_splice_from fs/splice.c:863 [inline] do_splice fs/splice.c:1170 [inline] __do_sys_splice fs/splice.c:1447 [inline] __se_sys_splice+0x2380/0x3350 fs/splice.c:1427 __x64_sys_splice+0x6e/0x90 fs/splice.c:1427 do_syscall_64+0xb8/0x160 arch/x86/entry/common.c:296 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: f16a7dd5cf27 ("smc: netlink interface for SMC sockets") Signed-off-by: Eric Dumazet <[email protected]> Cc: Ursula Braun <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11i40e: Fix the conditional for i40e_vc_validate_vqs_bitmapsBrett Creeley1-2/+2
Commit d9d6a9aed3f6 ("i40e: Fix virtchnl_queue_select bitmap validation") introduced a necessary change for verifying how queue bitmaps from the iavf driver get validated. Unfortunately, the conditional was reversed. Fix this. Fixes: d9d6a9aed3f6 ("i40e: Fix virtchnl_queue_select bitmap validation") Signed-off-by: Brett Creeley <[email protected]> Tested-by: Andrew Bowers <[email protected]> Signed-off-by: Jeff Kirsher <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11core: Don't skip generic XDP program execution for cloned SKBsToke Høiland-Jørgensen1-2/+2
The current generic XDP handler skips execution of XDP programs entirely if an SKB is marked as cloned. This leads to some surprising behaviour, as packets can end up being cloned in various ways, which will make an XDP program not see all the traffic on an interface. This was discovered by a simple test case where an XDP program that always returns XDP_DROP is installed on a veth device. When combining this with the Scapy packet sniffer (which uses an AF_PACKET) socket on the sending side, SKBs reliably end up in the cloned state, causing them to be passed through to the receiving interface instead of being dropped. A minimal reproducer script for this is included below. This patch fixed the issue by simply triggering the existing linearisation code for cloned SKBs instead of skipping the XDP program execution. This behaviour is in line with the behaviour of the native XDP implementation for the veth driver, which will reallocate and copy the SKB data if the SKB is marked as shared. Reproducer Python script (requires BCC and Scapy): from scapy.all import TCP, IP, Ether, sendp, sniff, AsyncSniffer, Raw, UDP from bcc import BPF import time, sys, subprocess, shlex SKB_MODE = (1 << 1) DRV_MODE = (1 << 2) PYTHON=sys.executable def client(): time.sleep(2) # Sniffing on the sender causes skb_cloned() to be set s = AsyncSniffer() s.start() for p in range(10): sendp(Ether(dst="aa:aa:aa:aa:aa:aa", src="cc:cc:cc:cc:cc:cc")/IP()/UDP()/Raw("Test"), verbose=False) time.sleep(0.1) s.stop() return 0 def server(mode): prog = BPF(text="int dummy_drop(struct xdp_md *ctx) {return XDP_DROP;}") func = prog.load_func("dummy_drop", BPF.XDP) prog.attach_xdp("a_to_b", func, mode) time.sleep(1) s = sniff(iface="a_to_b", count=10, timeout=15) if len(s): print(f"Got {len(s)} packets - should have gotten 0") return 1 else: print("Got no packets - as expected") return 0 if len(sys.argv) < 2: print(f"Usage: {sys.argv[0]} <skb|drv>") sys.exit(1) if sys.argv[1] == "client": sys.exit(client()) elif sys.argv[1] == "server": mode = SKB_MODE if sys.argv[2] == 'skb' else DRV_MODE sys.exit(server(mode)) else: try: mode = sys.argv[1] if mode not in ('skb', 'drv'): print(f"Usage: {sys.argv[0]} <skb|drv>") sys.exit(1) print(f"Running in {mode} mode") for cmd in [ 'ip netns add netns_a', 'ip netns add netns_b', 'ip -n netns_a link add a_to_b type veth peer name b_to_a netns netns_b', # Disable ipv6 to make sure there's no address autoconf traffic 'ip netns exec netns_a sysctl -qw net.ipv6.conf.a_to_b.disable_ipv6=1', 'ip netns exec netns_b sysctl -qw net.ipv6.conf.b_to_a.disable_ipv6=1', 'ip -n netns_a link set dev a_to_b address aa:aa:aa:aa:aa:aa', 'ip -n netns_b link set dev b_to_a address cc:cc:cc:cc:cc:cc', 'ip -n netns_a link set dev a_to_b up', 'ip -n netns_b link set dev b_to_a up']: subprocess.check_call(shlex.split(cmd)) server = subprocess.Popen(shlex.split(f"ip netns exec netns_a {PYTHON} {sys.argv[0]} server {mode}")) client = subprocess.Popen(shlex.split(f"ip netns exec netns_b {PYTHON} {sys.argv[0]} client")) client.wait() server.wait() sys.exit(server.returncode) finally: subprocess.run(shlex.split("ip netns delete netns_a")) subprocess.run(shlex.split("ip netns delete netns_b")) Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices") Reported-by: Stepan Horacek <[email protected]> Suggested-by: Paolo Abeni <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-02-11Merge tag 'dax-fixes-5.6-rc1' of ↵Linus Torvalds6-24/+12
git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm Pull dax fixes from Dan Williams: "A fix for an xfstest failure and some and an update that removes an fsdax dependency on block devices. Summary: - Fix RWF_NOWAIT writes to properly return -EAGAIN - Clean up an unused helper - Update dax_writeback_mapping_range to not need a block_device argument" * tag 'dax-fixes-5.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: dax: pass NOWAIT flag to iomap_apply dax: Get rid of fs_dax_get_by_host() helper dax: Pass dax_dev instead of bdev to dax_writeback_mapping_range()