aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2020-09-16drm/i915: Be wary of data races when reading the active execlistsChris Wilson2-6/+34
To implement preempt-to-busy (and so efficient timeslicing and best utilization of the hardware submission ports) we let the GPU run asynchronously in respect to the ELSP submission queue. This created challenges in keeping and accessing the driver state mirroring the asynchronous GPU execution. The latest occurence of this was spotted by KCSAN: [ 1413.563200] BUG: KCSAN: data-race in __await_execution+0x217/0x370 [i915] [ 1413.563221] [ 1413.563236] race at unknown origin, with read to 0xffff88885bb6c478 of 8 bytes by task 9654 on cpu 1: [ 1413.563548] __await_execution+0x217/0x370 [i915] [ 1413.563891] i915_request_await_dma_fence+0x4eb/0x6a0 [i915] [ 1413.564235] i915_request_await_object+0x421/0x490 [i915] [ 1413.564577] i915_gem_do_execbuffer+0x29b7/0x3c40 [i915] [ 1413.564967] i915_gem_execbuffer2_ioctl+0x22f/0x5c0 [i915] [ 1413.564998] drm_ioctl_kernel+0x156/0x1b0 [ 1413.565022] drm_ioctl+0x2ff/0x480 [ 1413.565046] __x64_sys_ioctl+0x87/0xd0 [ 1413.565069] do_syscall_64+0x4d/0x80 [ 1413.565094] entry_SYSCALL_64_after_hwframe+0x44/0xa9 To complicate matters, we have to both avoid the read tearing of *active and avoid any write tearing as perform the pending[] -> inflight[] promotion of the execlists. This is because we cannot rely on the memcpy doing u64 aligned copies on all kernels/platforms and so we opt to open-code it with explicit WRITE_ONCE annotations to satisfy KCSAN. v2: When in doubt, write the same comment again. v3: Expanded commit message. Fixes: b55230e5e800 ("drm/i915: Check for awaits on still currently executing requests") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> [Joonas: Rebased and reordered into drm-intel-gt-next branch] [Joonas: Added expanded commit message from Tvrtko and Chris] Signed-off-by: Joonas Lahtinen <[email protected]> (cherry picked from commit b4d9145b0154f8c71dafc2db5fd445f1f3db9426) Signed-off-by: Jani Nikula <[email protected]>
2020-09-16drm/i915/gem: Reduce context termination list iteration guard to RCUChris Wilson1-13/+19
As we now protect the timeline list using RCU, we can drop the timeline->mutex for guarding the list iteration during context close, as we are searching for an inflight request. Any new request will see the context is banned and not be submitted. In doing so, pull the checks for a concurrent submission of the request (notably the i915_request_completed()) under the engine spinlock, to fully serialise with __i915_request_submit()). That is in the case of preempt-to-busy where the request may be completed during the __i915_request_submit(), we need to be careful that we sample the request status after serialising so that we don't miss the request the engine is actually submitting. Fixes: 4a3174152147 ("drm/i915/gem: Refine occupancy test in kill_context()") References: d22d2d073ef8 ("drm/i915: Protect i915_request_await_start from early waits") # rcu protection of timeline->requests References: https://gitlab.freedesktop.org/drm/intel/-/issues/1622 References: https://gitlab.freedesktop.org/drm/intel/-/issues/2158 Signed-off-by: Chris Wilson <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]> (cherry picked from commit 736e785f9b28cd9ef2d16a80960a04fd00e64b22) Signed-off-by: Jani Nikula <[email protected]>
2020-09-16drm/i915/gem: Delay tracking the GEM context until it is registeredChris Wilson1-5/+11
Avoid exposing a partially constructed context by deferring the list_add() from the initial construction to the end of registration. Otherwise, if we peek into the list of contexts from inside debugfs, we may see the partially constructed context and chase down some dangling incomplete pointers. Reported-by: CQ Tang <[email protected]> Fixes: 3aa9945a528e ("drm/i915: Separate GEM context construction and registration to userspace") References: f6e8aa387171 ("drm/i915: Report the number of closed vma held by each context in debugfs") Signed-off-by: Chris Wilson <[email protected]> Cc: Tvrtko Ursulin <[email protected]> Cc: CQ Tang <[email protected]> Cc: <[email protected]> # v5.2+ Reviewed-by: Mika Kuoppala <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] Signed-off-by: Rodrigo Vivi <[email protected]> Signed-off-by: Joonas Lahtinen <[email protected]> (cherry picked from commit eb4dedae920a07c485328af3da2202ec5184fb17) Signed-off-by: Jani Nikula <[email protected]>
2020-09-15Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfDavid S. Miller10-38/+47
Alexei Starovoitov says: ==================== pull-request: bpf 2020-09-15 The following pull-request contains BPF updates for your *net* tree. We've added 12 non-merge commits during the last 19 day(s) which contain a total of 10 files changed, 47 insertions(+), 38 deletions(-). The main changes are: 1) docs/bpf fixes, from Andrii. 2) ld_abs fix, from Daniel. 3) socket casting helpers fix, from Martin. 4) hash iterator fixes, from Yonghong. ==================== Signed-off-by: David S. Miller <[email protected]>
2020-09-15bpf: Fix a rcu warning for bpffs map pretty-printYonghong Song1-1/+3
Running selftest ./btf_btf -p the kernel had the following warning: [ 51.528185] WARNING: CPU: 3 PID: 1756 at kernel/bpf/hashtab.c:717 htab_map_get_next_key+0x2eb/0x300 [ 51.529217] Modules linked in: [ 51.529583] CPU: 3 PID: 1756 Comm: test_btf Not tainted 5.9.0-rc1+ #878 [ 51.530346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.9.3-1.el7.centos 04/01/2014 [ 51.531410] RIP: 0010:htab_map_get_next_key+0x2eb/0x300 ... [ 51.542826] Call Trace: [ 51.543119] map_seq_next+0x53/0x80 [ 51.543528] seq_read+0x263/0x400 [ 51.543932] vfs_read+0xad/0x1c0 [ 51.544311] ksys_read+0x5f/0xe0 [ 51.544689] do_syscall_64+0x33/0x40 [ 51.545116] entry_SYSCALL_64_after_hwframe+0x44/0xa9 The related source code in kernel/bpf/hashtab.c: 709 static int htab_map_get_next_key(struct bpf_map *map, void *key, void *next_key) 710 { 711 struct bpf_htab *htab = container_of(map, struct bpf_htab, map); 712 struct hlist_nulls_head *head; 713 struct htab_elem *l, *next_l; 714 u32 hash, key_size; 715 int i = 0; 716 717 WARN_ON_ONCE(!rcu_read_lock_held()); In kernel/bpf/inode.c, bpffs map pretty print calls map->ops->map_get_next_key() without holding a rcu_read_lock(), hence causing the above warning. To fix the issue, just surrounding map->ops->map_get_next_key() with rcu read lock. Fixes: a26ca7c982cb ("bpf: btf: Add pretty print support to the basic arraymap") Reported-by: Alexei Starovoitov <[email protected]> Signed-off-by: Yonghong Song <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Andrii Nakryiko <[email protected]> Cc: Martin KaFai Lau <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-15bpf: Bpf_skc_to_* casting helpers require a NULL check on skMartin KaFai Lau1-7/+7
The bpf_skc_to_* type casting helpers are available to BPF_PROG_TYPE_TRACING. The traced PTR_TO_BTF_ID may be NULL. For example, the skb->sk may be NULL. Thus, these casting helpers need to check "!sk" also and this patch fixes them. Fixes: 0d4fad3e57df ("bpf: Add bpf_skc_to_udp6_sock() helper") Fixes: 478cfbdf5f13 ("bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers") Fixes: af7ec1383361 ("bpf: Add bpf_skc_to_tcp6_sock() helper") Signed-off-by: Martin KaFai Lau <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Acked-by: Yonghong Song <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-15scsi: sd: sd_zbc: Fix ZBC disk initializationDamien Le Moal3-35/+35
Make sure to call sd_zbc_init_disk() when the sdkp->zoned field is known, that is, once sd_read_block_characteristics() is executed in sd_revalidate_disk(), so that host-aware disks also get initialized. To do so, move sd_zbc_init_disk() call in sd_zbc_revalidate_zones() and make sure to execute it for all zoned disks, including for host-aware disks used as regular disks as these disk zoned model may be changed back to BLK_ZONED_HA when partitions are deleted. Link: https://lore.kernel.org/r/[email protected] Fixes: 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands") Cc: <[email protected]> # v5.8+ Reported-by: Borislav Petkov <[email protected]> Tested-by: Borislav Petkov <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2020-09-15scsi: sd: sd_zbc: Fix handling of host-aware ZBC disksDamien Le Moal5-14/+72
When CONFIG_BLK_DEV_ZONED is disabled, allow using host-aware ZBC disks as regular disks. In this case, ensure that command completion is correctly executed by changing sd_zbc_complete() to return good_bytes instead of 0 and causing a hang during device probe (endless retries). When CONFIG_BLK_DEV_ZONED is enabled and a host-aware disk is detected to have partitions, it will be used as a regular disk. In this case, make sure to not do anything in sd_zbc_revalidate_zones() as that triggers warnings. Since all these different cases result in subtle settings of the disk queue zoned model, introduce the block layer helper function blk_queue_set_zoned() to generically implement setting up the effective zoned model according to the disk type, the presence of partitions on the disk and CONFIG_BLK_DEV_ZONED configuration. Link: https://lore.kernel.org/r/[email protected] Fixes: b72053072c0b ("block: allow partitions on host aware zone devices") Cc: <[email protected]> Reported-by: Borislav Petkov <[email protected]> Suggested-by: Christoph Hellwig <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Damien Le Moal <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2020-09-15Merge tag 'scsi-fixes' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "Just one fix in libsas for a resource leak in an error path" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: libsas: Fix error path in sas_notify_lldd_dev_found()
2020-09-15Merge tag 'fixes-v5.9a' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security Pull security layer fix from James Morris: "A device_cgroup RCU warning fix from Amol Grover" * tag 'fixes-v5.9a' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: device_cgroup: Fix RCU list debugging warning
2020-09-15Merge tag 'hyperv-fixes-signed' of ↵Linus Torvalds2-4/+12
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv fixes from Wei Liu: "Two patches from Michael and Dexuan to fix vmbus hanging issues" * tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload Drivers: hv: vmbus: hibernation: do not hang forever in vmbus_bus_resume()
2020-09-15scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supportedJames Smart1-24/+52
Initial FLOGIs are failing with the following message: lpfc 0000:13:00.1: 1:(0):0820 FLOGI Failed (x300). BBCredit Not Supported In a prior patch, the READ_SPARAM command was re-ordered to post after CONFIG_LINK as the driver is expected to update the driver's copy of the service parameters for the FLOGI payload. If the bb-credit recovery feature is enabled, this is fine. But on adapters were bb-credit recovery isn't enabled, it would cause the FLOGI to fail. Fix by restoring the original command order (READ_SPARAM before CONFIG_LINK), and after issuing CONFIG_LINK, detect bb-credit recovery support and reissuing READ_SPARAM to obtain the updated service parameters (effectively adding in the fix command order). [mkp: corrected SHA] Link: https://lore.kernel.org/r/[email protected] Fixes: 835214f5d5f5 ("scsi: lpfc: Fix broken Credit Recovery after driver load") CC: <[email protected]> # v5.7+ Co-developed-by: Dick Kennedy <[email protected]> Signed-off-by: Dick Kennedy <[email protected]> Signed-off-by: James Smart <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2020-09-15ipv4: Update exception handling for multipath routes via same deviceDavid Ahern1-5/+8
Kfir reported that pmtu exceptions are not created properly for deployments where multipath routes use the same device. After some digging I see 2 compounding problems: 1. ip_route_output_key_hash_rcu is updating the flowi4_oif *after* the route lookup. This is the second use case where this has been a problem (the first is related to use of vti devices with VRF). I can not find any reason for the oif to be changed after the lookup; the code goes back to the start of git. It does not seem logical so remove it. 2. fib_lookups for exceptions do not call fib_select_path to handle multipath route selection based on the hash. The end result is that the fib_lookup used to add the exception always creates it based using the first leg of the route. An example topology showing the problem: | host1 +------+ | eth0 | .209 +------+ | +------+ switch | br0 | +------+ | +---------+---------+ | host2 | host3 +------+ +------+ | eth0 | .250 | eth0 | 192.168.252.252 +------+ +------+ +-----+ +-----+ | vti | .2 | vti | 192.168.247.3 +-----+ +-----+ \ / ================================= tunnels 192.168.247.1/24 for h in host1 host2 host3; do ip netns add ${h} ip -netns ${h} link set lo up ip netns exec ${h} sysctl -wq net.ipv4.ip_forward=1 done ip netns add switch ip -netns switch li set lo up ip -netns switch link add br0 type bridge stp 0 ip -netns switch link set br0 up for n in 1 2 3; do ip -netns switch link add eth-sw type veth peer name eth-h${n} ip -netns switch li set eth-h${n} master br0 up ip -netns switch li set eth-sw netns host${n} name eth0 done ip -netns host1 addr add 192.168.252.209/24 dev eth0 ip -netns host1 link set dev eth0 up ip -netns host1 route add 192.168.247.0/24 \ nexthop via 192.168.252.250 dev eth0 nexthop via 192.168.252.252 dev eth0 ip -netns host2 addr add 192.168.252.250/24 dev eth0 ip -netns host2 link set dev eth0 up ip -netns host2 addr add 192.168.252.252/24 dev eth0 ip -netns host3 link set dev eth0 up ip netns add tunnel ip -netns tunnel li set lo up ip -netns tunnel li add br0 type bridge ip -netns tunnel li set br0 up for n in $(seq 11 20); do ip -netns tunnel addr add dev br0 192.168.247.${n}/24 done for n in 2 3 do ip -netns tunnel link add vti${n} type veth peer name eth${n} ip -netns tunnel link set eth${n} mtu 1360 master br0 up ip -netns tunnel link set vti${n} netns host${n} mtu 1360 up ip -netns host${n} addr add dev vti${n} 192.168.247.${n}/24 done ip -netns tunnel ro add default nexthop via 192.168.247.2 nexthop via 192.168.247.3 ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.11 ip netns exec host1 ping -M do -s 1400 -c3 -I 192.168.252.209 192.168.247.15 ip -netns host1 ro ls cache Before this patch the cache always shows exceptions against the first leg in the multipath route; 192.168.252.250 per this example. Since the hash has an initial random seed, you may need to vary the final octet more than what is listed. In my tests, using addresses between 11 and 19 usually found 1 that used both legs. With this patch, the cache will have exceptions for both legs. Fixes: 4895c771c7f0 ("ipv4: Add FIB nexthop exceptions") Reported-by: Kfir Itzhak <[email protected]> Signed-off-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-15drm/amdgpu/dc: Require primary plane to be enabled whenever the CRTC isMichel Dänzer1-22/+10
Don't check drm_crtc_state::active for this either, per its documentation in include/drm/drm_crtc.h: * Hence drivers must not consult @active in their various * &drm_mode_config_funcs.atomic_check callback to reject an atomic * commit. atomic_remove_fb disables the CRTC as needed for disabling the primary plane. This prevents at least the following problems if the primary plane gets disabled (e.g. due to destroying the FB assigned to the primary plane, as happens e.g. with mutter in Wayland mode): * The legacy cursor ioctl returned EINVAL for a non-0 cursor FB ID (which enables the cursor plane). * If the cursor plane was enabled, changing the legacy DPMS property value from off to on returned EINVAL. v2: * Minor changes to code comment and commit log, per review feedback. GitLab: https://gitlab.gnome.org/GNOME/mutter/-/issues/1108 GitLab: https://gitlab.gnome.org/GNOME/mutter/-/issues/1165 GitLab: https://gitlab.gnome.org/GNOME/mutter/-/issues/1344 Suggested-by: Daniel Vetter <[email protected]> Acked-by: Daniel Vetter <[email protected]> Reviewed-by: Nicholas Kazlauskas <[email protected]> Signed-off-by: Michel Dänzer <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-09-15drm/radeon: revert "Prefer lower feedback dividers"Christian König1-1/+1
Turns out this breaks a lot of different hardware. This reverts commit fc8c70526bd30733ea8667adb8b8ffebea30a8ed. Signed-off-by: Christian König <[email protected]> Acked-by: Nirmoy Das <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-09-15drm/amdgpu: Include sienna_cichlid in USBC PD FW support.Andrey Grodzovsky1-1/+1
Create sysfs interface also for sienna_cichlid. Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Andrey Grodzovsky <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-09-15drm/amd/display: update nv1x stutter latenciesJun Lei1-2/+2
[why] Recent characterization shows increased stutter latencies on some SKUs, leading to underflow. [how] Update SOC params to account for this worst case latency. Signed-off-by: Jun Lei <[email protected]> Acked-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-09-15drm/amd/display: Don't use DRM_ERROR() for DTM add topologyBhawanpreet Lakha1-1/+1
[Why] Previously we were only calling add_topology when hdcp was being enabled. Now we call add_topology by default so the ERROR messages are printed if the firmware is not loaded. This error message is not relevant for normal display functionality so no need to print a ERROR message. [How] Change DRM_ERROR to DRM_INFO Signed-off-by: Bhawanpreet Lakha <[email protected]> Acked-by: Aurabindo Pillai <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-09-15drm/amd/pm: support runtime pptable update for sienna_cichlid etc.Jiansong Chen1-3/+9
This avoids smu issue when enabling runtime pptable update for sienna_cichlid and so on. Runtime pptable udpate is needed for test and debug purpose. Signed-off-by: Jiansong Chen <[email protected]> Reviewed-by: Kenneth Feng <[email protected]> Reviewed-by: Evan Quan <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-09-15drm/amdkfd: fix a memory leak issueDennis Li1-0/+2
In the resume stage of GPU recovery, start_cpsch will call pm_init which set pm->allocated as false, cause the next pm_release_ib has no chance to release ib memory. Add pm_release_ib in stop_cpsch which will be called in the suspend stage of GPU recovery. Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2020-09-15drm/kfd: fix a system crash issue during GPU recoveryDennis Li1-1/+1
The crash log as the below: [Thu Aug 20 23:18:14 2020] general protection fault: 0000 [#1] SMP NOPTI [Thu Aug 20 23:18:14 2020] CPU: 152 PID: 1837 Comm: kworker/152:1 Tainted: G OE 5.4.0-42-generic #46~18.04.1-Ubuntu [Thu Aug 20 23:18:14 2020] Hardware name: GIGABYTE G482-Z53-YF/MZ52-G40-00, BIOS R12 05/13/2020 [Thu Aug 20 23:18:14 2020] Workqueue: events amdgpu_ras_do_recovery [amdgpu] [Thu Aug 20 23:18:14 2020] RIP: 0010:evict_process_queues_cpsch+0xc9/0x130 [amdgpu] [Thu Aug 20 23:18:14 2020] Code: 49 8d 4d 10 48 39 c8 75 21 eb 44 83 fa 03 74 36 80 78 72 00 74 0c 83 ab 68 01 00 00 01 41 c6 45 41 00 48 8b 00 48 39 c8 74 25 <80> 78 70 00 c6 40 6d 01 74 ee 8b 50 28 c6 40 70 00 83 ab 60 01 00 [Thu Aug 20 23:18:14 2020] RSP: 0018:ffffb29b52f6fc90 EFLAGS: 00010213 [Thu Aug 20 23:18:14 2020] RAX: 1c884edb0a118914 RBX: ffff8a0d45ff3c00 RCX: ffff8a2d83e41038 [Thu Aug 20 23:18:14 2020] RDX: 0000000000000000 RSI: 0000000000000082 RDI: ffff8a0e2e4178c0 [Thu Aug 20 23:18:14 2020] RBP: ffffb29b52f6fcb0 R08: 0000000000001b64 R09: 0000000000000004 [Thu Aug 20 23:18:14 2020] R10: ffffb29b52f6fb78 R11: 0000000000000001 R12: ffff8a0d45ff3d28 [Thu Aug 20 23:18:14 2020] R13: ffff8a2d83e41028 R14: 0000000000000000 R15: 0000000000000000 [Thu Aug 20 23:18:14 2020] FS: 0000000000000000(0000) GS:ffff8a0e2e400000(0000) knlGS:0000000000000000 [Thu Aug 20 23:18:14 2020] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [Thu Aug 20 23:18:14 2020] CR2: 000055c783c0e6a8 CR3: 00000034a1284000 CR4: 0000000000340ee0 [Thu Aug 20 23:18:14 2020] Call Trace: [Thu Aug 20 23:18:14 2020] kfd_process_evict_queues+0x43/0xd0 [amdgpu] [Thu Aug 20 23:18:14 2020] kfd_suspend_all_processes+0x60/0xf0 [amdgpu] [Thu Aug 20 23:18:14 2020] kgd2kfd_suspend.part.7+0x43/0x50 [amdgpu] [Thu Aug 20 23:18:14 2020] kgd2kfd_pre_reset+0x46/0x60 [amdgpu] [Thu Aug 20 23:18:14 2020] amdgpu_amdkfd_pre_reset+0x1a/0x20 [amdgpu] [Thu Aug 20 23:18:14 2020] amdgpu_device_gpu_recover+0x377/0xf90 [amdgpu] [Thu Aug 20 23:18:14 2020] ? amdgpu_ras_error_query+0x1b8/0x2a0 [amdgpu] [Thu Aug 20 23:18:14 2020] amdgpu_ras_do_recovery+0x159/0x190 [amdgpu] [Thu Aug 20 23:18:14 2020] process_one_work+0x20f/0x400 [Thu Aug 20 23:18:14 2020] worker_thread+0x34/0x410 When GPU hang, user process will fail to create a compute queue whose struct object will be freed later, but driver wrongly add this queue to queue list of the proccess. And then kfd_process_evict_queues will access a freed memory, which cause a system crash. v2: The failure to execute_queues should probably not be reported to the caller of create_queue, because the queue was already created. Therefore change to ignore the return value from execute_queues. Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Dennis Li <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2020-09-15net: tipc: kerneldoc fixesLu Wei1-1/+2
Fix parameter description of tipc_link_bc_create() Reported-by: Hulk Robot <[email protected]> Fixes: 16ad3f4022bb ("tipc: introduce variable window congestion control") Signed-off-by: Lu Wei <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-15ibmvnic: update MAINTAINERSDany Madden1-1/+3
Update supporters for IBM Power SRIOV Virtual NIC Device Driver. Thomas Falcon is moving on to other works. Dany Madden, Lijun Pan and Sukadev Bhattiprolu are the current supporters. Signed-off-by: Dany Madden <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-15spi: bcm2835: Make polling_limit_us staticJason Yan1-1/+1
This eliminates the following sparse warning: drivers/spi/spi-bcm2835.c:78:14: warning: symbol 'polling_limit_us' was not declared. Should it be static? Reported-by: Hulk Robot <[email protected]> Signed-off-by: Jason Yan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2020-09-15efi: efibc: check for efivars write capabilityArd Biesheuvel1-1/+1
Branden reports that commit f88814cc2578c1 ("efi/efivars: Expose RT service availability via efivars abstraction") regresses UEFI platforms that implement GetVariable but not SetVariable when booting kernels that have EFIBC (bootloader control) enabled. The reason is that EFIBC is a user of the efivars abstraction, which was updated to permit users that rely only on the read capability, but not on the write capability. EFIBC is in the latter category, so it has to check explicitly whether efivars supports writes. Fixes: f88814cc2578c1 ("efi/efivars: Expose RT service availability via efivars abstraction") Tested-by: Branden Sherrell <[email protected]> Link: https://lore.kernel.org/linux-efi/[email protected]/ Signed-off-by: Ard Biesheuvel <[email protected]>
2020-09-15perf test: Free formats for perf pmu parse testNamhyung Kim3-0/+13
The following leaks were detected by ASAN: Indirect leak of 360 byte(s) in 9 object(s) allocated from: #0 0x7fecc305180e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) #1 0x560578f6dce5 in perf_pmu__new_format util/pmu.c:1333 #2 0x560578f752fc in perf_pmu_parse util/pmu.y:59 #3 0x560578f6a8b7 in perf_pmu__format_parse util/pmu.c:73 #4 0x560578e07045 in test__pmu tests/pmu.c:155 #5 0x560578de109b in run_test tests/builtin-test.c:410 #6 0x560578de109b in test_and_print tests/builtin-test.c:440 #7 0x560578de401a in __cmd_test tests/builtin-test.c:661 #8 0x560578de401a in cmd_test tests/builtin-test.c:807 #9 0x560578e49354 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #10 0x560578ce71a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #11 0x560578ce71a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #12 0x560578ce71a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #13 0x7fecc2b7acc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: cff7f956ec4a1 ("perf tests: Move pmu tests into separate object") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15perf metric: Do not free metric when failed to resolveNamhyung Kim1-3/+6
It's dangerous to free the original metric when it's called from resolve_metric() as it's already in the metric_list and might have other resources too. Instead, it'd better let them bail out and be released properly at the later stage. So add a check when it's called from metricgroup__add_metric() and release it. Also make sure that mp is set properly. Fixes: 83de0b7d535de ("perf metric: Collect referenced metrics in struct metric_ref_node") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15perf metric: Free metric when it failed to resolveNamhyung Kim1-5/+12
The metricgroup__add_metric() can find multiple match for a metric group and it's possible to fail. Also it can fail in the middle like in resolve_metric() even for single metric. In those cases, the intermediate list and ids will be leaked like: Direct leak of 3 byte(s) in 1 object(s) allocated from: #0 0x7f4c938f40b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5) #1 0x55f7e71c1bef in __add_metric util/metricgroup.c:683 #2 0x55f7e71c31d0 in add_metric util/metricgroup.c:906 #3 0x55f7e71c3844 in metricgroup__add_metric util/metricgroup.c:940 #4 0x55f7e71c488d in metricgroup__add_metric_list util/metricgroup.c:993 #5 0x55f7e71c488d in parse_groups util/metricgroup.c:1045 #6 0x55f7e71c60a4 in metricgroup__parse_groups_test util/metricgroup.c:1087 #7 0x55f7e71235ae in __compute_metric tests/parse-metric.c:164 #8 0x55f7e7124650 in compute_metric tests/parse-metric.c:196 #9 0x55f7e7124650 in test_recursion_fail tests/parse-metric.c:318 #10 0x55f7e7124650 in test__parse_metric tests/parse-metric.c:356 #11 0x55f7e70be09b in run_test tests/builtin-test.c:410 #12 0x55f7e70be09b in test_and_print tests/builtin-test.c:440 #13 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661 #14 0x55f7e70c101a in cmd_test tests/builtin-test.c:807 #15 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #16 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #17 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #18 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #19 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: 83de0b7d535de ("perf metric: Collect referenced metrics in struct metric_ref_node") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15perf metric: Release expr_parse_ctx after testingNamhyung Kim1-3/+5
The test_generic_metric() missed to release entries in the pctx. Asan reported following leak (and more): Direct leak of 128 byte(s) in 1 object(s) allocated from: #0 0x7f4c9396980e in calloc (/lib/x86_64-linux-gnu/libasan.so.5+0x10780e) #1 0x55f7e748cc14 in hashmap_grow (/home/namhyung/project/linux/tools/perf/perf+0x90cc14) #2 0x55f7e748d497 in hashmap__insert (/home/namhyung/project/linux/tools/perf/perf+0x90d497) #3 0x55f7e7341667 in hashmap__set /home/namhyung/project/linux/tools/perf/util/hashmap.h:111 #4 0x55f7e7341667 in expr__add_ref util/expr.c:120 #5 0x55f7e7292436 in prepare_metric util/stat-shadow.c:783 #6 0x55f7e729556d in test_generic_metric util/stat-shadow.c:858 #7 0x55f7e712390b in compute_single tests/parse-metric.c:128 #8 0x55f7e712390b in __compute_metric tests/parse-metric.c:180 #9 0x55f7e712446d in compute_metric tests/parse-metric.c:196 #10 0x55f7e712446d in test_dcache_l2 tests/parse-metric.c:295 #11 0x55f7e712446d in test__parse_metric tests/parse-metric.c:355 #12 0x55f7e70be09b in run_test tests/builtin-test.c:410 #13 0x55f7e70be09b in test_and_print tests/builtin-test.c:440 #14 0x55f7e70c101a in __cmd_test tests/builtin-test.c:661 #15 0x55f7e70c101a in cmd_test tests/builtin-test.c:807 #16 0x55f7e7126214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #17 0x55f7e6fc41a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #18 0x55f7e6fc41a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #19 0x55f7e6fc41a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #20 0x7f4c93492cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: 6d432c4c8aa56 ("perf tools: Add test_generic_metric function") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15perf test: Fix memory leaks in parse-metric testNamhyung Kim1-5/+9
It didn't release resources when there's an error so the test_recursion_fail() will leak some memory. Fixes: 0a507af9c681a ("perf tests: Add parse metric test for ipc metric") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15perf parse-event: Fix memory leak in evsel->unitNamhyung Kim1-1/+1
The evsel->unit borrows a pointer of pmu event or alias instead of owns a string. But tool event (duration_time) passes a result of strdup() caused a leak. It was found by ASAN during metric test: Direct leak of 210 byte(s) in 70 object(s) allocated from: #0 0x7fe366fca0b5 in strdup (/lib/x86_64-linux-gnu/libasan.so.5+0x920b5) #1 0x559fbbcc6ea3 in add_event_tool util/parse-events.c:414 #2 0x559fbbcc6ea3 in parse_events_add_tool util/parse-events.c:1414 #3 0x559fbbd8474d in parse_events_parse util/parse-events.y:439 #4 0x559fbbcc95da in parse_events__scanner util/parse-events.c:2096 #5 0x559fbbcc95da in __parse_events util/parse-events.c:2141 #6 0x559fbbc28555 in check_parse_id tests/pmu-events.c:406 #7 0x559fbbc28555 in check_parse_id tests/pmu-events.c:393 #8 0x559fbbc28555 in check_parse_cpu tests/pmu-events.c:415 #9 0x559fbbc28555 in test_parsing tests/pmu-events.c:498 #10 0x559fbbc0109b in run_test tests/builtin-test.c:410 #11 0x559fbbc0109b in test_and_print tests/builtin-test.c:440 #12 0x559fbbc03e69 in __cmd_test tests/builtin-test.c:695 #13 0x559fbbc03e69 in cmd_test tests/builtin-test.c:807 #14 0x559fbbc691f4 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #15 0x559fbbb071a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #16 0x559fbbb071a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #17 0x559fbbb071a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #18 0x7fe366b68cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: f0fbb114e3025 ("perf stat: Implement duration_time as a proper event") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15perf evlist: Fix cpu/thread map leakNamhyung Kim1-3/+8
Asan reported leak of cpu and thread maps as they have one more refcount than released. I found that after setting evlist maps it should release it's refcount. It seems to be broken from the beginning so I chose the original commit as the culprit. But not sure how it's applied to stable trees since there are many changes in the code after that. Fixes: 7e2ed097538c5 ("perf evlist: Store pointer to the cpu and thread maps") Fixes: 4112eb1899c0e ("perf evlist: Default to syswide target when no thread/cpu maps set") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15perf metric: Fix some memory leaks - part 2Namhyung Kim1-0/+2
The metric_event_delete() missed to free expr->metric_events and it should free an expr when metric_refs allocation failed. Fixes: 4ea2896715e67 ("perf metric: Collect referenced metrics in struct metric_expr") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15perf metric: Fix some memory leaksNamhyung Kim1-2/+5
I found some memory leaks while reading the metric code. Some are real and others only occur in the error path. When it failed during metric or event parsing, it should release all resources properly. Fixes: b18f3e365019d ("perf stat: Support JSON metrics in perf stat") Signed-off-by: Namhyung Kim <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: John Garry <[email protected]> Cc: Kajol Jain <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15perf test: Free aliases for PMU event map aliases testNamhyung Kim3-1/+7
The aliases were never released causing the following leaks: Indirect leak of 1224 byte(s) in 9 object(s) allocated from: #0 0x7feefb830628 in malloc (/lib/x86_64-linux-gnu/libasan.so.5+0x107628) #1 0x56332c8f1b62 in __perf_pmu__new_alias util/pmu.c:322 #2 0x56332c8f401f in pmu_add_cpu_aliases_map util/pmu.c:778 #3 0x56332c792ce9 in __test__pmu_event_aliases tests/pmu-events.c:295 #4 0x56332c792ce9 in test_aliases tests/pmu-events.c:367 #5 0x56332c76a09b in run_test tests/builtin-test.c:410 #6 0x56332c76a09b in test_and_print tests/builtin-test.c:440 #7 0x56332c76ce69 in __cmd_test tests/builtin-test.c:695 #8 0x56332c76ce69 in cmd_test tests/builtin-test.c:807 #9 0x56332c7d2214 in run_builtin /home/namhyung/project/linux/tools/perf/perf.c:312 #10 0x56332c6701a8 in handle_internal_command /home/namhyung/project/linux/tools/perf/perf.c:364 #11 0x56332c6701a8 in run_argv /home/namhyung/project/linux/tools/perf/perf.c:408 #12 0x56332c6701a8 in main /home/namhyung/project/linux/tools/perf/perf.c:538 #13 0x7feefb359cc9 in __libc_start_main ../csu/libc-start.c:308 Fixes: 956a78356c24c ("perf test: Test pmu-events aliases") Signed-off-by: Namhyung Kim <[email protected]> Reviewed-by: John Garry <[email protected]> Acked-by: Jiri Olsa <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Ian Rogers <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Stephane Eranian <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15perf vendor events amd: Remove trailing commasHenry Burns2-2/+2
The amdzen2/core.json and amdzen/core.json vendor events files have the occasional trailing comma. Since that goes against the JSON standard, lets remove it. Signed-off-by: Henry Burns <[email protected]> Acked-by: Kim Phillips <[email protected]> Acked-by: Namhyung Kim <[email protected]> Cc: Alexander Shishkin <[email protected]> Cc: Andi Kleen <[email protected]> Cc: Jiri Olsa <[email protected]> Cc: Mark Rutland <[email protected]> Cc: Peter Zijlstra <[email protected]> Cc: Vijay Thakkar <[email protected]> Link: http://lore.kernel.org/lkml/[email protected] Signed-off-by: Arnaldo Carvalho de Melo <[email protected]>
2020-09-15Merge tag 'thunderbolt-for-v5.9-rc6' of ↵Greg Kroah-Hartman1-4/+16
git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-linus Mika writes: thunderbolt: Fix for v5.9-rc6 One more fix that makes ASUS PA27AC Thunderbolt 3 monitor work more reliably. This has been in linux-next with no reported issues. * tag 'thunderbolt-for-v5.9-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: thunderbolt: Retry DROM read once if parsing fails
2020-09-15MIPS: SNI: Fix MIPS_L1_CACHE_SHIFTThomas Bogendoerfer1-0/+1
Commit 930beb5ac09a ("MIPS: introduce MIPS_L1_CACHE_SHIFT_<N>") forgot to select the correct MIPS_L1_CACHE_SHIFT for SNI RM. This breaks non coherent DMA because of a wrong allocation alignment. Fixes: 930beb5ac09a ("MIPS: introduce MIPS_L1_CACHE_SHIFT_<N>") Signed-off-by: Thomas Bogendoerfer <[email protected]>
2020-09-15batman-adv: mcast: fix duplicate mcast packets from BLA backbone to meshLinus Lüssing1-16/+87
Scenario: * Multicast frame send from BLA backbone gateways (multiple nodes with their bat0 bridged together, with BLA enabled) sharing the same LAN to nodes in the mesh Issue: * Nodes receive the frame multiple times on bat0 from the mesh, once from each foreign BLA backbone gateway which shares the same LAN with another For multicast frames via batman-adv broadcast packets coming from the same BLA backbone but from different backbone gateways duplicates are currently detected via a CRC history of previously received packets. However this CRC so far was not performed for multicast frames received via batman-adv unicast packets. Fixing this by appyling the same check for such packets, too. Room for improvements in the future: Ideally we would introduce the possibility to not only claim a client, but a complete originator, too. This would allow us to only send a multicast-in-unicast packet from a BLA backbone gateway claiming the node and by that avoid potential redundant transmissions in the first place. Fixes: 279e89b2281a ("batman-adv: add broadcast duplicate check") Signed-off-by: Linus Lüssing <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-09-15batman-adv: mcast: fix duplicate mcast packets in BLA backbone from meshLinus Lüssing3-14/+30
Scenario: * Multicast frame send from mesh to a BLA backbone (multiple nodes with their bat0 bridged together, with BLA enabled) Issue: * BLA backbone nodes receive the frame multiple times on bat0, once from mesh->bat0 and once from each backbone_gw from LAN For unicast, a node will send only to the best backbone gateway according to the TQ. However for multicast we currently cannot determine if multiple destination nodes share the same backbone if they don't share the same backbone with us. So we need to keep sending the unicasts to all backbone gateways and let the backbone gateways decide which one will forward the frame. We can use the CLAIM mechanism to make this decision. One catch: The batman-adv gateway feature for DHCP packets potentially sends multicast packets in the same batman-adv unicast header as the multicast optimizations code. And we are not allowed to drop those even if we did not claim the source address of the sender, as for such packets there is only this one multicast-in-unicast packet. How can we distinguish the two cases? The gateway feature uses a batman-adv unicast 4 address header. While the multicast-to-unicasts feature uses a simple, 3 address batman-adv unicast header. So let's use this to distinguish. Fixes: fe2da6ff27c7 ("batman-adv: check incoming packet type for bla") Signed-off-by: Linus Lüssing <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-09-15batman-adv: mcast: fix duplicate mcast packets in BLA backbone from LANLinus Lüssing3-13/+53
Scenario: * Multicast frame send from a BLA backbone (multiple nodes with their bat0 bridged together, with BLA enabled) Issue: * BLA backbone nodes receive the frame multiple times on bat0 For multicast frames received via batman-adv broadcast packets the originator of the broadcast packet is checked before decapsulating and forwarding the frame to bat0 (batadv_bla_is_backbone_gw()-> batadv_recv_bcast_packet()). If it came from a node which shares the same BLA backbone with us then it is not forwarded to bat0 to avoid a loop. When sending a multicast frame in a non-4-address batman-adv unicast packet we are currently missing this check - and cannot do so because the batman-adv unicast packet has no originator address field. However, we can simply fix this on the sender side by only sending the multicast frame via unicasts to interested nodes which do not share the same BLA backbone with us. This also nicely avoids some unnecessary transmissions on mesh side. Note that no infinite loop was observed, probably because of dropping via batadv_interface_tx()->batadv_bla_tx(). However the duplicates still utterly confuse switches/bridges, ICMPv6 duplicate address detection and neighbor discovery and therefore leads to long delays before being able to establish TCP connections, for instance. And it also leads to the Linux bridge printing messages like: "br-lan: received packet on eth1 with own address as source address ..." Fixes: 2d3f6ccc4ea5 ("batman-adv: Modified forwarding behaviour for multicast packets") Signed-off-by: Linus Lüssing <[email protected]> Signed-off-by: Sven Eckelmann <[email protected]> Signed-off-by: Simon Wunderlich <[email protected]>
2020-09-15EDAC/ghes: Check whether the driver is on the safe list correctlyBorislav Petkov1-0/+4
With CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, a system would try to probe, unregister and probe again a driver. When ghes_edac is attempted to be loaded on a system which is not on the safe platforms list, ghes_edac_register() would return early. The unregister counterpart ghes_edac_unregister() would still attempt to unregister and exit early at the refcount test, leading to the refcount underflow below. In order to not do *anything* on the unregister path too, reuse the force_load parameter and check it on that path too, before fumbling with the refcount. ghes_edac: ghes_edac_register: entry ghes_edac: ghes_edac_register: return -ENODEV ------------[ cut here ]------------ refcount_t: underflow; use-after-free. WARNING: CPU: 10 PID: 1 at lib/refcount.c:28 refcount_warn_saturate+0xb9/0x100 Modules linked in: CPU: 10 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #12 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 RIP: 0010:refcount_warn_saturate+0xb9/0x100 Code: 82 e8 fb 8f 4d 00 90 0f 0b 90 90 c3 80 3d 55 4c f5 00 00 75 88 c6 05 4c 4c f5 00 01 90 48 c7 c7 d0 8a 10 82 e8 d8 8f 4d 00 90 <0f> 0b 90 90 c3 80 3d 30 4c f5 00 00 0f 85 61 ff ff ff c6 05 23 4c RSP: 0018:ffffc90000037d58 EFLAGS: 00010292 RAX: 0000000000000026 RBX: ffff88840b8da000 RCX: 0000000000000000 RDX: 0000000000000001 RSI: ffffffff8216b24f RDI: 00000000ffffffff RBP: ffff88840c662e00 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000046 R12: 0000000000000000 R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 FS: 0000000000000000(0000) GS:ffff88840ee80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000800002211000 CR4: 00000000003506e0 Call Trace: ghes_edac_unregister ghes_remove platform_drv_remove really_probe driver_probe_device device_driver_attach __driver_attach ? device_driver_attach ? device_driver_attach bus_for_each_dev bus_add_driver driver_register ? bert_init ghes_init do_one_initcall ? rcu_read_lock_sched_held kernel_init_freeable ? rest_init kernel_init ret_from_fork ... ghes_edac: ghes_edac_unregister: FALSE, refcount: -1073741824 Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-15EDAC/ghes: Clear scanned data on unloadBorislav Petkov1-0/+1
Commit b972fdba8665 ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()") didn't clear all the information from the scanned system and, more specifically, left ghes_hw.num_dimms to its previous value. On a second load (CONFIG_DEBUG_TEST_DRIVER_REMOVE=y), the driver would use the leftover num_dimms value which is not 0 and thus the 0 check in enumerate_dimms() will get bypassed and it would go directly to the pointer deref: d = &hw->dimms[hw->num_dimms]; which is, of course, NULL: #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP CPU: 7 PID: 1 Comm: swapper/0 Not tainted 5.9.0-rc4+ #7 Hardware name: GIGABYTE MZ01-CE1-00/MZ01-CE1-00, BIOS F02 08/29/2018 RIP: 0010:enumerate_dimms.cold+0x7b/0x375 Reset the whole ghes_hw on driver unregister so that no stale values are used on a second system scan. Fixes: b972fdba8665 ("EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()") Cc: Shiju Jose <[email protected]> Signed-off-by: Borislav Petkov <[email protected]> Link: https://lkml.kernel.org/r/[email protected]
2020-09-15nvme-tcp: fix kconfig dependency warning when !CRYPTONecip Fazil Yildiran1-0/+1
When NVME_TCP is enabled and CRYPTO is disabled, it results in the following Kbuild warning: WARNING: unmet direct dependencies detected for CRYPTO_CRC32C Depends on [n]: CRYPTO [=n] Selected by [y]: - NVME_TCP [=y] && INET [=y] && BLK_DEV_NVME [=y] The reason is that NVME_TCP selects CRYPTO_CRC32C without depending on or selecting CRYPTO while CRYPTO_CRC32C is subordinate to CRYPTO. Honor the kconfig menu hierarchy to remove kconfig dependency warnings. Fixes: 79fd751d61aa ("nvme: tcp: selects CRYPTO_CRC32C for nvme-tcp") Signed-off-by: Necip Fazil Yildiran <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-09-15nvme-pci: disable the write zeros command for Intel 600P/P3100David Milburn1-1/+2
The write zeros command does not work with 4k range. bash-4.4# ./blkdiscard /dev/nvme0n1p2 bash-4.4# strace -efallocate xfs_io -c "fzero 536895488 2048" /dev/nvme0n1p2 fallocate(3, FALLOC_FL_ZERO_RANGE, 536895488, 2048) = 0 +++ exited with 0 +++ bash-4.4# dd bs=1 if=/dev/nvme0n1p2 skip=536895488 count=512 | hexdump -C 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00000200 bash-4.4# ./blkdiscard /dev/nvme0n1p2 bash-4.4# strace -efallocate xfs_io -c "fzero 536895488 4096" /dev/nvme0n1p2 fallocate(3, FALLOC_FL_ZERO_RANGE, 536895488, 4096) = 0 +++ exited with 0 +++ bash-4.4# dd bs=1 if=/dev/nvme0n1p2 skip=536895488 count=512 | hexdump -C 00000000 5c 61 5c b0 96 21 1b 5e 85 0c 07 32 9c 8c eb 3c |\a\..!.^...2...<| 00000010 4a a2 06 ca 67 15 2d 8e 29 8d a8 a0 7e 46 8c 62 |J...g.-.)...~F.b| 00000020 bb 4c 6c c1 6b f5 ae a5 e4 a9 bc 93 4f 60 ff 7a |.Ll.k.......O`.z| Reported-by: Eric Sandeen <[email protected]> Signed-off-by: David Milburn <[email protected]> Tested-by: Eric Sandeen <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2020-09-14docs/bpf: Remove source code linksAndrii Nakryiko1-1/+1
Make path to bench_ringbufs.c just a text, not a special link. Fixes: 97abb2b39682 ("docs/bpf: Add BPF ring buffer design notes") Reported-by: Mauro Carvalho Chehab <[email protected]> Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-14s390/dasd: Fix zero write for FBA devicesJan Höppner1-1/+8
A discard request that writes zeros using the global kernel internal ZERO_PAGE will fail for machines with more than 2GB of memory due to the location of the ZERO_PAGE. Fix this by using a driver owned global zero page allocated with GFP_DMA flag set. Fixes: 28b841b3a7cb ("s390/dasd: Add discard support for FBA devices") Signed-off-by: Jan Höppner <[email protected]> Reviewed-by: Stefan Haberland <[email protected]> Cc: <[email protected]> # 4.14+ Signed-off-by: Jens Axboe <[email protected]>
2020-09-14xsk: Fix number of pinned pages/umem size discrepancyBjörn Töpel1-9/+8
For AF_XDP sockets, there was a discrepancy between the number of of pinned pages and the size of the umem region. The size of the umem region is used to validate the AF_XDP descriptor addresses. The logic that pinned the pages covered by the region only took whole pages into consideration, creating a mismatch between the size and pinned pages. A user could then pass AF_XDP addresses outside the range of pinned pages, but still within the size of the region, crashing the kernel. This change correctly calculates the number of pages to be pinned. Further, the size check for the aligned mode is simplified. Now the code simply checks if the size is divisible by the chunk size. Fixes: bbff2f321a86 ("xsk: new descriptor addressing scheme") Reported-by: Ciara Loftus <[email protected]> Signed-off-by: Björn Töpel <[email protected]> Signed-off-by: Alexei Starovoitov <[email protected]> Tested-by: Ciara Loftus <[email protected]> Acked-by: Song Liu <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2020-09-14net: sched: initialize with 0 before setting erspan md->uXin Long1-0/+1
In fl_set_erspan_opt(), all bits of erspan md was set 1, as this function is also used to set opt MASK. However, when setting for md->u.index for opt VALUE, the rest bits of the union md->u will be left 1. It would cause to fail the match of the whole md when version is 1 and only index is set. This patch is to fix by initializing with 0 before setting erspan md->u. Reported-by: Shuang Li <[email protected]> Fixes: 79b1011cb33d ("net: sched: allow flower to match erspan options") Signed-off-by: Xin Long <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2020-09-14Merge branch 'net-improve-vxlan-option-process-in-net_sched-and-lwtunnel'David S. Miller4-1/+8
Xin Long says: ==================== net: improve vxlan option process in net_sched and lwtunnel This patch is to do some mask when setting vxlan option in net_sched and lwtunnel, so that only available bits can be set on vxlan md gbp. This would help when users don't know exactly vxlan's gbp bits, and avoid some mismatch because of some unavailable bits set by users. ==================== Signed-off-by: David S. Miller <[email protected]>