aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2021-07-13sfc: fix lack of XDP TX queues - error XDP TX failed (-22)Íñigo Huguet1-1/+2
Fixes: e26ca4b53582 sfc: reduce the number of requested xdp ev queues The buggy commit intended to allocate less channels for XDP in order to be more unlikely to reach the limit of 32 channels of the driver. The idea was to use each IRQ/eventqeue for more XDP TX queues than before, calculating which is the maximum number of TX queues that one event queue can handle. For example, in EF10 each event queue could handle up to 8 queues, better than the 4 they were handling before the change. This way, it would have to allocate half of channels than before for XDP TX. The problem is that the TX queues are also contained inside the channel structs, and there are only 4 queues per channel. Reducing the number of channels means also reducing the number of queues, resulting in not having the desired number of 1 queue per CPU. This leads to getting errors on XDP_TX and XDP_REDIRECT if they're executed from a high numbered CPU, because there only exist queues for the low half of CPUs, actually. If XDP_TX/REDIRECT is executed in a low numbered CPU, the error doesn't happen. This is the error in the logs (repeated many times, even rate limited): sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22) This errors happens in function efx_xdp_tx_buffers, where it expects to have a dedicated XDP TX queue per CPU. Reverting the change makes again more likely to reach the limit of 32 channels in machines with many CPUs. If this happen, no XDP_TX/REDIRECT will be possible at all, and we will have this log error messages: At interface probe: sfc 0000:5e:00.0: Insufficient resources for 12 XDP event queues (24 other channels, max 32) At every subsequent XDP_TX/REDIRECT failure, rate limited: sfc 0000:5e:00.0 ens3f0np0: XDP TX failed (-22) However, without reverting the change, it makes the user to think that everything is OK at probe time, but later it fails in an unpredictable way, depending on the CPU that handles the packet. It is better to restore the predictable behaviour. If the user sees the error message at probe time, he/she can try to configure the best way it fits his/her needs. At least, he/she will have 2 options: - Accept that XDP_TX/REDIRECT is not available (he/she may not need it) - Load sfc module with modparam 'rss_cpus' with a lower number, thus creating less normal RX queues/channels, letting more free resources for XDP, with some performance penalty. Anyway, let the calculation of maximum TX queues that can be handled by a single event queue, and use it only if it's less than the number of TX queues per channel. This doesn't happen in practice, but could happen if some constant values are tweaked in the future, such us EFX_MAX_TXQ_PER_CHANNEL, EFX_MAX_EVQ_SIZE or EFX_MAX_DMAQ_SIZE. Related mailing list thread: https://lore.kernel.org/bpf/20201215104327.2be76156@carbon/ Signed-off-by: Íñigo Huguet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-13cpufreq: Fix fall-through warning for ClangGustavo A. R. Silva1-2/+0
In preparation to enable -Wimplicit-fallthrough for Clang, fix a fallthrough warning by simply dropping the empty default case at the bottom. Link: https://github.com/KSPP/linux/issues/115 Signed-off-by: Gustavo A. R. Silva <[email protected]>
2021-07-13net: fddi: fix UAF in fza_probePavel Skripkin1-2/+1
fp is netdev private data and it cannot be used after free_netdev() call. Using fp after free_netdev() can cause UAF bug. Fix it by moving free_netdev() after error message. Fixes: 61414f5ec983 ("FDDI: defza: Add support for DEC FDDIcontroller 700 TURBOchannel adapter") Signed-off-by: Pavel Skripkin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-13net: dsa: sja1105: fix address learning getting disabled on the CPU portVladimir Oltean1-8/+6
In May 2019 when commit 640f763f98c2 ("net: dsa: sja1105: Add support for Spanning Tree Protocol") was introduced, the comment that "STP does not get called for the CPU port" was true. This changed after commit 0394a63acfe2 ("net: dsa: enable and disable all ports") in August 2019 and went largely unnoticed, because the sja1105_bridge_stp_state_set() method did nothing different compared to the static setup done by sja1105_init_mac_settings(). With the ability to turn address learning off introduced by the blamed commit, there is a new priv->learn_ena port mask in the driver. When sja1105_bridge_stp_state_set() gets called and we are in BR_STATE_LEARNING or later, address learning is enabled or not depending on priv->learn_ena & BIT(port). So what happens is that priv->learn_ena is not being set from anywhere for the CPU port, and the static configuration done by sja1105_init_mac_settings() is being overwritten. To solve this, acknowledge that the static configuration of STP state is no longer necessary because the STP state is being set by the DSA core now, but what is necessary is to set priv->learn_ena for the CPU port. Fixes: 4d9423549501 ("net: dsa: sja1105: offload bridge port flags to device") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-13net: ocelot: fix switchdev objects synced for wrong netdev with LAG offloadVladimir Oltean1-4/+5
The point with a *dev and a *brport_dev is that when we have a LAG net device that is a bridge port, *dev is an ocelot net device and *brport_dev is the bonding/team net device. The ocelot net device beneath the LAG does not exist from the bridge's perspective, so we need to sync the switchdev objects belonging to the brport_dev and not to the dev. Fixes: e4bd44e89dcf ("net: ocelot: replay switchdev events when joining bridge") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-13net: Use nlmsg_unicast() instead of netlink_unicast()Yajun Deng8-27/+13
It has 'if (err >0 )' statement in nlmsg_unicast(), so use nlmsg_unicast() instead of netlink_unicast(), this looks more concise. v2: remove the change in netfilter. Signed-off-by: Yajun Deng <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2021-07-13drm/amd/pm: Add waiting for response of mode-reset message for yellow carpAaron Liu1-7/+3
Remove mdelay process and use smu_cmn_send_smc_msg_with_param to send mode-reset message to SMC. Signed-off-by: Aaron Liu <[email protected]> Reviewed-by: Evan Quan <[email protected]> Reviewed-by: Huang Rui <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13Revert "drm/amdkfd: Add heavy-weight TLB flush after unmapping"Eric Huang1-1/+0
This reverts commit 1098d658bef05e5fee634aab0b6a1fa590cfca24. Reason for revert: it causes regressions on several Asics. Signed-off-by: Eric Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13Revert "drm/amdgpu: Add table_freed parameter to amdgpu_vm_bo_update"Eric Huang4-10/+10
This reverts commit 075e8080c1a7571563171a07fa9ce47c4bc80044. Reason for revert: the related commit is reverted. Signed-off-by: Eric Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13Revert "drm/amdkfd: Make TLB flush conditional on mapping"Eric Huang4-27/+20
This reverts commit 31f33243788dcbae8bd2819ed83923a73f7dfd30. Reason for revert: it causes regressions on several Asics. Signed-off-by: Eric Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13Revert "drm/amdgpu: Fix warning of Function parameter or member not described"Eric Huang1-1/+0
This reverts commit 7a68d188d1c4a9d947369acaa19040a58baaaeda. Reason for revert: the related commit is reverted. Signed-off-by: Eric Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13Revert "drm/amdkfd: Add memory sync before TLB flush on unmap"Eric Huang1-20/+3
This reverts commit 3be4dca197010d1328df8b11febc8c40491be498. Reason for revert: it causes regressions on several Asics. Signed-off-by: Eric Huang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amd/pm: Fix BACO state setting for Beige_GobyChengming Gui1-0/+1
Correct BACO state setting for Beige_Goby Signed-off-by: Chengming Gui <[email protected]> Reviewed-by: Jiansong Chen <[email protected]> Reviewed-by: Guchun Chen <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amdgpu: Restore msix after FLREmily.Deng1-0/+18
After FLR, the msix will be cleared, so need to re-enable it. Signed-off-by: Peng Ju Zhou <[email protected]> Signed-off-by: Emily.Deng <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amdkfd: Allow CPU access for all VRAM BOsFelix Kuehling1-2/+1
The thunk needs to mmap all BOs for CPU access to allow the debugger to access them. Invisible ones are mapped with PROT_NONE. Fixes: 71df0368e9b6 ("drm/amdgpu: Implement mmap as GEM object function") Signed-off-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amdgpu/display - only update eDP's backlight level when necessaryZhan Liu1-1/+1
[Why] The original logic is to update eDP's backlight level on every amdgpu dm atomic commit, which causes excessive DMUB write. As a result, when playing game or moving window around, DMUB timeout and system lagging are observed. [How] We only need to update eDP's backlight level when current level doesn't match requested level. Signed-off-by: Zhan Liu <[email protected]> Reviewed-by: Alex Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amdkfd: handle fault counters on invalid addressPhilip Yang1-12/+18
prange is NULL if vm fault retry on invalid address, for this case, can not use prange to get pdd, use adev to get gpuidx and then get pdd instead, then increase pdd vm fault counter. Signed-off-by: Philip Yang <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amdgpu: Correct the irq numbers for virtual crtcEmily Deng1-1/+1
The irq number should be decided by num_crtc, and the num_crtc could change by parameter. Signed-off-by: Emily Deng <[email protected]> Reviewed by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amd/display: update header file nameXiaomeng Hou1-2/+2
Update the register header file name. Signed-off-by: Xiaomeng Hou <[email protected]> Reviewed-by: Aaron Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amd/pm: drop smu_v13_0_1.c|h files for yellow carpXiaomeng Hou8-1263/+57
Since there's nothing special in smu implementation for yellow carp, it's better to reuse the common smu_v13_0 interfaces and drop the specific smu_v13_0_1.c|h files. v2: remove the duplicate register offset and shift mask header files as well. Signed-off-by: Xiaomeng Hou <[email protected]> Reviewed-by: Lijo Lazar <[email protected]> Reviewed-by: Kevin Wang <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amd/display: remove faulty assertDmytro Laktyushkin1-1/+0
Signed-off-by: Dmytro Laktyushkin <[email protected]> Reviewed-by: Wenjing Liu <[email protected]> Acked-by: Rodrigo Siqueira <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13Revert "drm/amd/display: Always write repeater mode regardless of LTTPR"Wesley Chalmers1-3/+4
This reverts commit 2b7605d73b97e2fa28e0817242e66ca968d2a7cb Some displays are not lighting up when put in LTTPR Transparent Mode Signed-off-by: Wesley Chalmers <[email protected]> Reviewed-by: Jun Lei <[email protected]> Acked-by: Rodrigo Siqueira <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Cc: [email protected]
2021-07-13drm/amd/display: Fix updating infoframe for DCN3.1 eDPNicholas Kazlauskas1-1/+1
[Why] We're only treating TMDS as a valid target for infoframe updates which results in PSR being unable to transition from state 4 to state 5. [How] Also allow infoframe updates for DCN3.1 - following how we handle this path for earlier ASIC as well. Signed-off-by: Nicholas Kazlauskas <[email protected]> Reviewed-by: Eric Yang <[email protected]> Acked-by: Rodrigo Siqueira <[email protected]> Tested-by: Daniel Wheeler <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amdgpu: Return error if no RASLuben Tuikov2-17/+38
In amdgpu_ras_query_error_count() return an error if the device doesn't support RAS. This prevents that function from having to always set the values of the integer pointers (if set), and thus prevents function side effects--always to have to set values of integers if integer pointers set, regardless of whether RAS is supported or not--with this change this side effect is mitigated. Also, if no pointers are set, don't count, since we've no way of reporting the counts. Also, give this function a kernel-doc. Cc: Alexander Deucher <[email protected]> Cc: John Clements <[email protected]> Cc: Hawking Zhang <[email protected]> Reported-by: Tom Rix <[email protected]> Fixes: a46751fbcde505 ("drm/amdgpu: Fix RAS function interface") Signed-off-by: Luben Tuikov <[email protected]> Reviewed-by: Alexander Deucher <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13drm/amdgpu: SRIOV flr_work should take write_lockJingwen Chen2-4/+4
[Why] If flr_work takes read_lock, then other threads who takes read_lock can access hardware when host is doing vf flr. [How] flr_work should take write_lock to avoid this case. Signed-off-by: Jingwen Chen <[email protected]> Reviewed-by: Monk Liu <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2021-07-13arm64: Restrict ARM64_BTI_KERNEL to clang 12.0.0 and newerNathan Chancellor1-1/+2
Commit 97fed779f2a6 ("arm64: bti: Provide Kconfig for kernel mode BTI") disabled CONFIG_ARM64_BTI_KERNEL when CONFIG_GCOV_KERNEL was enabled and compiling with clang because of warnings that were seen with allmodconfig because LLVM was not emitting PAC/BTI instructions for compiler generated functions: | warning: some functions compiled with BTI and some compiled without BTI | warning: not setting BTI in feature flags This dependency was fine for avoiding the warnings with allmodconfig until commit 51c2ee6d121c ("Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR"), which prevents CONFIG_GCOV_KERNEL from being enabled with clang 12.0.0 or older because those versions do not support the no_profile_instrument_function attribute. As a result, CONFIG_ARM64_BTI_KERNEL gets enabled with allmodconfig and there are more warnings like the ones above due to CONFIG_KASAN, which suffers from the same problem as CONFIG_GCOV_KERNEL. This was most likely not noticed at the time because allmodconfig + CONFIG_GCOV_KERNEL=n was not tested. defconfig + CONFIG_KASAN=y is enough to reproduce the same warnings as above. The root cause of the warnings was resolved in LLVM during the 12.0.0 release so rather than play whack-a-mole with the dependencies, just update CONFIG_ARM64_BTI_KERNEL to require clang 12.0.0, which will have all of the issues ironed out. Link: https://github.com/ClangBuiltLinux/linux/issues/1428 Link: https://github.com/ClangBuiltLinux/continuous-integration2/runs/3010034706?check_suite_focus=true Link: https://github.com/ClangBuiltLinux/continuous-integration2/runs/3010035725?check_suite_focus=true Link: https://github.com/llvm/llvm-project/commit/a88c722e687e6780dcd6a58718350dc76fcc4cc9 Signed-off-by: Nathan Chancellor <[email protected]> Reviewed-by: Nick Desaulniers <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Will Deacon <[email protected]>
2021-07-13fbmem: Do not delete the mode that is still in useZhen Lei1-7/+5
The execution of fb_delete_videomode() is not based on the result of the previous fbcon_mode_deleted(). As a result, the mode is directly deleted, regardless of whether it is still in use, which may cause UAF. ================================================================== BUG: KASAN: use-after-free in fb_mode_is_equal+0x36e/0x5e0 \ drivers/video/fbdev/core/modedb.c:924 Read of size 4 at addr ffff88807e0ddb1c by task syz-executor.0/18962 CPU: 2 PID: 18962 Comm: syz-executor.0 Not tainted 5.10.45-rc1+ #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ... Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x137/0x1be lib/dump_stack.c:118 print_address_description+0x6c/0x640 mm/kasan/report.c:385 __kasan_report mm/kasan/report.c:545 [inline] kasan_report+0x13d/0x1e0 mm/kasan/report.c:562 fb_mode_is_equal+0x36e/0x5e0 drivers/video/fbdev/core/modedb.c:924 fbcon_mode_deleted+0x16a/0x220 drivers/video/fbdev/core/fbcon.c:2746 fb_set_var+0x1e1/0xdb0 drivers/video/fbdev/core/fbmem.c:975 do_fb_ioctl+0x4d9/0x6e0 drivers/video/fbdev/core/fbmem.c:1108 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Freed by task 18960: kasan_save_stack mm/kasan/common.c:48 [inline] kasan_set_track+0x3d/0x70 mm/kasan/common.c:56 kasan_set_free_info+0x17/0x30 mm/kasan/generic.c:355 __kasan_slab_free+0x108/0x140 mm/kasan/common.c:422 slab_free_hook mm/slub.c:1541 [inline] slab_free_freelist_hook+0xd6/0x1a0 mm/slub.c:1574 slab_free mm/slub.c:3139 [inline] kfree+0xca/0x3d0 mm/slub.c:4121 fb_delete_videomode+0x56a/0x820 drivers/video/fbdev/core/modedb.c:1104 fb_set_var+0x1f3/0xdb0 drivers/video/fbdev/core/fbmem.c:978 do_fb_ioctl+0x4d9/0x6e0 drivers/video/fbdev/core/fbmem.c:1108 vfs_ioctl fs/ioctl.c:48 [inline] __do_sys_ioctl fs/ioctl.c:753 [inline] __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:739 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46 entry_SYSCALL_64_after_hwframe+0x44/0xa9 Fixes: 13ff178ccd6d ("fbcon: Call fbcon_mode_deleted/new_modelist directly") Signed-off-by: Zhen Lei <[email protected]> Cc: <[email protected]> # v5.3+ Signed-off-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2021-07-13Merge remote-tracking branch 'drm-misc/drm-misc-next-fixes' into drm-misc-fixesThomas Zimmermann3-2/+2
Picking up left-over patches in drm-misc-next-fixes.
2021-07-13Merge tag 'drm-misc-fixes-2021-07-13' of ↵Daniel Vetter2-9/+8
git://anongit.freedesktop.org/drm/drm-misc into drm-fixes Short summary of fixes pull: * dma-buf: Fix fence leak in sync_file_merge() error code * drm/panel: nt35510: Don't fail on DSI reads Signed-off-by: Daniel Vetter <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/YO07pEfweKVO+7y0@linux-uq9g
2021-07-13firmware: arm_scmi: Fix range check for the maximum number of pending messagesCristian Marussi1-2/+3
SCMI message headers carry a sequence number and such field is sized to allow for MSG_TOKEN_MAX distinct numbers; moreover zero is not really an acceptable maximum number of pending in-flight messages. Fix accordingly the checks performed on the value exported by transports in scmi_desc.max_msg Link: https://lore.kernel.org/r/[email protected] Reported-by: Vincent Guittot <[email protected]> Signed-off-by: Cristian Marussi <[email protected]> [sudeep.holla: updated the patch title and error message] Signed-off-by: Sudeep Holla <[email protected]>
2021-07-13firmware: arm_scmi: Avoid padding in sensor message structureCristian Marussi1-2/+4
scmi_resp_sensor_reading_complete structure is meant to represent an SCMI asynchronous reading complete message. The readings field with a 64bit type forces padding and breaks reads in scmi_sensor_reading_get. Split it in two adjacent 32bit readings_low/high subfields to avoid the padding within the structure. Alternatively we could to mark the structure packed. Link: https://lore.kernel.org/r/[email protected] Fixes: e2083d3673916 ("firmware: arm_scmi: Add SCMI v3.0 sensors timestamped reads") Signed-off-by: Cristian Marussi <[email protected]> Signed-off-by: Sudeep Holla <[email protected]>
2021-07-13firmware: arm_scmi: Fix kernel doc warnings about return valuesCristian Marussi2-0/+6
Kernel doc validation script still complains about the following: |No description found for return value of 'scmi_get_protocol_device' |No description found for return value of 'scmi_devm_notifier_register' |No description found for return value of 'scmi_devm_notifier_unregister' Fix adding missing Return kernel-doc statements. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Cristian Marussi <[email protected]> Signed-off-by: Sudeep Holla <[email protected]>
2021-07-13firmware: arm_scpi: Fix kernel doc warningsSudeep Holla1-0/+8
Kernel doc validation script is unhappy and complains with the below set of warnings. | Function parameter or member 'device_domain_id' not described in 'scpi_ops' | Function parameter or member 'get_transition_latency' not described in 'scpi_ops' | Function parameter or member 'add_opps_to_device' not described in 'scpi_ops' | Function parameter or member 'sensor_get_capability' not described in 'scpi_ops' | Function parameter or member 'sensor_get_info' not described in 'scpi_ops' | Function parameter or member 'sensor_get_value' not described in 'scpi_ops' | Function parameter or member 'device_get_power_state' not described in 'scpi_ops' | Function parameter or member 'device_set_power_state' not described in 'scpi_ops' Fix them adding appropriate documents or missing keywords. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Cristian Marussi <[email protected]> Signed-off-by: Sudeep Holla <[email protected]>
2021-07-13firmware: arm_scmi: Fix kernel doc warningsSudeep Holla1-5/+9
Kernel doc validation script is unhappy and complains with the below set of warnings. | Function parameter or member 'fast_switch_possible' not described in 'scmi_perf_proto_ops' | Function parameter or member 'power_scale_mw_get' not described in 'scmi_perf_proto_ops' | cannot understand function prototype: 'struct scmi_sensor_reading ' | cannot understand function prototype: 'struct scmi_range_attrs ' | cannot understand function prototype: 'struct scmi_sensor_axis_info ' | cannot understand function prototype: 'struct scmi_sensor_intervals_info ' Fix them adding appropriate documents or missing keywords. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Cristian Marussi <[email protected]> Signed-off-by: Sudeep Holla <[email protected]>
2021-07-13nvme-pci: do not call nvme_dev_remove_admin from nvme_removeCasey Chen1-1/+0
nvme_dev_remove_admin could free dev->admin_q and the admin_tagset while they are being accessed by nvme_dev_disable(), which can be called by nvme_reset_work via nvme_remove_dead_ctrl. Commit cb4bfda62afa ("nvme-pci: fix hot removal during error handling") intended to avoid requests being stuck on a removed controller by killing the admin queue. But the later fix c8e9e9b7646e ("nvme-pci: unquiesce admin queue on shutdown"), together with nvme_dev_disable(dev, true) right before nvme_dev_remove_admin() could help dispatch requests and fail them early, so we don't need nvme_dev_remove_admin() any more. Fixes: cb4bfda62afa ("nvme-pci: fix hot removal during error handling") Signed-off-by: Casey Chen <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2021-07-13nvme-pci: fix multiple races in nvme_setup_io_queuesCasey Chen1-8/+58
Below two paths could overlap each other if we power off a drive quickly after powering it on. There are multiple races in nvme_setup_io_queues() because of shutdown_lock missing and improper use of NVMEQ_ENABLED bit. nvme_reset_work() nvme_remove() nvme_setup_io_queues() nvme_dev_disable() ... ... A1 clear NVMEQ_ENABLED bit for admin queue lock retry: B1 nvme_suspend_io_queues() A2 pci_free_irq() admin queue B2 nvme_suspend_queue() admin queue A3 pci_free_irq_vectors() nvme_pci_disable() A4 nvme_setup_irqs(); B3 pci_free_irq_vectors() ... unlock A5 queue_request_irq() for admin queue set NVMEQ_ENABLED bit ... nvme_create_io_queues() A6 result = queue_request_irq(); set NVMEQ_ENABLED bit ... fail to allocate enough IO queues: A7 nvme_suspend_io_queues() goto retry If B3 runs in between A1 and A2, it will crash if irqaction haven't been freed by A2. B2 is supposed to free admin queue IRQ but it simply can't fulfill the job as A1 has cleared NVMEQ_ENABLED bit. Fix: combine A1 A2 so IRQ get freed as soon as the NVMEQ_ENABLED bit gets cleared. After solved #1, A2 could race with B3 if A2 is freeing IRQ while B3 is checking irqaction. A3 also could race with B2 if B2 is freeing IRQ while A3 is checking irqaction. Fix: A2 and A3 take lock for mutual exclusion. A3 could race with B3 since they could run free_msi_irqs() in parallel. Fix: A3 takes lock for mutual exclusion. A4 could fail to allocate all needed IRQ vectors if A3 and A4 are interrupted by B3. Fix: A4 takes lock for mutual exclusion. If A5/A6 happened after B2/B1, B3 will crash since irqaction is not NULL. They are just allocated by A5/A6. Fix: Lock queue_request_irq() and setting of NVMEQ_ENABLED bit. A7 could get chance to pci_free_irq() for certain IO queue while B3 is checking irqaction. Fix: A7 takes lock. nvme_dev->online_queues need to be protected by shutdown_lock. Since it is not atomic, both paths could modify it using its own copy. Co-developed-by: Yuanyuan Zhong <[email protected]> Signed-off-by: Casey Chen <[email protected]> Reviewed-by: Keith Busch <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2021-07-13nvme-tcp: use __dev_get_by_name instead dev_get_by_name for OPT_HOST_IFACEPrabhakar Kushwaha1-3/+1
dev_get_by_name() finds network device by name but it also increases the reference count. If a nvme-tcp queue is present and the network device driver is removed before nvme_tcp, we will face the following continuous log: "kernel:unregister_netdevice: waiting for <eth> to become free. Usage count = 2" And rmmod further halts. Similar case arises during reboot/shutdown with nvme-tcp queue present and both never completes. To fix this, use __dev_get_by_name() which finds network device by name without increasing any reference counter. Fixes: 3ede8f72a9a2 ("nvme-tcp: allow selecting the network interface for connections") Signed-off-by: Omkar Kulkarni <[email protected]> Signed-off-by: Shai Malin <[email protected]> Signed-off-by: Prabhakar Kushwaha <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> [hch: remove the ->ndev member entirely] Signed-off-by: Christoph Hellwig <[email protected]>
2021-07-13ARM: shmobile: defconfig: Restore graphical consolesGeert Uytterhoeven1-0/+1
As of commit f611b1e7624ccdbd ("drm: Avoid circular dependencies for CONFIG_FB"), CONFIG_FB is no longer auto-enabled. While CONFIG_FB may be considered unneeded for systems where graphics is provided by a DRM driver, R-Mobile A1 still relies on a frame buffer device driver for graphics support. Restore support for graphics on R-Mobile A1 and graphical consoles on DRM-based systems by explicitly enabling CONFIG_FB in the defconfig for Renesas ARM systems. Fixes: f611b1e7624ccdbd ("drm: Avoid circular dependencies for CONFIG_FB") Signed-off-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/2a4474be1d2c00c6ca97c2714844ea416a9ea9a9.1626084948.git.geert+renesas@glider.be
2021-07-12drm/msm: Fix fall-through warning in msm_gem_new_impl()Gustavo A. R. Silva1-1/+1
Fix the following fall-through warning: drivers/gpu/drm/msm/msm_gem.c: In function 'msm_gem_new_impl': drivers/gpu/drm/msm/msm_gem.c:1170:6: warning: this statement may fall through [-Wimplicit-fallthrough=] 1170 | if (priv->has_cached_coherent) | ^ drivers/gpu/drm/msm/msm_gem.c:1173:2: note: here 1173 | default: | ^~~~~~~ by replacing the /* fallthrough */ comment with fallthrough; Reported-by: Stephen Rothwell <[email protected]> Signed-off-by: Gustavo A. R. Silva <[email protected]>
2021-07-12scsi: ufs: core: Add missing host_lock in ufshcd_vops_setup_xfer_req()Jaegeuk Kim1-2/+7
This patch adds a host_lock which existed before on ufshcd_vops_setup_xfer_req(). Link: https://lore.kernel.org/r/[email protected] Fixes: a45f937110fa ("scsi: ufs: Optimize host lock on transfer requests send/compl paths") Cc: Stanley Chu <[email protected]> Cc: Can Guo <[email protected]> Cc: Bean Huo <[email protected]> Cc: Bart Van Assche <[email protected]> Cc: Asutosh Das <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: Jaegeuk Kim <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2021-07-12scsi: mpi3mr: Fix W=1 compilation warningsSreekanth Reddy1-8/+7
Fix for the following W=1 compilation warning: 'strncpy' output may be truncated copying 16 bytes from a string of length 64 Link: https://lore.kernel.org/r/[email protected] Reported-by: kernel test robot <[email protected]> Signed-off-by: Sreekanth Reddy <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2021-07-12scsi: pm8001: Clean up kernel-doc and commentsRandy Dunlap5-89/+97
Fix kernel-doc warnings then test again, wash, rinse, find more, then repeat more/again. Also fix spellos, some grammar, and some punctuation. ../drivers/scsi/pm8001/pm8001_ctl.c:557: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst ** pm8001_ctl_fatal_log_show - fatal error logging ../drivers/scsi/pm8001/pm8001_ctl.c:577: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst ** non_fatal_log_show - non fatal error logging ../drivers/scsi/pm8001/pm8001_ctl.c:622: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst ** pm8001_ctl_gsm_log_show - gsm dump collection Link: https://lore.kernel.org/r/[email protected] Cc: Jack Wang <[email protected]> Cc: [email protected] Cc: "James E.J. Bottomley" <[email protected]> Cc: "Martin K. Petersen" <[email protected]> Acked-by: Jack Wang <[email protected]> Signed-off-by: Randy Dunlap <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2021-07-12scsi: zfcp: Report port fc_security as unknown early during remote cable pullSteffen Maier1-0/+1
On remote cable pull, a zfcp_port keeps its status and only gets ZFCP_STATUS_PORT_LINK_TEST added. Only after an ADISC timeout, we would actually start port recovery and remove ZFCP_STATUS_COMMON_UNBLOCKED which zfcp_sysfs_port_fc_security_show() detected and reported as "unknown" instead of the old and possibly stale zfcp_port->connection_info. Add check for ZFCP_STATUS_PORT_LINK_TEST for timely "unknown" report. Link: https://lore.kernel.org/r/[email protected] Fixes: a17c78460093 ("scsi: zfcp: report FC Endpoint Security in sysfs") Cc: <[email protected]> #5.7+ Reviewed-by: Benjamin Block <[email protected]> Signed-off-by: Steffen Maier <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2021-07-12scsi: core: Fix bad pointer dereference when ehandler kthread is invalidTyrel Datwyler1-0/+1
Commit 66a834d09293 ("scsi: core: Fix error handling of scsi_host_alloc()") changed the allocation logic to call put_device() to perform host cleanup with the assumption that IDA removal and stopping the kthread would properly be performed in scsi_host_dev_release(). However, in the unlikely case that the error handler thread fails to spawn, shost->ehandler is set to ERR_PTR(-ENOMEM). The error handler cleanup code in scsi_host_dev_release() will call kthread_stop() if shost->ehandler != NULL which will always be the case whether the kthread was successfully spawned or not. In the case that it failed to spawn this has the nasty side effect of trying to dereference an invalid pointer when kthread_stop() is called. The following splat provides an example of this behavior in the wild: scsi host11: error handler thread failed to spawn, error = -4 Kernel attempted to read user page (10c) - exploit attempt? (uid: 0) BUG: Kernel NULL pointer dereference on read at 0x0000010c Faulting instruction address: 0xc00000000818e9a8 Oops: Kernel access of bad area, sig: 11 [#1] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries Modules linked in: ibmvscsi(+) scsi_transport_srp dm_multipath dm_mirror dm_region hash dm_log dm_mod fuse overlay squashfs loop CPU: 12 PID: 274 Comm: systemd-udevd Not tainted 5.13.0-rc7 #1 NIP: c00000000818e9a8 LR: c0000000089846e8 CTR: 0000000000007ee8 REGS: c000000037d12ea0 TRAP: 0300 Not tainted (5.13.0-rc7) MSR: 800000000280b033 &lt;SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE&gt; CR: 28228228 XER: 20040001 CFAR: c0000000089846e4 DAR: 000000000000010c DSISR: 40000000 IRQMASK: 0 GPR00: c0000000089846e8 c000000037d13140 c000000009cc1100 fffffffffffffffc GPR04: 0000000000000001 0000000000000000 0000000000000000 c000000037dc0000 GPR08: 0000000000000000 c000000037dc0000 0000000000000001 00000000fffff7ff GPR12: 0000000000008000 c00000000a049000 c000000037d13d00 000000011134d5a0 GPR16: 0000000000001740 c0080000190d0000 c0080000190d1740 c000000009129288 GPR20: c000000037d13bc0 0000000000000001 c000000037d13bc0 c0080000190b7898 GPR24: c0080000190b7708 0000000000000000 c000000033bb2c48 0000000000000000 GPR28: c000000046b28280 0000000000000000 000000000000010c fffffffffffffffc NIP [c00000000818e9a8] kthread_stop+0x38/0x230 LR [c0000000089846e8] scsi_host_dev_release+0x98/0x160 Call Trace: [c000000033bb2c48] 0xc000000033bb2c48 (unreliable) [c0000000089846e8] scsi_host_dev_release+0x98/0x160 [c00000000891e960] device_release+0x60/0x100 [c0000000087e55c4] kobject_release+0x84/0x210 [c00000000891ec78] put_device+0x28/0x40 [c000000008984ea4] scsi_host_alloc+0x314/0x430 [c0080000190b38bc] ibmvscsi_probe+0x54/0xad0 [ibmvscsi] [c000000008110104] vio_bus_probe+0xa4/0x4b0 [c00000000892a860] really_probe+0x140/0x680 [c00000000892aefc] driver_probe_device+0x15c/0x200 [c00000000892b63c] device_driver_attach+0xcc/0xe0 [c00000000892b740] __driver_attach+0xf0/0x200 [c000000008926f28] bus_for_each_dev+0xa8/0x130 [c000000008929ce4] driver_attach+0x34/0x50 [c000000008928fc0] bus_add_driver+0x1b0/0x300 [c00000000892c798] driver_register+0x98/0x1a0 [c00000000810eb60] __vio_register_driver+0x80/0xe0 [c0080000190b4a30] ibmvscsi_module_init+0x9c/0xdc [ibmvscsi] [c0000000080121d0] do_one_initcall+0x60/0x2d0 [c000000008261abc] do_init_module+0x7c/0x320 [c000000008265700] load_module+0x2350/0x25b0 [c000000008265cb4] __do_sys_finit_module+0xd4/0x160 [c000000008031110] system_call_exception+0x150/0x2d0 [c00000000800d35c] system_call_common+0xec/0x278 Fix this be nulling shost->ehandler when the kthread fails to spawn. Link: https://lore.kernel.org/r/[email protected] Fixes: 66a834d09293 ("scsi: core: Fix error handling of scsi_host_alloc()") Cc: [email protected] Reviewed-by: Ming Lei <[email protected]> Signed-off-by: Tyrel Datwyler <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2021-07-12scsi: fas216: Fix a build errorBart Van Assche1-1/+1
Use SAM_STAT_GOOD instead of GOOD since GOOD has been removed. Link: https://lore.kernel.org/r/[email protected] Fixes: 3d45cefc8edd ("scsi: core: Drop obsolete Linux-specific SCSI status codes") Fixes: df1303147649 ("scsi: fas216: Use get_status_byte() to avoid using Linux-specific status codes") Cc: Hannes Reinecke <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2021-07-12scsi: core: Fix the documentation of the scsi_execute() time parameterBart Van Assche1-1/+1
The unit of the scsi_execute() timeout parameter is 1/HZ seconds instead of one second, just like the timeouts used in the block layer. Fix the documentation header above the definition of the scsi_execute() macro. Link: https://lore.kernel.org/r/[email protected] Fixes: "[SCSI] use scatter lists for all block pc requests and simplify hw handlers" # v2.6.16.28 Cc: "James E.J. Bottomley" <[email protected]> Cc: Hannes Reinecke <[email protected]> Cc: Ming Lei <[email protected]> Cc: John Garry <[email protected]> Reviewed-by: Avri Altman <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Bart Van Assche <[email protected]> Signed-off-by: Martin K. Petersen <[email protected]>
2021-07-12selftests: memory-hotplug: avoid spamming logs with dump_page(), ratio limit ↵Paolo Pisati1-1/+3
hot-remove error test While the offline memory test obey ratio limit, the same test with error injection does not and tries to offline all the hotpluggable memory, spamming system logs with hundreds of thousands of dump_page() entries, slowing system down (to the point the test itself timesout and gets terminated) and excessive fs occupation: ... [ 9784.393354] page:c00c0000007d1b40 refcount:3 mapcount:0 mapping:c0000001fc03e950 index:0xe7b [ 9784.393355] def_blk_aops [ 9784.393356] flags: 0x3ffff800002062(referenced|active|workingset|private) [ 9784.393358] raw: 003ffff800002062 c0000001b9343a68 c0000001b9343a68 c0000001fc03e950 [ 9784.393359] raw: 0000000000000e7b c000000006607b18 00000003ffffffff c00000000490d000 [ 9784.393359] page dumped because: migration failure [ 9784.393360] page->mem_cgroup:c00000000490d000 [ 9784.393416] migrating pfn 1f46d failed ret:1 ... $ grep "page dumped because: migration failure" /var/log/kern.log | wc -l 2405558 $ ls -la /var/log/kern.log -rw-r----- 1 syslog adm 2256109539 Jun 30 14:19 /var/log/kern.log Signed-off-by: Paolo Pisati <[email protected]> Acked-by: Krzysztof Kozlowski <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2021-07-12kunit: tool: Assert the version requirementSeongJae Park1-0/+2
Commit 87c9c1631788 ("kunit: tool: add support for QEMU") on the 'next' tree adds 'from __future__ import annotations' in 'kunit_kernel.py'. Because it is supported on only >=3.7 Python, people using older Python will get below error: Traceback (most recent call last): File "./tools/testing/kunit/kunit.py", line 20, in <module> import kunit_kernel File "/home/sjpark/linux/tools/testing/kunit/kunit_kernel.py", line 9 from __future__ import annotations ^ SyntaxError: future feature annotations is not defined This commit adds a version assertion in 'kunit.py', so that people get more explicit error message like below: Traceback (most recent call last): File "./tools/testing/kunit/kunit.py", line 15, in <module> assert sys.version_info >= (3, 7), "Python version is too old" AssertionError: Python version is too old Signed-off-by: SeongJae Park <[email protected]> Acked-by: Daniel Latypov <[email protected]> Reviewed-by: Brendan Higgins <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2021-07-12kunit: tool: remove unnecessary "annotations" importDaniel Latypov1-4/+2
The import was working around the fact "tuple[T]" was used instead of typing.Tuple[T]. Convert it to use type.Tuple to be consistent with how the rest of the code is anotated. Signed-off-by: Daniel Latypov <[email protected]> Reviewed-by: David Gow <[email protected]> Reviewed-by: Brendan Higgins <[email protected]> Tested-by: Brendan Higgins <[email protected]> Signed-off-by: Shuah Khan <[email protected]>
2021-07-12Documentation: kunit: drop obsolete note about uml_abort for coverageDaniel Latypov1-13/+1
Commit b6d5799b0b58 ("kunit: Add 'kunit_shutdown' option") changes KUnit to call kernel_halt() by default when done testing. This fixes the issue with not having .gcda files due to not calling atexit() handlers, and therefore we can stop recommending people manually tweak UML code. The need to use older versions of GCC (<=6) remains however, due to linktime issues, same as before. Note: There also might still be issues with .gcda files as well in newer versions. Signed-off-by: Daniel Latypov <[email protected]> Reviewed-by: David Gow <[email protected]> Reviewed-by: Brendan Higgins <[email protected]> Signed-off-by: Shuah Khan <[email protected]>