aboutsummaryrefslogtreecommitdiff
path: root/drivers
AgeCommit message (Collapse)AuthorFilesLines
2023-12-05Merge tag 'for-linus-iommufd' of ↵Linus Torvalds7-113/+171
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd fixes from Jason Gunthorpe: - A small fix for the dirty tracking self test to fail correctly if the code is buggy - Fix a tricky syzkaller race UAF with object reference counting * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: iommufd: Do not UAF during iommufd_put_object() iommufd: Add iommufd_ctx to iommufd_put_object() iommufd/selftest: Fix _test_mock_dirty_bitmaps()
2023-12-05Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds3-5/+11
Pull vdpa fixes from Michael Tsirkin: "Fixes in mlx5 and pds drivers" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: pds_vdpa: set features order pds_vdpa: clear config callback when status goes to 0 pds_vdpa: fix up format-truncation complaint vdpa/mlx5: preserve CVQ vringh index
2023-12-04nvme: fix deadlock between reset and scanBitao Hu2-0/+11
If controller reset occurs when allocating namespace, both nvme_reset_work and nvme_scan_work will hang, as shown below. Test Scripts: for ((t=1;t<=128;t++)) do nsid=`nvme create-ns /dev/nvme1 -s 14537724 -c 14537724 -f 0 -m 0 \ -d 0 | awk -F: '{print($NF);}'` nvme attach-ns /dev/nvme1 -n $nsid -c 0 done nvme reset /dev/nvme1 We will find that both nvme_reset_work and nvme_scan_work hung: INFO: task kworker/u249:4:17848 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u249:4 state:D stack: 0 pid:17848 ppid: 2 flags:0x00000028 Workqueue: nvme-reset-wq nvme_reset_work [nvme] Call trace: __switch_to+0xb4/0xfc __schedule+0x22c/0x670 schedule+0x4c/0xd0 blk_mq_freeze_queue_wait+0x84/0xc0 nvme_wait_freeze+0x40/0x64 [nvme_core] nvme_reset_work+0x1c0/0x5cc [nvme] process_one_work+0x1d8/0x4b0 worker_thread+0x230/0x440 kthread+0x114/0x120 INFO: task kworker/u249:3:22404 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:kworker/u249:3 state:D stack: 0 pid:22404 ppid: 2 flags:0x00000028 Workqueue: nvme-wq nvme_scan_work [nvme_core] Call trace: __switch_to+0xb4/0xfc __schedule+0x22c/0x670 schedule+0x4c/0xd0 rwsem_down_write_slowpath+0x32c/0x98c down_write+0x70/0x80 nvme_alloc_ns+0x1ac/0x38c [nvme_core] nvme_validate_or_alloc_ns+0xbc/0x150 [nvme_core] nvme_scan_ns_list+0xe8/0x2e4 [nvme_core] nvme_scan_work+0x60/0x500 [nvme_core] process_one_work+0x1d8/0x4b0 worker_thread+0x260/0x440 kthread+0x114/0x120 INFO: task nvme:28428 blocked for more than 120 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:nvme state:D stack: 0 pid:28428 ppid: 27119 flags:0x00000000 Call trace: __switch_to+0xb4/0xfc __schedule+0x22c/0x670 schedule+0x4c/0xd0 schedule_timeout+0x160/0x194 do_wait_for_common+0xac/0x1d0 __wait_for_common+0x78/0x100 wait_for_completion+0x24/0x30 __flush_work.isra.0+0x74/0x90 flush_work+0x14/0x20 nvme_reset_ctrl_sync+0x50/0x74 [nvme_core] nvme_dev_ioctl+0x1b0/0x250 [nvme_core] __arm64_sys_ioctl+0xa8/0xf0 el0_svc_common+0x88/0x234 do_el0_svc+0x7c/0x90 el0_svc+0x1c/0x30 el0_sync_handler+0xa8/0xb0 el0_sync+0x148/0x180 The reason for the hang is that nvme_reset_work occurs while nvme_scan_work is still running. nvme_scan_work may add new ns into ctrl->namespaces list after nvme_reset_work frozen all ns->q in ctrl->namespaces list. The newly added ns is not frozen, so nvme_wait_freeze will wait forever. Unfortunately, ctrl->namespaces_rwsem is held by nvme_reset_work, so nvme_scan_work will also wait forever. Now we are deadlocked! PROCESS1 PROCESS2 ============== ============== nvme_scan_work ... nvme_reset_work nvme_validate_or_alloc_ns nvme_dev_disable nvme_alloc_ns nvme_start_freeze down_write ... nvme_ns_add_to_ctrl_list ... up_write nvme_wait_freeze ... down_read nvme_alloc_ns blk_mq_freeze_queue_wait down_write Fix by marking the ctrl with say NVME_CTRL_FROZEN flag set in nvme_start_freeze and cleared in nvme_unfreeze. Then the scan can check it before adding the new namespace (under the namespaces_rwsem). Signed-off-by: Bitao Hu <[email protected]> Reviewed-by: Guixin Liu <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-12-04nvme: prevent potential spectre v1 gadgetNitesh Shetty1-0/+3
This patch fixes the smatch warning, "nvmet_ns_ana_grpid_store() warn: potential spectre issue 'nvmet_ana_group_enabled' [w] (local cap)" Prevent the contents of kernel memory from being leaked to user space via speculative execution by using array_index_nospec. Signed-off-by: Nitesh Shetty <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-12-04nvme: improve NVME_HOST_AUTH and NVME_TARGET_AUTH config descriptionsShin'ichiro Kawasaki2-4/+6
Currently two similar config options NVME_HOST_AUTH and NVME_TARGET_AUTH have almost same descriptions. It is confusing to choose them in menuconfig. Improve the descriptions to distinguish them. Signed-off-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-12-04nvme-ioctl: move capable() admin check to the endKeith Busch1-10/+11
This can be an expensive call on some kernel configs. Move it to the end after checking the cheaper ways to determine if the command is allowed. Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-12-04nvme: ensure reset state check orderingKeith Busch5-49/+63
A different CPU may be setting the ctrl->state value, so ensure proper barriers to prevent optimizing to a stale state. Normally it isn't a problem to observe the wrong state as it is merely advisory to take a quicker path during initialization and error recovery, but seeing an old state can report unexpected ENETRESET errors when a reset request was in fact successful. Reported-by: Minh Hoang <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Keith Busch <[email protected]> Signed-off-by: Hannes Reinecke <[email protected]>
2023-12-04nvme: introduce helper function to get ctrl stateKeith Busch1-0/+5
The controller state is typically written by another CPU, so reading it should ensure no optimizations are taken. This is a repeated pattern in the driver, so start with adding a convenience function that returns the controller state with READ_ONCE(). Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-12-04usb: typec: class: fix typec_altmode_put_partner to put plugsRD Babiera1-2/+3
When typec_altmode_put_partner is called by a plug altmode upon release, the port altmode the plug belongs to will not remove its reference to the plug. The check to see if the altmode being released evaluates against the released altmode's partner instead of the calling altmode itself, so change adev in typec_altmode_put_partner to properly refer to the altmode being released. typec_altmode_set_partner is not run for port altmodes, so also add a check in typec_altmode_release to prevent typec_altmode_put_partner() calls on port altmode release. Fixes: 8a37d87d72f0 ("usb: typec: Bus type for alternate modes") Cc: [email protected] Signed-off-by: RD Babiera <[email protected]> Reviewed-by: Heikki Krogerus <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-12-04USB: gadget: core: adjust uevent timing on gadget unbindRoy Luo1-2/+2
The KOBJ_CHANGE uevent is sent before gadget unbind is actually executed, resulting in inaccurate uevent emitted at incorrect timing (the uevent would have USB_UDC_DRIVER variable set while it would soon be removed). Move the KOBJ_CHANGE uevent to the end of the unbind function so that uevent is sent only after the change has been made. Fixes: 2ccea03a8f7e ("usb: gadget: introduce UDC Class") Cc: [email protected] Signed-off-by: Roy Luo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-12-04platform/mellanox: Check devm_hwmon_device_register_with_groups() return valueKunwu Chan1-0/+2
devm_hwmon_device_register_with_groups() returns an error pointer upon failure. Check its return value for errors. Compile-tested only. Fixes: 1a218d312e65 ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver") Suggested-by: Ilpo Järvinen <[email protected]> Suggested-by: Vadim Pasternak <[email protected]> Signed-off-by: Kunwu Chan <[email protected]> Reviewed-by: Vadim Pasternak <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ij: split the change into two] Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]>
2023-12-04platform/mellanox: Add null pointer checks for devm_kasprintf()Kunwu Chan1-0/+12
devm_kasprintf() returns a pointer to dynamically allocated memory which can be NULL upon failure. Compile-tested only. Fixes: 1a218d312e65 ("platform/mellanox: mlxbf-pmc: Add Mellanox BlueField PMC driver") Suggested-by: Ilpo Järvinen <[email protected]> Suggested-by: Vadim Pasternak <[email protected]> Signed-off-by: Kunwu Chan <[email protected]> Reviewed-by: Vadim Pasternak <[email protected]> Link: https://lore.kernel.org/r/[email protected] [ij: split the change into two] Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]>
2023-12-04mlxbf-bootctl: correctly identify secure boot with development keysDavid Thompson1-13/+26
The secure boot state of the BlueField SoC is represented by two bits: 0 = production state 1 = secure boot enabled 2 = non-secure (secure boot disabled) 3 = RMA state There is also a single bit to indicate whether production keys or development keys are being used when secure boot is enabled. This single bit (specified by MLXBF_BOOTCTL_SB_DEV_MASK) only has meaning if secure boot state equals 1 (secure boot enabled). The secure boot states are as follows: - “GA secured” is when secure boot is enabled with official production keys. - “Secured (development)” is when secure boot is enabled with development keys. Without this fix “GA Secured” is displayed on development cards which is misleading. This patch updates the logic in "lifecycle_state_show()" to handle the case where the SoC is configured for secure boot and is using development keys. Fixes: 79e29cb8fbc5c ("platform/mellanox: Add bootctl driver for Mellanox BlueField Soc") Reviewed-by: Khalil Blaiech <[email protected]> Signed-off-by: David Thompson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Ilpo Järvinen <[email protected]>
2023-12-04regmap: fix bogus error on regcache_sync successMatthias Reichl1-2/+1
Since commit 0ec7731655de ("regmap: Ensure range selector registers are updated after cache sync") opening pcm512x based soundcards fail with EINVAL and dmesg shows sync cache and pm_runtime_get errors: [ 228.794676] pcm512x 1-004c: Failed to sync cache: -22 [ 228.794740] pcm512x 1-004c: ASoC: error at snd_soc_pcm_component_pm_runtime_get on pcm512x.1-004c: -22 This is caused by the cache check result leaking out into the regcache_sync return value. Fix this by making the check local-only, as the comment above the regcache_read call states a non-zero return value means there's nothing to do so the return value should not be altered. Fixes: 0ec7731655de ("regmap: Ensure range selector registers are updated after cache sync") Cc: [email protected] Signed-off-by: Matthias Reichl <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-12-04firmware: arm_scmi: Fix possible frequency truncation when using level ↵Sudeep Holla1-5/+7
indexing mode The multiplier is already promoted to unsigned long, however the frequency calculations done when using level indexing mode doesn't use the multiplier computed. It instead hardcodes the multiplier value of 1000 at all the usage sites. Clean that up by assigning the multiplier value of 1000 when using the perf level indexing mode and update the frequency calculations to use the multiplier instead. It should fix the possible frequency truncation for all the values greater than or equal to 4GHz on 64-bit machines. Fixes: 31c7c1397a33 ("firmware: arm_scmi: Add v3.2 perf level indexing mode support") Reported-by: Sibi Sankar <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Cc: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Cristian Marussi <[email protected]> Signed-off-by: Sudeep Holla <[email protected]>
2023-12-04firmware: arm_scmi: Fix frequency truncation by promoting multiplier typeSudeep Holla1-3/+3
Fix the possible frequency truncation for all values equal to or greater 4GHz on 64bit machines by updating the multiplier 'mult_factor' to 'unsigned long' type. It is also possible that the multiplier itself can be greater than or equal to 2^32. So we need to also fix the equation computing the value of the multiplier. Fixes: a9e3fbfaa0ff ("firmware: arm_scmi: add initial support for performance protocol") Reported-by: Sibi Sankar <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Cc: Cristian Marussi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sudeep Holla <[email protected]>
2023-12-04r8152: Add RTL8152_INACCESSIBLE to r8153_aldps_en()Douglas Anderson1-0/+2
Delay loops in r8152 should break out if RTL8152_INACCESSIBLE is set so that they don't delay too long if the device becomes inaccessible. Add the break to the loop in r8153_aldps_en(). Fixes: 4214cc550bf9 ("r8152: check if disabling ALDPS is finished") Reviewed-by: Grant Grundler <[email protected]> Signed-off-by: Douglas Anderson <[email protected]> Acked-by: Hayes Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-12-04r8152: Add RTL8152_INACCESSIBLE to r8153_pre_firmware_1()Douglas Anderson1-0/+2
Delay loops in r8152 should break out if RTL8152_INACCESSIBLE is set so that they don't delay too long if the device becomes inaccessible. Add the break to the loop in r8153_pre_firmware_1(). Fixes: 9370f2d05a2a ("r8152: support request_firmware for RTL8153") Reviewed-by: Grant Grundler <[email protected]> Signed-off-by: Douglas Anderson <[email protected]> Acked-by: Hayes Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-12-04r8152: Add RTL8152_INACCESSIBLE to r8156b_wait_loading_flash()Douglas Anderson1-0/+2
Delay loops in r8152 should break out if RTL8152_INACCESSIBLE is set so that they don't delay too long if the device becomes inaccessible. Add the break to the loop in r8156b_wait_loading_flash(). Fixes: 195aae321c82 ("r8152: support new chips") Reviewed-by: Grant Grundler <[email protected]> Signed-off-by: Douglas Anderson <[email protected]> Acked-by: Hayes Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-12-04r8152: Add RTL8152_INACCESSIBLE checks to more loopsDouglas Anderson1-0/+8
Previous commits added checks for RTL8152_INACCESSIBLE in the loops in the driver. There are still a few more that keep tripping the driver up in error cases and make things take longer than they should. Add those in. All the loops that are part of this commit existed in some form or another since the r8152 driver was first introduced, though RTL8152_INACCESSIBLE was known as RTL8152_UNPLUG before commit 715f67f33af4 ("r8152: Rename RTL8152_UNPLUG to RTL8152_INACCESSIBLE") Fixes: ac718b69301c ("net/usb: new driver for RTL8152") Reviewed-by: Grant Grundler <[email protected]> Signed-off-by: Douglas Anderson <[email protected]> Acked-by: Hayes Wang <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-12-04r8152: Hold the rtnl_lock for all of resetDouglas Anderson1-6/+7
As of commit d9962b0d4202 ("r8152: Block future register access if register access fails") there is a race condition that can happen between the USB device reset thread and napi_enable() (not) getting called during rtl8152_open(). Specifically: * While rtl8152_open() is running we get a register access error that's _not_ -ENODEV and queue up a USB reset. * rtl8152_open() exits before calling napi_enable() due to any reason (including usb_submit_urb() returning an error). In that case: * Since the USB reset is perform in a separate thread asynchronously, it can run at anytime USB device lock is not held - even before rtl8152_open() has exited with an error and caused __dev_open() to clear the __LINK_STATE_START bit. * The rtl8152_pre_reset() will notice that the netif_running() returns true (since __LINK_STATE_START wasn't cleared) so it won't exit early. * rtl8152_pre_reset() will then hang in napi_disable() because napi_enable() was never called. We can fix the race by making sure that the r8152 reset routines don't run at the same time as we're opening the device. Specifically we need the reset routines in their entirety rely on the return value of netif_running(). The only way to reliably depend on that is for them to hold the rntl_lock() mutex for the duration of reset. Grabbing the rntl_lock() mutex for the duration of reset seems like a long time, but reset is not expected to be common and the rtnl_lock() mutex is already held for long durations since the core grabs it around the open/close calls. Fixes: d9962b0d4202 ("r8152: Block future register access if register access fails") Reviewed-by: Grant Grundler <[email protected]> Signed-off-by: Douglas Anderson <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-12-04greybus: gb-beagleplay: Ensure le for values in transportAyush Singh1-4/+5
Ensure that the following values are little-endian: - header->pad (which is used for cport_id) - header->size Fixes: ec558bbfea67 ("greybus: Add BeaglePlay Linux Driver") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Signed-off-by: Ayush Singh <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-12-04greybus: BeaglePlay driver needs CRC_CCITTRandy Dunlap1-0/+1
The gb-beagleplay driver uses crc_ccitt(), so it should select CRC_CCITT to make sure that the function is available. Fixes these build errors: s390-linux-ld: drivers/greybus/gb-beagleplay.o: in function `hdlc_append_tx_u8': gb-beagleplay.c:(.text+0x2c0): undefined reference to `crc_ccitt' s390-linux-ld: drivers/greybus/gb-beagleplay.o: in function `hdlc_rx_frame': gb-beagleplay.c:(.text+0x6a0): undefined reference to `crc_ccitt' Fixes: ec558bbfea67 ("greybus: Add BeaglePlay Linux Driver") Signed-off-by: Randy Dunlap <[email protected]> Cc: Ayush Singh <[email protected]> Cc: Johan Hovold <[email protected]> Cc: Alex Elder <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Ayush Singh <[email protected]> Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-12-04Merge 6.7-rc4 into char-misc-linusGreg Kroah-Hartman143-774/+1444
We need 6.7-rc4 in here as we need to revert one of the debugfs changes that came in that release through the wireless tree. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2023-12-03hwmon: (nzxt-kraken2) Fix error handling path in kraken2_probe()Christophe JAILLET1-2/+2
There is no point in calling hid_hw_stop() if hid_hw_start() has failed. There is no point in calling hid_hw_close() if hid_hw_open() has failed. Update the error handling path accordingly. Fixes: 82e3430dfa8c ("hwmon: add driver for NZXT Kraken X42/X52/X62/X72") Reported-by: Aleksa Savic <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Signed-off-by: Christophe JAILLET <[email protected]> Reviewed-by: Jonas Malaco <[email protected]> Link: https://lore.kernel.org/r/a768e69851a07a1f4e29f270f4e2559063f07343.1701617030.git.christophe.jaillet@wanadoo.fr Signed-off-by: Guenter Roeck <[email protected]>
2023-12-03Merge tag 'firewire-fixes-6.7-rc4' of ↵Linus Torvalds1-7/+4
git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 Pull firewire fix from Takashi Sakamoto: "A single patch to fix long-standing issue of memory leak at failure of device registration for fw_unit. We rarely encounter the issue, but it should be applied to stable releases, since it fixes inappropriate API usage" * tag 'firewire-fixes-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394: firewire: core: fix possible memory leak in create_units()
2023-12-03Merge tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfioLinus Torvalds3-12/+24
Pull vfio fixes from Alex Williamson: - Fix the lifecycle of a mutex in the pds variant driver such that a reset prior to opening the device won't find it uninitialized. Implement the release path to symmetrically destroy the mutex. Also switch a different lock from spinlock to mutex as the code path has the potential to sleep and doesn't need the spinlock context otherwise (Brett Creeley) - Fix an issue detected via randconfig where KVM tries to symbol_get an undeclared function. The symbol is temporarily declared unconditionally here, which resolves the problem and avoids churn relative to a series pending for the next merge window which resolves some of this symbol ugliness, but also fixes Kconfig dependencies (Sean Christopherson) * tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio: vfio: Drop vfio_file_iommu_group() stub to fudge around a KVM wart vfio/pds: Fix possible sleep while in atomic context vfio/pds: Fix mutex lock->magic != lock warning
2023-12-03Merge tag 'for-linus-6.7a-rc4-tag' of ↵Linus Torvalds1-1/+3
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - A fix for the Xen event driver setting the correct return value when experiencing an allocation failure - A fix for allocating space for a struct in the percpu area to not cross page boundaries (this one is for x86, a similar one for Arm was already in the pull request for rc3) * tag 'for-linus-6.7a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/events: fix error code in xen_bind_pirq_msi_to_irq() x86/xen: fix percpu vcpu_info allocation
2023-12-02r8169: fix rtl8125b PAUSE frames blasting when suspendedChunHao Lin1-1/+6
When FIFO reaches near full state, device will issue pause frame. If pause slot is enabled(set to 1), in this time, device will issue pause frame only once. But if pause slot is disabled(set to 0), device will keep sending pause frames until FIFO reaches near empty state. When pause slot is disabled, if there is no one to handle receive packets, device FIFO will reach near full state and keep sending pause frames. That will impact entire local area network. This issue can be reproduced in Chromebox (not Chromebook) in developer mode running a test image (and v5.10 kernel): 1) ping -f $CHROMEBOX (from workstation on same local network) 2) run "powerd_dbus_suspend" from command line on the $CHROMEBOX 3) ping $ROUTER (wait until ping fails from workstation) Takes about ~20-30 seconds after step 2 for the local network to stop working. Fix this issue by enabling pause slot to only send pause frame once when FIFO reaches near full state. Fixes: f1bce4ad2f1c ("r8169: add support for RTL8125") Reported-by: Grant Grundler <[email protected]> Tested-by: Grant Grundler <[email protected]> Cc: [email protected] Signed-off-by: ChunHao Lin <[email protected]> Reviewed-by: Jacob Keller <[email protected]> Reviewed-by: Heiner Kallweit <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-12-01hv_netvsc: rndis_filter needs to select NLSRandy Dunlap1-0/+1
rndis_filter uses utf8s_to_utf16s() which is provided by setting NLS, so select NLS to fix the build error: ERROR: modpost: "utf8s_to_utf16s" [drivers/net/hyperv/hv_netvsc.ko] undefined! Fixes: 1ce09e899d28 ("hyperv: Add support for setting MAC from within guests") Signed-off-by: Randy Dunlap <[email protected]> Cc: Haiyang Zhang <[email protected]> Cc: K. Y. Srinivasan <[email protected]> Cc: Wei Liu <[email protected]> Cc: Dexuan Cui <[email protected]> Reviewed-by: Simon Horman <[email protected]> Tested-by: Simon Horman <[email protected]> # build-tested Reviewed-by: Michael Kelley <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-12-01Merge tag 'md-fixes-20231201-1' of ↵Jens Axboe1-2/+2
https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7 Pull MD fix from Song: "This change fixes issue with raid456 reshape." * tag 'md-fixes-20231201-1' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md/raid6: use valid sector values to determine if an I/O should wait on the reshape
2023-12-01net/tg3: fix race condition in tg3_reset_task()Thinh Tran1-1/+10
When an EEH error is encountered by a PCI adapter, the EEH driver modifies the PCI channel's state as shown below: enum { /* I/O channel is in normal state */ pci_channel_io_normal = (__force pci_channel_state_t) 1, /* I/O to channel is blocked */ pci_channel_io_frozen = (__force pci_channel_state_t) 2, /* PCI card is dead */ pci_channel_io_perm_failure = (__force pci_channel_state_t) 3, }; If the same EEH error then causes the tg3 driver's transmit timeout logic to execute, the tg3_tx_timeout() function schedules a reset task via tg3_reset_task_schedule(), which may cause a race condition between the tg3 and EEH driver as both attempt to recover the HW via a reset action. EEH driver gets error event --> eeh_set_channel_state() and set device to one of error state above scheduler: tg3_reset_task() get returned error from tg3_init_hw() --> dev_close() shuts down the interface tg3_io_slot_reset() and tg3_io_resume() fail to reset/resume the device To resolve this issue, we avoid the race condition by checking the PCI channel state in the tg3_reset_task() function and skip the tg3 driver initiated reset when the PCI channel is not in the normal state. (The driver has no access to tg3 device registers at this point and cannot even complete the reset task successfully without external assistance.) We'll leave the reset procedure to be managed by the EEH driver which calls the tg3_io_error_detected(), tg3_io_slot_reset() and tg3_io_resume() functions as appropriate. Adding the same checking in tg3_dump_state() to avoid dumping all device registers when the PCI channel is not in the normal state. Signed-off-by: Thinh Tran <[email protected]> Tested-by: Venkata Sai Duggi <[email protected]> Reviewed-by: David Christensen <[email protected]> Reviewed-by: Michael Chan <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-12-02Merge tag 'pm-6.7-rc4' of ↵Linus Torvalds6-32/+132
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These fix issues in two cpufreq drivers, in the AMD P-state driver and in the power-capping DTPM framework. Specifics: - Fix the AMD P-state driver's EPP sysfs interface in the cases when the performance governor is in use (Ayush Jain) - Make the ->fast_switch() callback in the AMD P-state driver return the target frequency as expected (Gautham R. Shenoy) - Allow user space to control the range of frequencies to use via scaling_min_freq and scaling_max_freq when AMD P-state driver is in use (Wyes Karny) - Prevent power domains needed for wakeup signaling from being turned off during system suspend on Qualcomm systems and prevent performance states votes from runtime-suspended devices from being lost across a system suspend-resume cycle in qcom-cpufreq-nvmem (Stephan Gerhold) - Fix disabling the 792 Mhz OPP in the imx6q cpufreq driver for the i.MX6ULL types that can run at that frequency (Christoph Niedermaier) - Eliminate unnecessary and harmful conversions to uW from the DTPM (dynamic thermal and power management) framework (Lukasz Luba)" * tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: cpufreq/amd-pstate: Only print supported EPP values for performance governor cpufreq/amd-pstate: Fix scaling_min_freq and scaling_max_freq update powercap: DTPM: Fix unneeded conversions to micro-Watts cpufreq/amd-pstate: Fix the return value of amd_pstate_fast_switch() pmdomain: qcom: rpmpd: Set GENPD_FLAG_ACTIVE_WAKEUP cpufreq: qcom-nvmem: Preserve PM domain votes in system suspend cpufreq: qcom-nvmem: Enable virtual power domain devices cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily
2023-12-02Merge tag 'acpi-6.7-rc4' of ↵Linus Torvalds1-9/+5
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fixes from Rafael Wysocki: "This fixes a recently introduced build issue on ARM32 and a NULL pointer dereference in the ACPI backlight driver due to a design issue exposed by a recent change in the ACPI bus type code. Specifics: - Fix a recently introduced build issue on ARM32 platforms caused by an inadvertent header file breakage (Dave Jiang) - Eliminate questionable usage of acpi_driver_data() in the ACPI backlight cooling device code that leads to NULL pointer dereferences after recent ACPI core changes (Hans de Goede)" * tag 'acpi-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: video: Use acpi_video_device for cooling-dev driver data ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
2023-12-02Merge tag 'iommu-fixes-v6.7-rc3' of ↵Linus Torvalds7-42/+123
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu fixes from Joerg Roedel: - Fix race conditions in device probe path - Handle ERR_PTR() returns in __iommu_domain_alloc() path - Update MAINTAINERS entry for Qualcom IOMMUs - Printk argument fix in device tree specific code - Several Intel VT-d fixes from Lu Baolu: - Do not support enforcing cache coherency for non-empty domains - Avoid devTLB invalidation if iommu is off - Disable PCI ATS in legacy passthrough mode - Support non-PCI devices when clearing context - Fix incorrect cache invalidation for mm notification - Add MTL to quirk list to skip TE disabling - Set variable intel_dirty_ops to static * tag 'iommu-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: iommu: Fix printk arg in of_iommu_get_resv_regions() iommu/vt-d: Set variable intel_dirty_ops to static iommu/vt-d: Fix incorrect cache invalidation for mm notification iommu/vt-d: Add MTL to quirk list to skip TE disabling iommu/vt-d: Make context clearing consistent with context mapping iommu/vt-d: Disable PCI ATS in legacy passthrough mode iommu/vt-d: Omit devTLB invalidation requests when TES=0 iommu/vt-d: Support enforce_cache_coherency only for empty domains iommu: Avoid more races around device probe MAINTAINERS: list all Qualcomm IOMMU drivers in the QUALCOMM IOMMU entry iommu: Flow ERR_PTR out from __iommu_domain_alloc()
2023-12-02Merge tag 'drm-fixes-2023-12-01' of git://anongit.freedesktop.org/drm/drmLinus Torvalds64-301/+514
Pull drm fixes from Dave Airlie: "Weekly fixes, mostly amdgpu fixes with a scattering of nouveau, i915, and a couple of reverts. Hopefully it will quieten down in coming weeks. drm: - Revert unexport of prime helpers for fd/handle conversion dma_resv: - Do not double add fences in dma_resv_add_fence. gpuvm: - Fix GPUVM license identifier. i915: - Mark internal GSC engine with reserved uabi class - Take VGA converters into account in eDP probe - Fix intel_pre_plane_updates() call to ensure workarounds get applied panel: - Revert panel fixes as they require exporting device_is_dependent. nouveau: - fix oversized allocations in new vm path - fix zero-length array - remove a stray lock nt36523: - Fix error check for nt36523. amdgpu: - DMUB fix - DCN 3.5 fixes - XGMI fix - DCN 3.2 fixes - Vangogh suspend fix - NBIO 7.9 fix - GFX11 golden register fix - Backlight fix - NBIO 7.11 fix - IB test overflow fix - DCN 3.1.4 fixes - fix a runtime pm ref count - Retimer fix - ABM fix - DCN 3.1.5 fix - Fix AGP addressing - Fix possible memory leak in SMU error path - Make sure PME is enabled in D3 - Fix possible NULL pointer dereference in debugfs - EEPROM fix - GC 9.4.3 fix amdkfd: - IP version check fix - Fix memory leak in pqm_uninit()" * tag 'drm-fixes-2023-12-01' of git://anongit.freedesktop.org/drm/drm: (53 commits) Revert "drm/prime: Unexport helpers for fd/handle conversion" drm/amdgpu: Use another offset for GC 9.4.3 remap drm/amd/display: Fix some HostVM parameters in DML drm/amdkfd: Free gang_ctx_bo and wptr_bo in pqm_uninit drm/amdgpu: Update EEPROM I2C address for smu v13_0_0 drm/amd/display: Allow DTBCLK disable for DCN35 drm/amdgpu: Fix cat debugfs amdgpu_regs_didt causes kernel null pointer drm/amd: Enable PCIe PME from D3 drm/amd/pm: fix a memleak in aldebaran_tables_init drm/amdgpu: fix AGP addressing when GART is not at 0 drm/amd/display: update dcn315 lpddr pstate latency drm/amd/display: fix ABM disablement drm/amd/display: Fix black screen on video playback with embedded panel drm/amd/display: Fix conversions between bytes and KB drm/amdkfd: Use common function for IP version check drm/amd/display: Remove config update drm/amd/display: Update DCN35 clock table policy drm/amd/display: force toggle rate wa for first link training for a retimer drm/amdgpu: correct the amdgpu runtime dereference usage count drm/amd/display: Update min Z8 residency time to 2100 for DCN314 ...
2023-12-01md/raid6: use valid sector values to determine if an I/O should wait on the ↵David Jeffery1-2/+2
reshape During a reshape or a RAID6 array such as expanding by adding an additional disk, I/Os to the region of the array which have not yet been reshaped can stall indefinitely. This is from errors in the stripe_ahead_of_reshape function causing md to think the I/O is to a region in the actively undergoing the reshape. stripe_ahead_of_reshape fails to account for the q disk having a sector value of 0. By not excluding the q disk from the for loop, raid6 will always generate a min_sector value of 0, causing a return value which stalls. The function's max_sector calculation also uses min() when it should use max(), causing the max_sector value to always be 0. During a backwards rebuild this can cause the opposite problem where it allows I/O to advance when it should wait. Fixing these errors will allow safe I/O to advance in a timely manner and delay only I/O which is unsafe due to stripes in the middle of undergoing the reshape. Fixes: 486f60558607 ("md/raid5: Check all disks in a stripe_head for reshape progress") Cc: [email protected] # v6.0+ Signed-off-by: David Jeffery <[email protected]> Tested-by: Laurence Oberman <[email protected]> Signed-off-by: Song Liu <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-12-02Merge tag 'block-6.7-2023-12-01' of git://git.kernel.dk/linuxLinus Torvalds2-7/+29
Pull block fixes from Jens Axboe: - NVMe pull request via Keith: - Invalid namespace identification error handling (Marizio Ewan, Keith) - Fabrics keep-alive tuning (Mark) - Fix for a bad error check regression in bcache (Markus) - Fix for a performance regression with O_DIRECT (Ming) - Fix for a flush related deadlock (Ming) - Make the read-only warn on per-partition (Yu) * tag 'block-6.7-2023-12-01' of git://git.kernel.dk/linux: nvme-core: check for too small lba shift blk-mq: don't count completed flush data request as inflight in case of quiesce block: Document the role of the two attribute groups block: warn once for each partition in bio_check_ro() block: move .bd_inode into 1st cacheline of block_device nvme: check for valid nvme_identify_ns() before using it nvme-core: fix a memory leak in nvme_ns_info_from_identify() nvme: fine-tune sending of first keep-alive bcache: revert replacing IS_ERR_OR_NULL with IS_ERR
2023-12-02Merge tag 'dm-6.7/dm-fixes-2' of ↵Linus Torvalds4-10/+8
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Fix DM verity target's FEC support to always initialize IO before it frees it. Also fix alignment of struct dm_verity_fec_io within the per-bio-data - Fix DM verity target to not FEC failed readahead IO - Update DM flakey target to use MAX_ORDER rather than MAX_ORDER - 1 * tag 'dm-6.7/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-flakey: start allocating with MAX_ORDER dm-verity: align struct dm_verity_fec_io properly dm verity: don't perform FEC for failed readahead IO dm verity: initialize fec io before freeing it
2023-12-02Merge tag 'scsi-fixes' of ↵Linus Torvalds4-6/+31
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three small fixes, one in drivers. The core changes are to the internal representation of flags in scsi_devices which removes space wasting bools in favour of single bit flags and to add a flag to force a runtime resume which is used by ATA devices" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: Fix system start for ATA devices scsi: Change SCSI device boolean fields to single bit flags scsi: ufs: core: Clear cmd if abort succeeds in MCQ mode
2023-12-02Merge tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefsLinus Torvalds7-91/+91
Pull more bcachefs bugfixes from Kent Overstreet: - bcache & bcachefs were broken with CFI enabled; patch for closures to fix type punning - mark erasure coding as extra-experimental; there are incompatible disk space accounting changes coming for erasure coding, and I'm still seeing checksum errors in some tests - several fixes for durability-related issues (durability is a device specific setting where we can tell bcachefs that data on a given device should be counted as replicated x times) - a fix for a rare livelock when a btree node merge then updates a parent node that is almost full - fix a race in the device removal path, where dropping a pointer in a btree node to a device would be clobbered by an in flight btree write updating the btree node key on completion - fix one SRCU lock hold time warning in the btree gc code - ther's still a bunch more of these to fix - fix a rare race where we'd start copygc before initializing the "are we rw" percpu refcount; copygc would think we were already ro and die immediately * tag 'bcachefs-2023-11-29' of https://evilpiepirate.org/git/bcachefs: (23 commits) bcachefs: Extra kthread_should_stop() calls for copygc bcachefs: Convert gc_alloc_start() to for_each_btree_key2() bcachefs: Fix race between btree writes and metadata drop bcachefs: move journal seq assertion bcachefs: -EROFS doesn't count as move_extent_start_fail bcachefs: trace_move_extent_start_fail() now includes errcode bcachefs: Fix split_race livelock bcachefs: Fix bucket data type for stripe buckets bcachefs: Add missing validation for jset_entry_data_usage bcachefs: Fix zstd compress workspace size bcachefs: bpos is misaligned on big endian bcachefs: Fix ec + durability calculation bcachefs: Data update path won't accidentaly grow replicas bcachefs: deallocate_extra_replicas() bcachefs: Proper refcounting for journal_keys bcachefs: preserve device path as device name bcachefs: Fix an endianness conversion bcachefs: Start gc, copygc, rebalance threads after initing writes ref bcachefs: Don't stop copygc thread on device resize bcachefs: Make sure bch2_move_ratelimit() also waits for move_ops ...
2023-12-01Merge branch 'powercap'Rafael J. Wysocki2-13/+4
Merge a power capping fix for 6.7-rc4 which eliminates unnecessary and harmful conversions to uW from the DTPM (dynamic thermal and power management) framework (Lukasz Luba). * powercap: powercap: DTPM: Fix unneeded conversions to micro-Watts
2023-12-01nvme-core: check for too small lba shiftKeith Busch1-2/+3
The block layer doesn't support logical block sizes smaller than 512 bytes. The nvme spec doesn't support that small either, but the driver isn't checking to make sure the device responded with usable data. Failing to catch this will result in a kernel bug, either from a division by zero when stacking, or a zero length bio. Reviewed-by: Jens Axboe <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-12-01pds_vdpa: set features orderShannon Nelson1-2/+1
Fix up the order that the device and negotiated features are checked to get a more reliable difference when things get changed. Signed-off-by: Shannon Nelson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2023-12-01pds_vdpa: clear config callback when status goes to 0Shannon Nelson1-1/+3
If the client driver is setting status to 0, something is getting shutdown and possibly removed. Make sure we clear the config_cb so that it doesn't end up crashing when trying to call a bogus callback. Signed-off-by: Shannon Nelson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2023-12-01pds_vdpa: fix up format-truncation complaintShannon Nelson1-1/+1
Our friendly kernel test robot has recently been pointing out some format-truncation issues. Here's a fix for one of them. Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Shannon Nelson <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]> Acked-by: Jason Wang <[email protected]>
2023-12-01vdpa/mlx5: preserve CVQ vringh indexSteve Sistare1-1/+6
mlx5_vdpa does not preserve userland's view of vring base for the control queue in the following sequence: ioctl VHOST_SET_VRING_BASE ioctl VHOST_VDPA_SET_STATUS VIRTIO_CONFIG_S_DRIVER_OK mlx5_vdpa_set_status() setup_cvq_vring() vringh_init_iotlb() vringh_init_kern() vrh->last_avail_idx = 0; ioctl VHOST_GET_VRING_BASE To fix, restore the value of cvq->vring.last_avail_idx after calling vringh_init_iotlb. Fixes: 5262912ef3cf ("vdpa/mlx5: Add support for control VQ and MAC setting") Signed-off-by: Steve Sistare <[email protected]> Acked-by: Eugenio Pérez <[email protected]> Acked-by: Jason Wang <[email protected]> Message-Id: <[email protected]> Signed-off-by: Michael S. Tsirkin <[email protected]>
2023-12-01octeontx2-af: Check return value of nix_get_nixlf before using nixlfSubbaraya Sundeep1-1/+7
If a NIXLF is not attached to a PF/VF device then nix_get_nixlf function fails and returns proper error code. But npc_get_default_entry_action does not check it and uses garbage value in subsequent calls. Fix this by cheking the return value of nix_get_nixlf. Fixes: 967db3529eca ("octeontx2-af: add support for multicast/promisc packet replication feature") Signed-off-by: Subbaraya Sundeep <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-12-01octeontx2-pf: Add missing mutex lock in otx2_get_pauseparamSubbaraya Sundeep1-1/+5
All the mailbox messages sent to AF needs to be guarded by mutex lock. Add the missing lock in otx2_get_pauseparam function. Fixes: 75f36270990c ("octeontx2-pf: Support to enable/disable pause frames via ethtool") Signed-off-by: Subbaraya Sundeep <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-12-01gpiolib: sysfs: Fix error handling on failed exportBoerge Struempfel1-6/+9
If gpio_set_transitory() fails, we should free the GPIO again. Most notably, the flag FLAG_REQUESTED has previously been set in gpiod_request_commit(), and should be reset on failure. To my knowledge, this does not affect any current users, since the gpio_set_transitory() mainly returns 0 and -ENOTSUPP, which is converted to 0. However the gpio_set_transitory() function calles the .set_config() function of the corresponding GPIO chip and there are some GPIO drivers in which some (unlikely) branches return other values like -EPROBE_DEFER, and -EINVAL. In these cases, the above mentioned FLAG_REQUESTED would not be reset, which results in the pin being blocked until the next reboot. Fixes: e10f72bf4b3e ("gpio: gpiolib: Generalise state persistence beyond sleep") Signed-off-by: Boerge Struempfel <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>