path: root/drivers
Age | Commit message | Author | Files | Lines
2023-11-21 | Merge tag 'thunderbolt-for-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt into usb-linus | Greg Kroah-Hartman | 976 | -25445/+23566
Mika writes: thunderbolt: Fixes for v6.7-rc3 This includes the following USB4/Thunderbolt fixes for v6.7-rc3:
- Fix a lane bonding issue on ASMedia USB4 device
- Send uevents when link is switched to asymmetric or symmetric
- Only add device router DP IN adapters to the head of resource list to avoid issues during system resume.
All these have been in linux-next with no reported issues.
* tag 'thunderbolt-for-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt: (1451 commits)
thunderbolt: Only add device router DP IN to the head of the DP resource list
thunderbolt: Send uevent after asymmetric/symmetric switch
thunderbolt: Set lane bonding bit only for downstream port
2023-11-20 | s390/dasd: protect device queue against concurrent access | Jan Höppner | 1 | -11/+13
In dasd_profile_start() the number of requests on the device queue is counted. The access to the device queue is unprotected against concurrent access. With a lot of parallel I/O, especially with alias devices enabled, the device queue can change while dasd_profile_start() is accessing the queue. In the worst case this leads to a kernel panic due to incorrect pointer accesses. Fix this by taking the device lock before accessing the queue and counting the requests. Additionally the check for a valid profile data pointer can be done earlier to avoid unnecessary locking in a hot path. Cc: <[email protected]> Fixes: 4fa52aa7a82f ("[S390] dasd: add enhanced DASD statistics interface") Reviewed-by: Stefan Haberland <[email protected]> Signed-off-by: Jan Höppner <[email protected]> Signed-off-by: Stefan Haberland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | s390/dasd: resolve spelling mistake | Muhammad Muzammil | 1 | -1/+1
resolve typing mistake from pimary to primary Signed-off-by: Muhammad Muzammil <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Stefan Haberland <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | bpf, netkit: Add indirect call wrapper for fetching peer dev | Daniel Borkmann | 1 | -1/+2
ndo_get_peer_dev is used in the tcx BPF fast path, so make use of an indirect call wrapper to optimize the bpf_redirect_peer() internal handling a bit. Add a small skb_get_peer_dev() wrapper which utilizes the INDIRECT_CALL_1() macro instead of open coding. Future work could add a peer pointer directly into struct net_device and convert veth and netkit over to use it, so that eventually ndo_get_peer_dev can be removed. Co-developed-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Nikolay Aleksandrov <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Stanislav Fomichev <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-11-20 | veth: Use tstats per-CPU traffic counters | Peilin Ye | 1 | -19/+11
Currently veth devices use the lstats per-CPU traffic counters, which only cover TX traffic. veth_get_stats64() actually populates RX stats of a veth device from its peer's TX counters, based on the assumption that a veth device can _only_ receive packets from its peer, which is no longer true: For example, recent CNIs (like Cilium) can use the bpf_redirect_peer() BPF helper to redirect traffic from NIC's tc ingress to veth's tc ingress (in a different netns), skipping veth's peer device. Unfortunately, this kind of traffic isn't currently accounted for in veth's RX stats. In preparation for the fix, use tstats (instead of lstats) to maintain both RX and TX counters for each veth device. We'll use RX counters for bpf_redirect_peer() traffic, and keep using TX counters for the usual "peer-to-peer" traffic. In veth_get_stats64(), calculate RX stats by _adding_ RX count to peer's TX count, in order to cover both kinds of traffic. veth_stats_rx() might need a name change (perhaps to "veth_stats_xdp()") for less confusion, but let's leave it to another patch to keep the fix minimal. Signed-off-by: Peilin Ye <[email protected]> Co-developed-by: Daniel Borkmann <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
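The accounting rule described above (RX total = the device's own RX count plus the peer's TX count) can be sketched in plain user-space C. This is only an illustration: `struct dev_counters` and `total_rx_packets()` are hypothetical names, not the veth driver's types, and per-CPU aggregation is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one device's summed counters. */
struct dev_counters {
    uint64_t rx_packets; /* packets delivered straight to this device,
                            e.g. via a redirect that skips the peer */
    uint64_t tx_packets; /* packets this device transmitted to its peer */
};

/*
 * RX as reported for `dev`: its own RX counter (redirected traffic)
 * added to the peer's TX counter (ordinary peer-to-peer traffic).
 * A missing peer contributes nothing.
 */
static uint64_t total_rx_packets(const struct dev_counters *dev,
                                 const struct dev_counters *peer)
{
    assert(dev != NULL);

    uint64_t total = dev->rx_packets;
    if (peer != NULL)
        total += peer->tx_packets;
    return total;
}
```

Adding the two sources rather than picking one is what lets both kinds of traffic show up in the same RX figure.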
2023-11-20 | netkit: Add tstats per-CPU traffic counters | Daniel Borkmann | 1 | -1/+18
Add dev->tstats traffic accounting to netkit. The latter contains per-CPU RX and TX counters. The dev's TX counters are bumped upon pass/unspec as well as redirect verdicts, in other words, on everything except for drops. The dev's RX counters are bumped upon successful __netif_rx(), as well as from skb_do_redirect() (not part of this commit here). Using dev->lstats with having just a single packets/bytes counter and inferring one another's RX counters from the peer dev's lstats is not possible given skb_do_redirect() can also bump the device's stats. Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-11-20 | net: Move {l,t,d}stats allocation to core and convert veth & vrf | Daniel Borkmann | 2 | -25/+5
Move {l,t,d}stats allocation to the core and let netdevs pick the stats type they need. That way the driver doesn't have to bother with error handling (allocation failure checking, making sure free happens in the right spot, etc) - all happening in the core. Co-developed-by: Jakub Kicinski <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Cc: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-11-20 | net, vrf: Move dstats structure to core | Daniel Borkmann | 1 | -17/+7
Just move struct pcpu_dstats out of the vrf into the core, and streamline the field names slightly, so they better align with the {t,l}stats ones. No functional change otherwise. A conversion of the u64s to u64_stats_t could be done at a separate point in future. This move is needed as we are moving the {t,l,d}stats allocation/freeing to the core. Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Nikolay Aleksandrov <[email protected]> Cc: Jakub Kicinski <[email protected]> Cc: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-11-20 | block/null_blk: Fix double blk_mq_start_request() warning | Chengming Zhou | 1 | -12/+13
When CONFIG_BLK_DEV_NULL_BLK_FAULT_INJECTION is enabled, null_queue_rq() may return BLK_STS_RESOURCE or BLK_STS_DEV_RESOURCE for a request that has already been marked as MQ_RQ_IN_FLIGHT by blk_mq_start_request(). null_queue_rqs() then puts these requests in the rqlist and returns to the block layer core, which tries to queue them individually again, triggering the warning in blk_mq_start_request(). Fix it by splitting null_queue_rq() into two parts: the first is the preparation of the request, the second is the handling of the request. blk_mq_start_request() is placed after the preparation part, which may fail and return to the block layer core. The throttling also belongs to the preparation part, so move it before blk_mq_start_request(). Also change the return type of null_handle_cmd() to void, since it now always returns BLK_STS_OK. Reported-by: <[email protected]> Closes: https://lore.kernel.org/all/[email protected]/ Fixes: d78bfa1346ab ("block/null_blk: add queue_rqs() support") Suggested-by: Bart Van Assche <[email protected]> Signed-off-by: Chengming Zhou <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | nvmet-tcp: always initialize tls_handshake_tmo_work | Hannes Reinecke | 1 | -1/+3
The TLS handshake timeout work item should always be initialized to avoid a crash when cancelling the workqueue. Fixes: 675b453e0241 ("nvmet-tcp: enable TLS handshake upcall") Suggested-by: Maurizio Lombardi <[email protected]> Signed-off-by: Hannes Reinecke <[email protected]> Tested-by: Shin'ichiro Kawasaki <[email protected]> Tested-by: Yi Zhang <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-11-20 | nvmet: nul-terminate the NQNs passed in the connect command | Christoph Hellwig | 1 | -0/+4
The host and subsystem NQNs are passed in the connect command payload and interpreted as nul-terminated strings. Ensure they actually are nul-terminated before using them. Fixes: a07b4970f464 ("nvmet: add a generic NVMe target") Reported-by: Alon Zahavi <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
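The defensive step described above can be sketched in user-space C as follows. The struct layout, the `NQN_FIELD_LEN` value and the helper name are assumptions made for illustration; they are not taken from the nvmet sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NQN_FIELD_LEN 256 /* assumed fixed field size for this sketch */

/* Hypothetical stand-in for the NQN fields of a connect payload. */
struct connect_data {
    char subsysnqn[NQN_FIELD_LEN];
    char hostnqn[NQN_FIELD_LEN];
};

/*
 * Force a terminator into the last byte of each fixed-size field, so
 * that later string functions cannot run past the end of the buffer
 * even if the sender filled every byte.
 */
static void terminate_nqns(struct connect_data *d)
{
    assert(d != NULL);
    d->subsysnqn[NQN_FIELD_LEN - 1] = '\0';
    d->hostnqn[NQN_FIELD_LEN - 1] = '\0';
}
```

Overwriting the final byte is cheap and unconditional; a well-formed NQN is shorter than the field, so it is unaffected.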
2023-11-20 | nvme: blank out authentication fabrics options if not configured | Hannes Reinecke | 1 | -0/+2
If the config option NVME_HOST_AUTH is not selected we should not accept the corresponding fabrics options. This allows userspace to detect if NVMe authentication has been enabled for the kernel. Cc: Shin'ichiro Kawasaki <[email protected]> Fixes: f50fff73d620 ("nvme: implement In-Band authentication") Signed-off-by: Hannes Reinecke <[email protected]> Tested-by: Shin'ichiro Kawasaki <[email protected]> Reviewed-by: Daniel Wagner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-11-20 | nvme: catch errors from nvme_configure_metadata() | Hannes Reinecke | 1 | -6/+13
nvme_configure_metadata() is issuing I/O, so we might incur an I/O error which will cause the connection to be reset. But in that case any further probing will race with reset and cause UAF errors. So return a status from nvme_configure_metadata() and abort probing if there was an I/O error. Signed-off-by: Hannes Reinecke <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-11-20 | nvme-tcp: only evaluate 'tls' option if TLS is selected | Hannes Reinecke | 1 | -1/+1
We only need to evaluate the 'tls' connect option if TLS is enabled; otherwise we might be getting a link error. Fixes: 706add13676d ("nvme: keyring: fix conditional compilation") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/r/[email protected]/ Signed-off-by: Hannes Reinecke <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-11-20 | nvme-auth: set explanation code for failure2 msgs | Mark O'Donovan | 1 | -0/+2
Some error cases were not setting an auth-failure-reason-code-explanation. This means an AUTH_Failure2 message will be sent with an explanation value of 0 which is a reserved value. Signed-off-by: Mark O'Donovan <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-11-20 | nvme-auth: unlock mutex in one place only | Mark O'Donovan | 1 | -2/+1
Signed-off-by: Mark O'Donovan <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2023-11-20 | nbd: fix null-ptr-dereference while accessing 'nbd->config' | Li Nan | 1 | -1/+17
Memory reordering may occur in nbd_genl_connect(), causing config_refs to be set to 1 while nbd->config is still empty. Opening nbd at this time will cause a null-ptr-dereference.
    T1                            T2
    nbd_open
      nbd_get_config_unlocked
                                  nbd_genl_connect
                                    nbd_alloc_and_init_config
                                      //memory reordered
                                      refcount_set(&nbd->config_refs, 1)  // 2
      nbd->config
       ->null point
                                      nbd->config = config  // 1
Fix it by adding an SMP barrier to guarantee the execution order. Signed-off-by: Li Nan <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
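The ordering requirement above (the config pointer must be visible before the reference count becomes non-zero) can be illustrated with C11 atomics in user space. This is an analogy only: the kernel fix uses SMP barriers around its refcount helpers, whereas this sketch uses release/acquire ordering, and `struct device`, `publish_config()` and `get_config()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct config {
    int num_connections;
};

struct device {
    struct config *config; /* plain pointer, published through refs */
    atomic_int refs;       /* 0 means no config has been published */
};

/* Writer: store the pointer first, then release-store the refcount. */
static void publish_config(struct device *d, struct config *c)
{
    assert(c != NULL);
    d->config = c;
    atomic_store_explicit(&d->refs, 1, memory_order_release);
}

/*
 * Reader: take a reference only if refs is non-zero. The acquire
 * ordering pairs with the release store above, so a reader that sees
 * refs != 0 also sees the config pointer written before it.
 */
static struct config *get_config(struct device *d)
{
    int old = atomic_load_explicit(&d->refs, memory_order_acquire);

    while (old != 0) {
        if (atomic_compare_exchange_weak_explicit(&d->refs, &old, old + 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return d->config;
        /* CAS failed: old now holds the current value, so retry. */
    }
    return NULL;
}
```

Without the release/acquire pairing, the two stores in `publish_config()` could become visible in the opposite order, which is the situation the T1/T2 sequence above describes.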
2023-11-20 | nbd: factor out a helper to get nbd_config without holding 'config_lock' | Li Nan | 1 | -8/+19
No functional changes; this just makes the code cleaner and prepares for fixing the null-ptr-dereference while accessing 'nbd->config'. Signed-off-by: Li Nan <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | nbd: fold nbd config initialization into nbd_alloc_config() | Li Nan | 1 | -22/+19
No functional changes; this makes the code cleaner and prepares for fixing the null-ptr-dereference while accessing 'nbd->config'. Signed-off-by: Li Nan <[email protected]> Reviewed-by: Josef Bacik <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | Merge tag 'md-fixes-20231120' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md into block-6.7 | Jens Axboe | 1 | -1/+2
Pull MD fix from Song. * tag 'md-fixes-20231120' of https://git.kernel.org/pub/scm/linux/kernel/git/song/md: md: fix bi_status reporting in md_end_clone_io
2023-11-20 | ACPI: resource: Skip IRQ override on ASUS ExpertBook B1402CVA | Hans de Goede | 1 | -0/+7
Like various other ASUS ExpertBook-s, the ASUS ExpertBook B1402CVA has an ACPI DSDT table that describes IRQ 1 as ActiveLow while the kernel overrides it to EdgeHigh. This prevents the keyboard from working. To fix this issue, add this laptop to the skip_override_table so that the kernel does not override IRQ 1. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218114 Cc: All applicable <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-11-20 | ACPI: video: Use acpi_device_fix_up_power_children() | Hans de Goede | 1 | -1/+1
Commit 89c290ea7589 ("ACPI: video: Put ACPI video and its child devices into D0 on boot") introduced calling acpi_device_fix_up_power_extended() on the video card for which the ACPI video bus is the companion device. This unnecessarily touches the power-state of the GPU itself, while the issue it tries to address only requires calling _PS0 on the child devices. Touching the power-state of the GPU itself is causing suspend / resume issues on e.g. a Lenovo ThinkPad W530. Instead use acpi_device_fix_up_power_children(), which only touches the child devices, to fix this. Fixes: 89c290ea7589 ("ACPI: video: Put ACPI video and its child devices into D0 on boot") Reported-by: Owen T. Heisler <[email protected]> Closes: https://lore.kernel.org/regressions/[email protected]/ Closes: https://gitlab.freedesktop.org/drm/nouveau/-/issues/273 Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218124 Tested-by: Kai-Heng Feng <[email protected]> Tested-by: Owen T. Heisler <[email protected]> Signed-off-by: Hans de Goede <[email protected]> Cc: 6.6+ <[email protected]> # 6.6+ Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-11-20 | ACPI: PM: Add acpi_device_fix_up_power_children() function | Hans de Goede | 1 | -0/+13
In some cases it is necessary to fix up the power state of an ACPI device's children without touching the ACPI device itself. Add a new acpi_device_fix_up_power_children() function for this. Signed-off-by: Hans de Goede <[email protected]> Cc: 6.6+ <[email protected]> # 6.6+ Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-11-20 | ACPI: processor_idle: use raw_safe_halt() in acpi_idle_play_dead() | David Woodhouse | 1 | -1/+1
Xen HVM guests were observed taking triple-faults when attempting to online a previously offlined vCPU. Investigation showed that the fault was coming from a failing call to lockdep_assert_irqs_disabled(), in load_current_idt() which was too early in the CPU bringup to actually catch the exception and report the failure cleanly. This was a false positive, caused by acpi_idle_play_dead() setting the per-cpu hardirqs_enabled flag by calling safe_halt(). Switch it to use raw_safe_halt() instead, which doesn't do so. Signed-off-by: David Woodhouse <[email protected]> Acked-by: Peter Zijlstra (Intel) <[email protected]> Cc: 6.6+ <[email protected]> # 6.6+ Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-11-20 | bcache: avoid NULL checking to c->root in run_cache_set() | Coly Li | 1 | -1/+1
In run_cache_set(), after c->root is returned from bch_btree_node_get(), it is checked by IS_ERR_OR_NULL(). The NULL check is unnecessary because bch_btree_node_get() never returns a NULL pointer to its caller. This patch replaces IS_ERR_OR_NULL() by IS_ERR() for the above reason. Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | bcache: add code comments for bch_btree_node_get() and __bch_btree_node_alloc() | Coly Li | 1 | -0/+7
This patch adds code comments to bch_btree_node_get() and __bch_btree_node_alloc() that NULL pointer will not be returned and it is unnecessary to check NULL pointer by the callers of these routines. Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | bcache: replace a mistaken IS_ERR() by IS_ERR_OR_NULL() in btree_gc_coalesce() | Coly Li | 1 | -1/+1
Commit 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations") made the following change inside btree_gc_coalesce():
    31 @@ -1340,7 +1340,7 @@ static int btree_gc_coalesce(
    32         memset(new_nodes, 0, sizeof(new_nodes));
    33         closure_init_stack(&cl);
    34
    35 -       while (nodes < GC_MERGE_NODES && !IS_ERR_OR_NULL(r[nodes].b))
    36 +       while (nodes < GC_MERGE_NODES && !IS_ERR(r[nodes].b))
    37                 keys += r[nodes++].keys;
    38
    39         blocks = btree_default_blocks(b->c) * 2 / 3;
At line 35 the original r[nodes].b is not always allocated from __bch_btree_node_alloc(), and may be initialized as a NULL pointer by the caller of btree_gc_coalesce(). Therefore the change at line 36 is not correct. This patch replaces the mistaken IS_ERR() by IS_ERR_OR_NULL() to avoid a potential issue. Fixes: 028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations") Cc: <[email protected]> # 6.5+ Cc: Zheng Wang <[email protected]> Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
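The difference between the two checks is easiest to see in a small user-space model of error pointers. The helpers below are re-implementations written for illustration, assuming the usual convention that the top 4095 address values encode negative errno codes; they are not the kernel's own macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095 /* assumed size of the error-code range */

/* Encode a negative errno value as a pointer. */
static void *err_ptr(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);
    return (void *)(intptr_t)error;
}

/* True only for pointers inside the encoded error range. */
static bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* True for encoded errors and also for NULL. */
static bool is_err_or_null(const void *p)
{
    return p == NULL || is_err(p);
}
```

A NULL pointer is not inside the error range, so `is_err(NULL)` is false. A loop guarded only by `!is_err(p)` would therefore go on to dereference a NULL entry, which is why the check in btree_gc_coalesce() has to cover both cases.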
2023-11-20 | bcache: fixup multi-threaded bch_sectors_dirty_init() wake-up race | Mingzhe Zou | 1 | -1/+2
We get a kernel crash about "unable to handle kernel paging request": ```dmesg [368033.032005] BUG: unable to handle kernel paging request at ffffffffad9ae4b5 [368033.032007] PGD fc3a0d067 P4D fc3a0d067 PUD fc3a0e063 PMD 8000000fc38000e1 [368033.032012] Oops: 0003 [#1] SMP PTI [368033.032015] CPU: 23 PID: 55090 Comm: bch_dirtcnt[0] Kdump: loaded Tainted: G OE --------- - - 4.18.0-147.5.1.es8_24.x86_64 #1 [368033.032017] Hardware name: Tsinghua Tongfang THTF Chaoqiang Server/072T6D, BIOS 2.4.3 01/17/2017 [368033.032027] RIP: 0010:native_queued_spin_lock_slowpath+0x183/0x1d0 [368033.032029] Code: 8b 02 48 85 c0 74 f6 48 89 c1 eb d0 c1 e9 12 83 e0 03 83 e9 01 48 c1 e0 05 48 63 c9 48 05 c0 3d 02 00 48 03 04 cd 60 68 93 ad <48> 89 10 8b 42 08 85 c0 75 09 f3 90 8b 42 08 85 c0 74 f7 48 8b 02 [368033.032031] RSP: 0018:ffffbb48852abe00 EFLAGS: 00010082 [368033.032032] RAX: ffffffffad9ae4b5 RBX: 0000000000000246 RCX: 0000000000003bf3 [368033.032033] RDX: ffff97b0ff8e3dc0 RSI: 0000000000600000 RDI: ffffbb4884743c68 [368033.032034] RBP: 0000000000000001 R08: 0000000000000000 R09: 000007ffffffffff [368033.032035] R10: ffffbb486bb01000 R11: 0000000000000001 R12: ffffffffc068da70 [368033.032036] R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000000000 [368033.032038] FS: 0000000000000000(0000) GS:ffff97b0ff8c0000(0000) knlGS:0000000000000000 [368033.032039] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [368033.032040] CR2: ffffffffad9ae4b5 CR3: 0000000fc3a0a002 CR4: 00000000003626e0 [368033.032042] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [368033.032043] bcache: bch_cached_dev_attach() Caching rbd479 as bcache462 on set 8cff3c36-4a76-4242-afaa-7630206bc70b [368033.032045] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [368033.032046] Call Trace: [368033.032054] _raw_spin_lock_irqsave+0x32/0x40 [368033.032061] __wake_up_common_lock+0x63/0xc0 [368033.032073] ? 
bch_ptr_invalid+0x10/0x10 [bcache] [368033.033502] bch_dirty_init_thread+0x14c/0x160 [bcache] [368033.033511] ? read_dirty_submit+0x60/0x60 [bcache] [368033.033516] kthread+0x112/0x130 [368033.033520] ? kthread_flush_work_fn+0x10/0x10 [368033.034505] ret_from_fork+0x35/0x40 ``` The crash occurred when call wake_up(&state->wait), and then we want to look at the value in the state. However, bch_sectors_dirty_init() is not found in the stack of any task. Since state is allocated on the stack, we guess that bch_sectors_dirty_init() has exited, causing bch_dirty_init_thread() to be unable to handle kernel paging request. In order to verify this idea, we added some printing information during wake_up(&state->wait). We find that "wake up" is printed twice, however we only expect the last thread to wake up once. ```dmesg [ 994.641004] alcache: bch_dirty_init_thread() wake up [ 994.641018] alcache: bch_dirty_init_thread() wake up [ 994.641523] alcache: bch_sectors_dirty_init() init exit ``` There is a race. If bch_sectors_dirty_init() exits after the first wake up, the second wake up will trigger this bug("unable to handle kernel paging request"). Proceed as follows: bch_sectors_dirty_init kthread_run ==============> bch_dirty_init_thread(bch_dirtcnt[0]) ... ... atomic_inc(&state.started) ... ... ... atomic_read(&state.enough) ... ... atomic_set(&state->enough, 1) kthread_run ======================================================> bch_dirty_init_thread(bch_dirtcnt[1]) ... atomic_dec_and_test(&state->started) ... atomic_inc(&state.started) ... ... ... wake_up(&state->wait) ... atomic_read(&state.enough) atomic_dec_and_test(&state->started) ... ... wait_event(state.wait, atomic_read(&state.started) == 0) ... return ... wake_up(&state->wait) We believe it is very common to wake up twice if there is no dirty, but crash is an extremely low probability event. It's hard for us to reproduce this issue. 
We attached and detached continuously for a week, with a total of more than one million attaches and only one crash. Putting atomic_inc(&state.started) before kthread_run() can avoid waking up twice. Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded") Signed-off-by: Mingzhe Zou <[email protected]> Cc: <[email protected]> Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | bcache: fixup lock c->root error | Mingzhe Zou | 1 | -3/+11
We had a problem with io hung because it was waiting for c->root to release the lock. crash> cache_set.root -l cache_set.list ffffa03fde4c0050 root = 0xffff802ef454c800 crash> btree -o 0xffff802ef454c800 | grep rw_semaphore [ffff802ef454c858] struct rw_semaphore lock; crash> struct rw_semaphore ffff802ef454c858 struct rw_semaphore { count = { counter = -4294967297 }, wait_list = { next = 0xffff00006786fc28, prev = 0xffff00005d0efac8 }, wait_lock = { raw_lock = { { val = { counter = 0 }, { locked = 0 '\000', pending = 0 '\000' }, { locked_pending = 0, tail = 0 } } } }, osq = { tail = { counter = 0 } }, owner = 0xffffa03fdc586603 } The "counter = -4294967297" means that lock count is -1 and a write lock is being attempted. Then, we found that there is a btree with a counter of 1 in btree_cache_freeable. crash> cache_set -l cache_set.list ffffa03fde4c0050 -o|grep btree_cache [ffffa03fde4c1140] struct list_head btree_cache; [ffffa03fde4c1150] struct list_head btree_cache_freeable; [ffffa03fde4c1160] struct list_head btree_cache_freed; [ffffa03fde4c1170] unsigned int btree_cache_used; [ffffa03fde4c1178] wait_queue_head_t btree_cache_wait; [ffffa03fde4c1190] struct task_struct *btree_cache_alloc_lock; crash> list -H ffffa03fde4c1140|wc -l 973 crash> list -H ffffa03fde4c1150|wc -l 1123 crash> cache_set.btree_cache_used -l cache_set.list ffffa03fde4c0050 btree_cache_used = 2097 crash> list -s btree -l btree.list -H ffffa03fde4c1140|grep -E -A2 "^ lock = {" > btree_cache.txt crash> list -s btree -l btree.list -H ffffa03fde4c1150|grep -E -A2 "^ lock = {" > btree_cache_freeable.txt [root@node-3 127.0.0.1-2023-08-04-16:40:28]# pwd /var/crash/127.0.0.1-2023-08-04-16:40:28 [root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache.txt|grep counter|grep -v "counter = 0" [root@node-3 127.0.0.1-2023-08-04-16:40:28]# cat btree_cache_freeable.txt|grep counter|grep -v "counter = 0" counter = 1 We found that this is a bug in bch_sectors_dirty_init() when locking c->root: (1). 
Thread X has locked c->root(A) write. (2). Thread Y failed to lock c->root(A), waiting for the lock(c->root A). (3). Thread X bch_btree_set_root() changes c->root from A to B. (4). Thread X releases the lock(c->root A). (5). Thread Y successfully locks c->root(A). (6). Thread Y releases the lock(c->root B). down_write locked ---(1)----------------------┐ | | | down_read waiting ---(2)----┐ | | | ┌-------------┐ ┌-------------┐ bch_btree_set_root ===(3)========>> | c->root A | | c->root B | | | └-------------┘ └-------------┘ up_write ---(4)---------------------┘ | | | | | down_read locked ---(5)-----------┘ | | | up_read ---(6)-----------------------------┘ Since c->root may change, the correct steps to lock c->root should be the same as bch_root_usage(), compare after locking. static unsigned int bch_root_usage(struct cache_set *c) { unsigned int bytes = 0; struct bkey *k; struct btree *b; struct btree_iter iter; goto lock_root; do { rw_unlock(false, b); lock_root: b = c->root; rw_lock(false, b, b->level); } while (b != c->root); for_each_key_filter(&b->keys, k, &iter, bch_ptr_bad) bytes += bkey_bytes(k); rw_unlock(false, b); return (bytes * 100) / btree_bytes(c); } Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded") Signed-off-by: Mingzhe Zou <[email protected]> Cc: <[email protected]> Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
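The "compare after locking" pattern from bch_root_usage() can be modelled outside the kernel. The sketch below is an analogy under stated assumptions: a C11 `atomic_flag` spinlock stands in for the btree node's rw_semaphore, and `struct node`, `node_lock()`, `node_unlock()` and `lock_current_root()` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node {
    atomic_flag lock; /* simple spinlock standing in for the node lock */
    int level;
};

static void node_lock(struct node *n)
{
    while (atomic_flag_test_and_set_explicit(&n->lock, memory_order_acquire))
        ; /* spin until the current holder releases the lock */
}

static void node_unlock(struct node *n)
{
    atomic_flag_clear_explicit(&n->lock, memory_order_release);
}

/*
 * Lock whatever node is the root *after* the lock has been taken.
 * If the root pointer changed while we were waiting, drop the stale
 * node's lock and retry with the new root.
 */
static struct node *lock_current_root(_Atomic(struct node *) *rootp)
{
    assert(rootp != NULL);

    for (;;) {
        struct node *b = atomic_load(rootp);

        node_lock(b);
        if (b == atomic_load(rootp))
            return b; /* still the root: the caller now holds its lock */
        node_unlock(b);
    }
}
```

The caller unlocks the node that was returned, never whatever the root pointer holds at unlock time. That avoids the mismatch in steps (5) and (6), where the lock is taken on root A but released on root B.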
2023-11-20 | bcache: fixup init dirty data errors | Mingzhe Zou | 1 | -1/+4
We found that after a long run, the dirty_data of the bcache device becomes inaccurate. The error cannot be eliminated without re-registering the device. We also found that the error accumulates when the device is reattached after a detach. In bch_sectors_dirty_init(), all keys with inode <= d->id are counted again. This is wrong: only the keys of the current device need to be counted. Fixes: b144e45fc576 ("bcache: make bch_sectors_dirty_init() to be multithreaded") Signed-off-by: Mingzhe Zou <[email protected]> Cc: <[email protected]> Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | bcache: prevent potential division by zero error | Rand Deeb | 1 | -1/+1
In SHOW(), the variable 'n' is of type 'size_t.' While there is a conditional check to verify that 'n' is not equal to zero before executing the 'do_div' macro, concerns arise regarding potential division by zero error in 64-bit environments. The concern arises when 'n' is 64 bits in size, greater than zero, and the lower 32 bits of it are zeros. In such cases, the conditional check passes because 'n' is non-zero, but the 'do_div' macro casts 'n' to 'uint32_t,' effectively truncating it to its lower 32 bits. Consequently, the 'n' value becomes zero. To fix this potential division by zero error and ensure precise division handling, this commit replaces the 'do_div' macro with div64_u64(). div64_u64() is designed to work with 64-bit operands, guaranteeing that division is performed correctly. This change enhances the robustness of the code, ensuring that division operations yield accurate results in all scenarios, eliminating the possibility of division by zero, and improving compatibility across different 64-bit environments. Found by Linux Verification Center (linuxtesting.org) with SVACE. Signed-off-by: Rand Deeb <[email protected]> Cc: <[email protected]> Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
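The truncation hazard described above can be reproduced in ordinary C without any kernel macros. In this sketch `truncated_divisor()` only models what a 32-bit divisor parameter would receive, and `div_full_width()` is a hypothetical stand-in for a division that keeps all 64 bits of the divisor.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 64-bit value being passed where a 32-bit divisor is expected. */
static uint32_t truncated_divisor(uint64_t n)
{
    return (uint32_t)n; /* keeps only the low 32 bits */
}

/* Division that uses the full 64-bit divisor. */
static uint64_t div_full_width(uint64_t dividend, uint64_t divisor)
{
    assert(divisor != 0);
    return dividend / divisor;
}
```

For `n = 1ULL << 32` the check `n != 0` passes, yet `truncated_divisor(n)` is 0, so a divide that uses the truncated value would divide by zero. `div_full_width(1ULL << 33, n)` divides by the intended value and yields 2.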
2023-11-20 | bcache: remove redundant assignment to variable cur_idx | Colin Ian King | 1 | -1/+1
Variable cur_idx is being initialized with a value that is never read, it is being re-assigned later in a while-loop. Remove the redundant assignment. Cleans up clang scan build warning: drivers/md/bcache/writeback.c:916:2: warning: Value stored to 'cur_idx' is never read [deadcode.DeadStores] Signed-off-by: Colin Ian King <[email protected]> Reviewed-by: Coly Li <[email protected]> Signed-off-by: Coly Li <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | bcache: check return value from btree_node_alloc_replacement() | Coly Li | 1 | -0/+2
In btree_gc_rewrite_node(), pointer 'n' is not checked after it returns from btree_node_alloc_replacement(). 'n' may be a non-NULL ERR_PTR(), and dereferencing such an error code in the following code is not permitted. Therefore the return value must be checked after 'n' is back from btree_node_alloc_replacement(). Signed-off-by: Coly Li <[email protected]> Reported-by: Dan Carpenter <[email protected]> Cc: <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-11-20 | bcache: avoid oversize memory allocation by small stripe_size | Coly Li | 2 | -0/+3
Arrays bcache->stripe_sectors_dirty and bcache->full_dirty_stripes are used for dirty data writeback; their sizes are determined by the backing device capacity and the stripe size. A larger backing device capacity or a smaller stripe size makes these two arrays occupy more dynamic memory. Currently bcache->stripe_size is directly inherited from queue->limits.io_opt of the underlying storage device. For normal hard drives, limits.io_opt is 0, and bcache sets the corresponding stripe_size to 1TB (1<<31 sectors); this has worked fine for 10+ years. But for devices that do declare a value for queue->limits.io_opt, a small stripe_size (compared to 1TB) becomes an issue because it leads to oversize memory allocations for bcache->stripe_sectors_dirty and bcache->full_dirty_stripes, while the capacity of hard drives has grown much larger in the recent decade. For example, for a raid5 array assembled from three 20TB hard drives, the raid device capacity is 40TB with a typical 512KB limits.io_opt. After the calculation in the bcache code, these two arrays will occupy 400MB of dynamic memory. Even worse, Andrea Tomassetti reports that a 4KB limits.io_opt is declared on a new 2TB hard drive, so these two arrays request 2GB and 512MB of dynamic memory from kzalloc(). The result is that the bcache device always fails to initialize on his system. To avoid the oversize memory allocation, bcache->stripe_size should not be directly inherited from queue->limits.io_opt of the underlying device. This patch defines BCH_MIN_STRIPE_SZ (4MB) as the minimal bcache stripe size and sets the bcache device's stripe size against the declared limits.io_opt value from the underlying storage device:
- If the declared limits.io_opt > BCH_MIN_STRIPE_SZ, bcache device will set its stripe size directly by this limits.io_opt value.
- If the declared limits.io_opt < BCH_MIN_STRIPE_SZ, bcache device will set its stripe size to a multiple of limits.io_opt that is equal to or larger than BCH_MIN_STRIPE_SZ.
Then the minimal stripe size of a bcache device will always be >= 4MB. For a 40TB raid5 device with 512KB limits.io_opt, memory occupied by bcache->stripe_sectors_dirty and bcache->full_dirty_stripes will be 50MB in total. For a 2TB hard drive with 4KB limits.io_opt, memory occupied by these two arrays will be 2.5MB in total. Such an amount of memory allocated for bcache->stripe_sectors_dirty and bcache->full_dirty_stripes is reasonable for most storage devices. Reported-by: Andrea Tomassetti <[email protected]> Signed-off-by: Coly Li <[email protected]> Reviewed-by: Eric Wheeler <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
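The stripe-size rule above can be sketched as a small C function working in bytes. This is an illustration under assumptions: `MIN_STRIPE_BYTES` and `pick_stripe_bytes()` are hypothetical names, the smallest qualifying multiple is chosen, and the io_opt == 0 case (which the message says maps to a 1TB stripe) is deliberately excluded.

```c
#include <assert.h>
#include <stdint.h>

#define MIN_STRIPE_BYTES (4u << 20) /* 4MB floor, as described above */

/*
 * Choose a stripe size from a non-zero declared io_opt:
 * - io_opt at or above the floor is used directly;
 * - a smaller io_opt is scaled to the smallest multiple of itself
 *   that is at least the floor.
 */
static uint64_t pick_stripe_bytes(uint64_t io_opt)
{
    assert(io_opt > 0);

    if (io_opt >= MIN_STRIPE_BYTES)
        return io_opt;

    /* Round the floor up to the next multiple of io_opt. */
    return ((MIN_STRIPE_BYTES + io_opt - 1) / io_opt) * io_opt;
}
```

With a 4KB or 512KB io_opt the result is exactly 4MB, since both divide 4MB evenly. A 3MB io_opt gives 6MB, the first multiple at or above the floor.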
2023-11-20 | drm/rockchip: vop: Fix color for RGB888/BGR888 format on VOP full | Jonas Karlman | 1 | -3/+11
Use of DRM_FORMAT_RGB888 and DRM_FORMAT_BGR888 on e.g. RK3288, RK3328 and RK3399 results in wrong colors being displayed. The issue can be observed using modetest:
  modetest -s <connector_id>@<crtc_id>:1920x1080-60@RG24
  modetest -s <connector_id>@<crtc_id>:1920x1080-60@BG24
The vendor 4.4 kernel applies an inverted rb swap for these formats on the VOP full framework (IP version 3.x) compared to the VOP little framework (2.x). Fix colors by applying a different rb swap for the VOP full framework (3.x) and the VOP little framework (2.x), similar to the vendor 4.4 kernel. Fixes: 85a359f25388 ("drm/rockchip: Add BGR formats to VOP") Signed-off-by: Jonas Karlman <[email protected]> Tested-by: Diederik de Haas <[email protected]> Reviewed-by: Christopher Obbard <[email protected]> Tested-by: Christopher Obbard <[email protected]> Signed-off-by: Heiko Stuebner <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-11-20 | drm/i915: do not clean GT table on error path | Andrzej Hajda | 2 | -14/+1
The only task of intel_gt_release_all is to zero the GT table. Calling it on the error path prevents intel_gt_driver_late_release_all (called from i915_driver_late_release) from cleaning up GTs, causing a leak. After i915_driver_late_release the GT array is not used anymore, so it does not need cleaning at all. Sample leak report:
BUG i915_request (...): Objects remaining in i915_request on __kmem_cache_shutdown()
...
Object 0xffff888113420040 @offset=64
Allocated in __i915_request_create+0x75/0x610 [i915] age=18339 cpu=1 pid=1454
  kmem_cache_alloc+0x25b/0x270
  __i915_request_create+0x75/0x610 [i915]
  i915_request_create+0x109/0x290 [i915]
  __engines_record_defaults+0xca/0x440 [i915]
  intel_gt_init+0x275/0x430 [i915]
  i915_gem_init+0x135/0x2c0 [i915]
  i915_driver_probe+0x8d1/0xdc0 [i915]
v2: removed whole intel_gt_release_all Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8489 Fixes: bec68cc9ea42 ("drm/i915: Prepare for multiple GTs") Signed-off-by: Andrzej Hajda <[email protected]> Reviewed-by: Tvrtko Ursulin <[email protected]> Reviewed-by: Nirmoy Das <[email protected]> Reviewed-by: Andi Shyti <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit e899505533852bf1da133f2f4c9a9655ff77f7e5) Signed-off-by: Jani Nikula <[email protected]>
2023-11-20 | drm/i915/dp_mst: Fix race between connector registration and setup | Imre Deak | 1 | -8/+8
After drm_connector_init() is called, the connector is visible to the rest of the kernel via the drm_mode_config::connector_list. Make sure that the DSC AUX device and capabilities are set up by that time. Another race condition is adding the connector to the connector list before drm_connector_helper_add() sets the connector helper functions. That's an unrelated issue, the fix for which is left for a follow-up. One solution would be adding the connector to the connector list only during its registration in drm_connector_register(). Cc: Stanislav Lisovskiy <[email protected]> Cc: Ville Syrjälä <[email protected]> Fixes: 808b43fa7e56 ("drm/i915/dp_mst: Set connector DSC capabilities and decompression AUX") Reviewed-by: Stanislav Lisovskiy <[email protected]> Signed-off-by: Imre Deak <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected] (cherry picked from commit 560ea72c76eb6d0c59f77580414e64cc09f1093d) Signed-off-by: Jani Nikula <[email protected]>
2023-11-19 | md: fix bi_status reporting in md_end_clone_io | Song Liu | 1 | -1/+2
md_end_clone_io() may overwrite the error status in orig_bio->bi_status with BLK_STS_OK. This could happen when orig_bio has BIO_CHAIN (split by md_submit_bio => bio_split_to_limits, for example). As a result, the upper layer may miss an error reported from md (or the device) and consider the failed IO successful. Fix this by only updating orig_bio->bi_status when the current bio reports an error and orig_bio is BLK_STS_OK. This is the same behavior as __bio_chain_endio(). Fixes: 10764815ff47 ("md: add io accounting for raid0 and raid5") Cc: [email protected] # v5.14+ Reported-by: Bhanu Victor DiCara <[email protected]> Closes: https://lore.kernel.org/regressions/5727380.DvuYhMxLoT@bvd0/ Signed-off-by: Song Liu <[email protected]> Tested-by: Xiao Ni <[email protected]> Reviewed-by: Yu Kuai <[email protected]> Acked-by: Guoqing Jiang <[email protected]>
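The status-propagation rule described in the md message can be modelled in a few lines. This is a simplified sketch, not the kernel code: statuses are plain integers, the numeric error values are illustrative, and `buggy_end_clone`, `fixed_end_clone` and `complete_all` are hypothetical names standing in for the completion of each split piece of the original bio.

```python
from functools import reduce

BLK_STS_OK = 0
BLK_STS_IOERR = 10   # illustrative non-zero error code
BLK_STS_MEDIUM = 7   # another illustrative error code


def buggy_end_clone(orig_status: int, clone_status: int) -> int:
    """Old behavior: the latest clone's status always overwrites the original."""
    return clone_status


def fixed_end_clone(orig_status: int, clone_status: int) -> int:
    """New behavior: record an error only if none has been recorded yet."""
    if clone_status != BLK_STS_OK and orig_status == BLK_STS_OK:
        return clone_status
    return orig_status


def complete_all(clone_statuses, end_clone) -> int:
    """Complete the clones in order and return the original bio's final status."""
    return reduce(end_clone, clone_statuses, BLK_STS_OK)
```

With a failed first piece followed by a successful one, `complete_all([BLK_STS_IOERR, BLK_STS_OK], buggy_end_clone)` ends up as `BLK_STS_OK`, which is the lost error described above, while the same sequence with `fixed_end_clone` keeps `BLK_STS_IOERR`.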
2023-11-20 | ata: pata_isapnp: Add missing error check for devm_ioport_map() | Chen Ni | 1 | -0/+3
Add missing error return check for devm_ioport_map() and return the error if this function call fails. Fixes: 0d5ff566779f ("libata: convert to iomap") Signed-off-by: Chen Ni <[email protected]> Reviewed-by: Sergey Shtylyov <[email protected]> Signed-off-by: Damien Le Moal <[email protected]>
2023-11-19 | Merge tag 'irq_urgent_for_v6.7_rc2' of ↵ | Linus Torvalds | 1 | -6/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Borislav Petkov: - Flush the translation service tables to prevent unpredictable behavior on non-coherent GIC devices * tag 'irq_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic-v3-its: Flush ITS tables correctly in non-coherent GIC designs
2023-11-19 | octeontx2-pf: Fix memory leak during interface down | Suman Ghosh | 1 | -0/+2
During 'ifconfig <netdev> down' one RSS memory allocation was not being freed. This patch fixes that. Fixes: 81a4362016e7 ("octeontx2-pf: Add RSS multi group support") Signed-off-by: Suman Ghosh <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-11-19 | wireguard: use DEV_STATS_INC() | Eric Dumazet | 3 | -9/+10
wg_xmit() can be called concurrently, and KCSAN reported [1] that some device stats updates can be lost. Use DEV_STATS_INC() for this unlikely case.
[1] BUG: KCSAN: data-race in wg_xmit / wg_xmit
read-write to 0xffff888104239160 of 8 bytes by task 1375 on cpu 0:
  wg_xmit+0x60f/0x680 drivers/net/wireguard/device.c:231
  __netdev_start_xmit include/linux/netdevice.h:4918 [inline]
  netdev_start_xmit include/linux/netdevice.h:4932 [inline]
  xmit_one net/core/dev.c:3543 [inline]
  dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3559
  ...
read-write to 0xffff888104239160 of 8 bytes by task 1378 on cpu 1:
  wg_xmit+0x60f/0x680 drivers/net/wireguard/device.c:231
  __netdev_start_xmit include/linux/netdevice.h:4918 [inline]
  netdev_start_xmit include/linux/netdevice.h:4932 [inline]
  xmit_one net/core/dev.c:3543 [inline]
  dev_hard_start_xmit+0x11b/0x3f0 net/core/dev.c:3559
  ...
v2: also change wg_packet_consume_data_done() (Hangbin Liu) and wg_packet_purge_staged_packets() Fixes: e7096c131e51 ("net: WireGuard secure network tunnel") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Cc: Hangbin Liu <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]> Reviewed-by: Hangbin Liu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
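The lost update that KCSAN flags in the wireguard message can be illustrated with a deterministic model. This is only a sketch of the interleaving, not kernel code: the two "CPUs" are simulated by ordering the load and store steps by hand, and `racy_increments` and `atomic_increments` are hypothetical names. The second function stands in for an increment that completes as one indivisible step, which is what DEV_STATS_INC() provides.

```python
def racy_increments(counter: int) -> int:
    """Two CPUs each run `counter += 1` as a separate load and store."""
    cpu0 = counter       # CPU 0 loads the current value
    cpu1 = counter       # CPU 1 loads the same value before CPU 0 stores
    counter = cpu0 + 1   # CPU 0 stores its result
    counter = cpu1 + 1   # CPU 1 stores, overwriting CPU 0's update
    return counter


def atomic_increments(counter: int) -> int:
    """Two CPUs increment, each read-modify-write finishing before the next."""
    for _ in range(2):
        counter += 1
    return counter
```

Starting from the same value, the racy interleaving advances the counter by one while the atomic version advances it by two, so one of the two stats updates is lost in the first case.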
2023-11-19 | net: wangxun: fix kernel panic due to null pointer | Jiawen Wu | 3 | -9/+7
When the device uses a custom subsystem vendor ID, the function wx_sw_init() returns before the memory of 'wx->mac_table' is allocated. The resulting null pointer causes a kernel panic. Fixes: 79625f45ca73 ("net: wangxun: Move MAC address handling to libwx") Signed-off-by: Jiawen Wu <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-11-19 | drm/panel: simple: Fix Innolux G101ICE-L01 timings | Marek Vasut | 1 | -6/+6
The Innolux G101ICE-L01 datasheet [1] page 17 table 6.1 INPUT SIGNAL TIMING SPECIFICATIONS indicates that the maximum vertical blanking time is 40 lines. Currently the driver uses 29 lines. Fix it and, since this panel is a DE panel, adjust the timings to make them less hostile to controllers which cannot do 1 px HSA/VSA: distribute the delays evenly between all three parts. [1] https://www.data-modul.com/sites/default/files/products/G101ICE-L01-C2-specification-12042389.pdf Fixes: 1e29b840af9f ("drm/panel: simple: Add Innolux G101ICE-L01 panel") Signed-off-by: Marek Vasut <[email protected]> Reviewed-by: Neil Armstrong <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-11-19 | drm/panel: simple: Fix Innolux G101ICE-L01 bus flags | Marek Vasut | 1 | -0/+1
Add the missing .bus_flags = DRM_BUS_FLAG_DE_HIGH to this panel description, matching both the datasheet and the panel display_timing flags. Fixes: 1e29b840af9f ("drm/panel: simple: Add Innolux G101ICE-L01 panel") Signed-off-by: Marek Vasut <[email protected]> Reviewed-by: Neil Armstrong <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2023-11-18 | Merge tag 'scsi-fixes' of ↵ | Linus Torvalds | 4 | -41/+38
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Seven small fixes, six in drivers and one in sd. The sd fix is so large because it changes a struct pointer to a struct but otherwise is fairly simple" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: ufs: qcom-ufs: dt-bindings: Document the SM8650 UFS Controller scsi: sd: Fix sshdr use in sd_suspend_common() scsi: scsi_debug: Delete some bogus error checking scsi: scsi_debug: Fix some bugs in sdebug_error_write() scsi: ufs: core: Fix racing issue between ufshcd_mcq_abort() and ISR scsi: ufs: core: Expand MCQ queue slot to DeviceQueueDepth + 1 scsi: qla2xxx: Fix system crash due to bad pointer access
2023-11-18 | Merge tag 'parisc-for-6.7-rc2' of ↵ | Linus Torvalds | 1 | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux Pull parisc fixes from Helge Deller: "On parisc we still sometimes need writeable stacks, e.g. if programs aren't compiled with gcc-14. To avoid issues with the upcoming systemd-254 we therefore have to disable prctl(PR_SET_MDWE) for now (for parisc only). The other two patches are minor: a bugfix for the soft power-off on qemu with 64-bit kernel and prefer strscpy() over strlcpy(): - Fix power soft-off on qemu - Disable prctl(PR_SET_MDWE) since parisc sometimes still needs writeable stacks - Use strscpy instead of strlcpy in show_cpuinfo()" * tag 'parisc-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux: prctl: Disable prctl(PR_SET_MDWE) on parisc parisc/power: Fix power soft-off when running on qemu parisc: Replace strlcpy() with strscpy()
2023-11-18 | Merge tag 'for-6.7/dm-fixes' of ↵ | Linus Torvalds | 6 | -100/+130
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper fixes from Mike Snitzer: - Various fixes for the DM delay target to address regressions introduced during the 6.7 merge window - Fixes to both DM bufio and the verity target for no-sleep mode, to address sleeping while atomic issues - Update DM crypt target in response to the treewide change that made MAX_ORDER inclusive * tag 'for-6.7/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: dm-crypt: start allocating with MAX_ORDER dm-verity: don't use blocking calls from tasklets dm-bufio: fix no-sleep mode dm-delay: avoid duplicate logic dm-delay: fix bugs introduced by kthread mode dm-delay: fix a race between delay_presuspend and delay_bio
2023-11-18 | parisc/power: Fix power soft-off when running on qemu | Helge Deller | 1 | -1/+1
Firmware returns the physical address of the power switch, so gsc_writel() must be used instead of direct memory access. Fixes: d0c219472980 ("parisc/power: Add power soft-off when running on qemu") Signed-off-by: Helge Deller <[email protected]> Cc: [email protected] # v6.0+
2023-11-18 | Merge tag 'i2c-for-6.7-rc2' of ↵ | Linus Torvalds | 3 | -18/+78
git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Revert a not-working conversion to generic recovery for PXA, use proper IO accessors for designware, and use proper PM level for ocores to allow accessing interrupt providers late" * tag 'i2c-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: ocores: Move system PM hooks to the NOIRQ phase i2c: designware: Fix corrupted memory seen in the ISR Revert "i2c: pxa: move to generic GPIO recovery"