aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2022-05-05tty: n_gsm: fix buffer over-read in gsm_dlci_data()Daniel Starke1-0/+1
'len' is decreased after each octet that has its EA bit set to 0, which means that the value is encoded with additional octets. However, the final octet does not decreases 'len' which results in 'len' being one byte too long. A buffer over-read may occur in tty_insert_flip_string() as it tries to read one byte more than the passed content size of 'data'. Decrease 'len' also for the final octet which has the EA bit set to 1 to write the correct number of bytes from the internal receive buffer to the virtual tty. Fixes: 2e124b4a390c ("TTY: switch tty_flip_buffer_push") Cc: [email protected] Signed-off-by: Daniel Starke <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-05-05serial: 8250_mtk: Fix register address for XON/XOFF characterAngeloGioacchino Del Regno1-2/+5
The XON1/XOFF1 character registers are at offset 0xa0 and 0xa8 respectively, so we cannot use the definition in serial_port.h. Fixes: bdbd0a7f8f03 ("serial: 8250-mtk: modify baudrate setting") Signed-off-by: AngeloGioacchino Del Regno <[email protected]> Cc: stable <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-05-05serial: 8250_mtk: Make sure to select the right FEATURE_SELAngeloGioacchino Del Regno1-0/+7
Set the FEATURE_SEL at probe time to make sure that BIT(0) is enabled: this guarantees that when the port is configured as AP UART, the right register layout is interpreted by the UART IP. Signed-off-by: AngeloGioacchino Del Regno <[email protected]> Cc: stable <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-05-05serial: 8250_mtk: Fix UART_EFR register addressAngeloGioacchino Del Regno1-7/+8
On MediaTek SoCs, the UART IP is 16550A compatible, but there are some specific quirks: we are declaring a register shift of 2, but this is only valid for the majority of the registers, as there are some that are out of the standard layout. Specifically, this driver is using definitions from serial_reg.h, where we have a UART_EFR register defined as 2: this results in a 0x8 offset, but there we have the FCR register instead. The right offset for the EFR register on MediaTek UART is at 0x98, so, following the decimal definition convention in serial_reg.h and accounting for the register left shift of two, add and use the correct register address for this IP, defined as decimal 38, so that the final calculation results in (0x26 << 2) = 0x98. Fixes: bdbd0a7f8f03 ("serial: 8250-mtk: modify baudrate setting") Signed-off-by: AngeloGioacchino Del Regno <[email protected]> Cc: stable <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-05-05tty/serial: digicolor: fix possible null-ptr-deref in digicolor_uart_probe()Yang Yingliang1-3/+2
It will cause null-ptr-deref when using 'res', if platform_get_resource() returns NULL, so move using 'res' after devm_ioremap_resource() that will check it to avoid null-ptr-deref. And use devm_platform_get_and_ioremap_resource() to simplify code. Fixes: 5930cb3511df ("serial: driver for Conexant Digicolor USART") Signed-off-by: Yang Yingliang <[email protected]> Reviewed-by: Baruch Siach <[email protected]> Cc: stable <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-05-05usb: gadget: uvc: allow for application to cleanly shutdownDan Vacura3-1/+29
Several types of kernel panics can occur due to timing during the uvc gadget removal. This appears to be a problem with gadget resources being managed by both the client application's v4l2 open/close and the UDC gadget bind/unbind. Since the concept of USB_GADGET_DELAYED_STATUS doesn't exist for unbind, add a wait to allow for the application to close out. Some examples of the panics that can occur are: <1>[ 1147.652313] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000028 <4>[ 1147.652510] Call trace: <4>[ 1147.652514] usb_gadget_disconnect+0x74/0x1f0 <4>[ 1147.652516] usb_gadget_deactivate+0x38/0x168 <4>[ 1147.652520] usb_function_deactivate+0x54/0x90 <4>[ 1147.652524] uvc_function_disconnect+0x14/0x38 <4>[ 1147.652527] uvc_v4l2_release+0x34/0xa0 <4>[ 1147.652537] __fput+0xdc/0x2c0 <4>[ 1147.652540] ____fput+0x10/0x1c <4>[ 1147.652545] task_work_run+0xe4/0x12c <4>[ 1147.652549] do_notify_resume+0x108/0x168 <1>[ 282.950561][ T1472] Unable to handle kernel NULL pointer dereference at virtual address 00000000000005b8 <6>[ 282.953111][ T1472] Call trace: <6>[ 282.953121][ T1472] usb_function_deactivate+0x54/0xd4 <6>[ 282.953134][ T1472] uvc_v4l2_release+0xac/0x1e4 <6>[ 282.953145][ T1472] v4l2_release+0x134/0x1f0 <6>[ 282.953167][ T1472] __fput+0xf4/0x428 <6>[ 282.953178][ T1472] ____fput+0x14/0x24 <6>[ 282.953193][ T1472] task_work_run+0xac/0x130 <3>[ 213.410077][ T29] configfs-gadget gadget: uvc: Failed to queue request (-108). <1>[ 213.410116][ T29] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000003 <6>[ 213.413460][ T29] Call trace: <6>[ 213.413474][ T29] uvcg_video_pump+0x1f0/0x384 <6>[ 213.413489][ T29] process_one_work+0x2a4/0x544 <6>[ 213.413502][ T29] worker_thread+0x350/0x784 <6>[ 213.413515][ T29] kthread+0x2ac/0x320 <6>[ 213.413528][ T29] ret_from_fork+0x10/0x30 Signed-off-by: Dan Vacura <[email protected]> Cc: stable <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-05-05usb: typec: tcpci: Don't skip cleanup in .remove() on errorUwe Kleine-König1-1/+1
Returning an error value in an i2c remove callback results in an error message being emitted by the i2c core, but otherwise it doesn't make a difference. The device goes away anyhow and the devm cleanups are called. In this case the remove callback even returns early without stopping the tcpm worker thread and various timers. A work scheduled on the work queue, or a firing timer after tcpci_remove() returned probably results in a use-after-free situation because the regmap and driver data were freed. So better make sure that tcpci_unregister_port() is called even if disabling the irq failed. Also emit a more specific error message instead of the i2c core's "remove failed (EIO), will be ignored" and return 0 to suppress the core's warning. This patch is (also) a preparation for making i2c remove callbacks return void. Fixes: 3ba76256fc4e ("usb: typec: tcpci: mask event interrupts when remove driver") Signed-off-by: Uwe Kleine-König <[email protected]> Cc: stable <[email protected]> Acked-by: Heikki Krogerus <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-05-05usb: cdc-wdm: fix reading stuck on device closeSergey Ryazanov1-0/+1
cdc-wdm tracks whether a response reading request is in-progress and blocks the next request from being sent until the previous request is completed. As soon as last user closes the cdc-wdm device file, the driver cancels any ongoing requests, resets the pending response counter, but leaves the response reading in-progress flag (WDM_RESPONDING) untouched. So if the user closes the device file during the response receive request is being performed, no more data will be obtained from the modem. The request will be cancelled, effectively preventing the WDM_RESPONDING flag from being reseted. Keeping the flag set will prevent a new response receive request from being sent, permanently blocking the read path. The read path will staying blocked until the module will be reloaded or till the modem will be re-attached. This stuck has been observed with a Huawei E3372 modem attached to an OpenWrt router and using the comgt utility to set up a network connection. Fix this issue by clearing the WDM_RESPONDING flag on the device file close. Without this fix, the device reading stuck can be easily reproduced in a few connection establishing attempts. With this fix, a load test for modem connection re-establishing worked for several hours without any issues. Fixes: 922a5eadd5a3 ("usb: cdc-wdm: Fix race between autosuspend and reading from the device") Signed-off-by: Sergey Ryazanov <[email protected]> Cc: stable <[email protected]> Acked-by: Oliver Neukum <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-05-05btrfs: sysfs: export the balance paused state of exclusive operationDavid Sterba1-0/+3
The new state allowing device addition with paused balance is not exported to user space so it can't recognize it and actually start the operation. Fixes: efc0e69c2fea ("btrfs: introduce exclusive operation BALANCE_PAUSED state") CC: [email protected] # 5.17 Signed-off-by: David Sterba <[email protected]>
2022-05-05btrfs: fix assertion failure when logging directory key range itemFilipe Manana1-14/+25
When inserting a key range item (BTRFS_DIR_LOG_INDEX_KEY) while logging a directory, we don't expect the insertion to fail with -EEXIST, because we are holding the directory's log_mutex and we have dropped all existing BTRFS_DIR_LOG_INDEX_KEY keys from the log tree before we started to log the directory. However it's possible that during the logging we attempt to insert the same BTRFS_DIR_LOG_INDEX_KEY key twice, but for this to happen we need to race with insertions of items from other inodes in the subvolume's tree while we are logging a directory. Here's how this can happen: 1) We are logging a directory with inode number 1000 that has its items spread across 3 leaves in the subvolume's tree: leaf A - has index keys from the range 2 to 20 for example. The last item in the leaf corresponds to a dir item for index number 20. All these dir items were created in a past transaction. leaf B - has index keys from the range 22 to 100 for example. It has no keys from other inodes, all its keys are dir index keys for our directory inode number 1000. Its first key is for the dir item with a sequence number of 22. All these dir items were also created in a past transaction. leaf C - has index keys for our directory for the range 101 to 120 for example. This leaf also has items from other inodes, and its first item corresponds to the dir item for index number 101 for our directory with inode number 1000; 2) When we finish processing the items from leaf A at log_dir_items(), we log a BTRFS_DIR_LOG_INDEX_KEY key with an offset of 21 and a last offset of 21, meaning the log is authoritative for the index range from 21 to 21 (a single sequence number). At this point leaf B was not yet modified in the current transaction; 3) When we return from log_dir_items() we have released our read lock on leaf B, and have set *last_offset_ret to 21 (index number of the first item on leaf B minus 1); 4) Some other task inserts an item for other inode (inode number 1001 for example) into leaf C. That resulted in pushing some items from leaf C into leaf B, in order to make room for the new item, so now leaf B has dir index keys for the sequence number range from 22 to 102 and leaf C has the dir items for the sequence number range 103 to 120; 5) At log_directory_changes() we call log_dir_items() again, passing it a 'min_offset' / 'min_key' value of 22 (*last_offset_ret from step 3 plus 1, so 21 + 1). Then btrfs_search_forward() leaves us at slot 0 of leaf B, since leaf B was modified in the current transaction. We have also initialized 'last_old_dentry_offset' to 20 after calling btrfs_previous_item() at log_dir_items(), as it left us at the last item of leaf A, which refers to the dir item with sequence number 20; 6) We then call process_dir_items_leaf() to process the dir items of leaf B, and when we process the first item, corresponding to slot 0, sequence number 22, we notice the dir item was created in a past transaction and its sequence number is greater than the value of *last_old_dentry_offset + 1 (20 + 1), so we decide to log again a BTRFS_DIR_LOG_INDEX_KEY key with an offset of 21 and an end range of 21 (key.offset - 1 == 22 - 1 == 21), which results in an -EEXIST error from insert_dir_log_key(), as we have already inserted that key at step 2, triggering the assertion at process_dir_items_leaf(). The trace produced in dmesg is like the following: assertion failed: ret != -EEXIST, in fs/btrfs/tree-log.c:3857 [198255.980839][ T7460] ------------[ cut here ]------------ [198255.981666][ T7460] kernel BUG at fs/btrfs/ctree.h:3617! [198255.983141][ T7460] invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI [198255.984080][ T7460] CPU: 0 PID: 7460 Comm: repro-ghost-dir Not tainted 5.18.0-5314c78ac373-misc-next+ [198255.986027][ T7460] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 [198255.988600][ T7460] RIP: 0010:assertfail.constprop.0+0x1c/0x1e [198255.989465][ T7460] Code: 8b 4c 89 (...) [198255.992599][ T7460] RSP: 0018:ffffc90007387188 EFLAGS: 00010282 [198255.993414][ T7460] RAX: 000000000000003d RBX: 0000000000000065 RCX: 0000000000000000 [198255.996056][ T7460] RDX: 0000000000000001 RSI: ffffffff8b62b180 RDI: fffff52000e70e24 [198255.997668][ T7460] RBP: ffffc90007387188 R08: 000000000000003d R09: ffff8881f0e16507 [198255.999199][ T7460] R10: ffffed103e1c2ca0 R11: 0000000000000001 R12: 00000000ffffffef [198256.000683][ T7460] R13: ffff88813befc630 R14: ffff888116c16e70 R15: ffffc90007387358 [198256.007082][ T7460] FS: 00007fc7f7c24640(0000) GS:ffff8881f0c00000(0000) knlGS:0000000000000000 [198256.009939][ T7460] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [198256.014133][ T7460] CR2: 0000560bb16d0b78 CR3: 0000000140b34005 CR4: 0000000000170ef0 [198256.015239][ T7460] Call Trace: [198256.015674][ T7460] <TASK> [198256.016313][ T7460] log_dir_items.cold+0x16/0x2c [198256.018858][ T7460] ? replay_one_extent+0xbf0/0xbf0 [198256.025932][ T7460] ? release_extent_buffer+0x1d2/0x270 [198256.029658][ T7460] ? rcu_read_lock_sched_held+0x16/0x80 [198256.031114][ T7460] ? lock_acquired+0xbe/0x660 [198256.032633][ T7460] ? rcu_read_lock_sched_held+0x16/0x80 [198256.034386][ T7460] ? lock_release+0xcf/0x8a0 [198256.036152][ T7460] log_directory_changes+0xf9/0x170 [198256.036993][ T7460] ? log_dir_items+0xba0/0xba0 [198256.037661][ T7460] ? do_raw_write_unlock+0x7d/0xe0 [198256.038680][ T7460] btrfs_log_inode+0x233b/0x26d0 [198256.041294][ T7460] ? log_directory_changes+0x170/0x170 [198256.042864][ T7460] ? btrfs_attach_transaction_barrier+0x60/0x60 [198256.045130][ T7460] ? rcu_read_lock_sched_held+0x16/0x80 [198256.046568][ T7460] ? lock_release+0xcf/0x8a0 [198256.047504][ T7460] ? lock_downgrade+0x420/0x420 [198256.048712][ T7460] ? ilookup5_nowait+0x81/0xa0 [198256.049747][ T7460] ? lock_downgrade+0x420/0x420 [198256.050652][ T7460] ? do_raw_spin_unlock+0xa9/0x100 [198256.051618][ T7460] ? __might_resched+0x128/0x1c0 [198256.052511][ T7460] ? __might_sleep+0x66/0xc0 [198256.053442][ T7460] ? __kasan_check_read+0x11/0x20 [198256.054251][ T7460] ? iget5_locked+0xbd/0x150 [198256.054986][ T7460] ? run_delayed_iput_locked+0x110/0x110 [198256.055929][ T7460] ? btrfs_iget+0xc7/0x150 [198256.056630][ T7460] ? btrfs_orphan_cleanup+0x4a0/0x4a0 [198256.057502][ T7460] ? free_extent_buffer+0x13/0x20 [198256.058322][ T7460] btrfs_log_inode+0x2654/0x26d0 [198256.059137][ T7460] ? log_directory_changes+0x170/0x170 [198256.060020][ T7460] ? rcu_read_lock_sched_held+0x16/0x80 [198256.060930][ T7460] ? rcu_read_lock_sched_held+0x16/0x80 [198256.061905][ T7460] ? lock_contended+0x770/0x770 [198256.062682][ T7460] ? btrfs_log_inode_parent+0xd04/0x1750 [198256.063582][ T7460] ? lock_downgrade+0x420/0x420 [198256.064432][ T7460] ? preempt_count_sub+0x18/0xc0 [198256.065550][ T7460] ? __mutex_lock+0x580/0xdc0 [198256.066654][ T7460] ? stack_trace_save+0x94/0xc0 [198256.068008][ T7460] ? __kasan_check_write+0x14/0x20 [198256.072149][ T7460] ? __mutex_unlock_slowpath+0x12a/0x430 [198256.073145][ T7460] ? mutex_lock_io_nested+0xcd0/0xcd0 [198256.074341][ T7460] ? wait_for_completion_io_timeout+0x20/0x20 [198256.075345][ T7460] ? lock_downgrade+0x420/0x420 [198256.076142][ T7460] ? lock_contended+0x770/0x770 [198256.076939][ T7460] ? do_raw_spin_lock+0x1c0/0x1c0 [198256.078401][ T7460] ? btrfs_sync_file+0x5e6/0xa40 [198256.080598][ T7460] btrfs_log_inode_parent+0x523/0x1750 [198256.081991][ T7460] ? wait_current_trans+0xc8/0x240 [198256.083320][ T7460] ? lock_downgrade+0x420/0x420 [198256.085450][ T7460] ? btrfs_end_log_trans+0x70/0x70 [198256.086362][ T7460] ? rcu_read_lock_sched_held+0x16/0x80 [198256.087544][ T7460] ? lock_release+0xcf/0x8a0 [198256.088305][ T7460] ? lock_downgrade+0x420/0x420 [198256.090375][ T7460] ? dget_parent+0x8e/0x300 [198256.093538][ T7460] ? do_raw_spin_lock+0x1c0/0x1c0 [198256.094918][ T7460] ? lock_downgrade+0x420/0x420 [198256.097815][ T7460] ? do_raw_spin_unlock+0xa9/0x100 [198256.101822][ T7460] ? dget_parent+0xb7/0x300 [198256.103345][ T7460] btrfs_log_dentry_safe+0x48/0x60 [198256.105052][ T7460] btrfs_sync_file+0x629/0xa40 [198256.106829][ T7460] ? start_ordered_ops.constprop.0+0x120/0x120 [198256.109655][ T7460] ? __fget_files+0x161/0x230 [198256.110760][ T7460] vfs_fsync_range+0x6d/0x110 [198256.111923][ T7460] ? start_ordered_ops.constprop.0+0x120/0x120 [198256.113556][ T7460] __x64_sys_fsync+0x45/0x70 [198256.114323][ T7460] do_syscall_64+0x5c/0xc0 [198256.115084][ T7460] ? syscall_exit_to_user_mode+0x3b/0x50 [198256.116030][ T7460] ? do_syscall_64+0x69/0xc0 [198256.116768][ T7460] ? do_syscall_64+0x69/0xc0 [198256.117555][ T7460] ? do_syscall_64+0x69/0xc0 [198256.118324][ T7460] ? sysvec_call_function_single+0x57/0xc0 [198256.119308][ T7460] ? asm_sysvec_call_function_single+0xa/0x20 [198256.120363][ T7460] entry_SYSCALL_64_after_hwframe+0x44/0xae [198256.121334][ T7460] RIP: 0033:0x7fc7fe97b6ab [198256.122067][ T7460] Code: 0f 05 48 (...) [198256.125198][ T7460] RSP: 002b:00007fc7f7c23950 EFLAGS: 00000293 ORIG_RAX: 000000000000004a [198256.126568][ T7460] RAX: ffffffffffffffda RBX: 00007fc7f7c239f0 RCX: 00007fc7fe97b6ab [198256.127942][ T7460] RDX: 0000000000000002 RSI: 000056167536bcf0 RDI: 0000000000000004 [198256.129302][ T7460] RBP: 0000000000000004 R08: 0000000000000000 R09: 000000007ffffeb8 [198256.130670][ T7460] R10: 00000000000001ff R11: 0000000000000293 R12: 0000000000000001 [198256.132046][ T7460] R13: 0000561674ca8140 R14: 00007fc7f7c239d0 R15: 000056167536dab8 [198256.133403][ T7460] </TASK> Fix this by treating -EEXIST as expected at insert_dir_log_key() and have it update the item with an end offset corresponding to the maximum between the previously logged end offset and the new requested end offset. The end offsets may be different due to dir index key deletions that happened as part of unlink operations while we are logging a directory (triggered when fsyncing some other inode parented by the directory) or during renames which always attempt to log a single dir index deletion. Reported-by: Zygo Blaxell <[email protected]> Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Fixes: 732d591a5d6c12 ("btrfs: stop copying old dir items when logging a directory") Signed-off-by: Filipe Manana <[email protected]> Signed-off-by: David Sterba <[email protected]>
2022-05-05btrfs: zoned: activate block group properly on unlimited active zone deviceNaohiro Aota1-14/+8
btrfs_zone_activate() checks if it activated all the underlying zones in the loop. However, that check never hit on an unlimited activate zone device (max_active_zones == 0). Fortunately, it still works without ENOSPC because btrfs_zone_activate() returns true in the end, even if block_group->zone_is_active == 0. But, it is confusing to have non zone_is_active block group still usable for allocation. Also, we are wasting CPU time to iterate the loop every time btrfs_zone_activate() is called for the blog groups. Since error case in the loop is handled by out_unlock, we can just set zone_is_active and do the list stuff after the loop. Fixes: f9a912a3c45f ("btrfs: zoned: make zone activation multi stripe capable") Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Naohiro Aota <[email protected]> Signed-off-by: David Sterba <[email protected]>
2022-05-05btrfs: zoned: move non-changing condition check out of the loopNaohiro Aota1-6/+6
btrfs_zone_activate() checks if block_group->alloc_offset == block_group->zone_capacity every time it iterates the loop. But, it is not depending on the index. Move out the check and do it only once. Fixes: f9a912a3c45f ("btrfs: zoned: make zone activation multi stripe capable") Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: Naohiro Aota <[email protected]> Signed-off-by: David Sterba <[email protected]>
2022-05-05btrfs: force v2 space cache usage for subpage mountQu Wenruo1-0/+11
[BUG] For a 4K sector sized btrfs with v1 cache enabled and only mounted on systems with 4K page size, if it's mounted on subpage (64K page size) systems, it can cause the following warning on v1 space cache: BTRFS error (device dm-1): csum mismatch on free space cache BTRFS warning (device dm-1): failed to load free space cache for block group 84082688, rebuilding it now Although not a big deal, as kernel can rebuild it without problem, such warning will bother end users, especially if they want to switch the same btrfs seamlessly between different page sized systems. [CAUSE] V1 free space cache is still using fixed PAGE_SIZE for various bitmap, like BITS_PER_BITMAP. Such hard-coded PAGE_SIZE usage will cause various mismatch, from v1 cache size to checksum. Thus kernel will always reject v1 cache with a different PAGE_SIZE with csum mismatch. [FIX] Although we should fix v1 cache, it's already going to be marked deprecated soon. And we have v2 cache based on metadata (which is already fully subpage compatible), and it has almost everything superior than v1 cache. So just force subpage mount to use v2 cache on mount. Reported-by: Matt Corallo <[email protected]> CC: [email protected] # 5.15+ Link: https://lore.kernel.org/linux-btrfs/[email protected]/ Reviewed-by: Josef Bacik <[email protected]> Signed-off-by: Qu Wenruo <[email protected]> Signed-off-by: David Sterba <[email protected]>
2022-05-05cgroup/cpuset: Remove cpus_allowed/mems_allowed setup in cpuset_init_smp()Waiman Long1-2/+5
There are 3 places where the cpu and node masks of the top cpuset can be initialized in the order they are executed: 1) start_kernel -> cpuset_init() 2) start_kernel -> cgroup_init() -> cpuset_bind() 3) kernel_init_freeable() -> do_basic_setup() -> cpuset_init_smp() The first cpuset_init() call just sets all the bits in the masks. The second cpuset_bind() call sets cpus_allowed and mems_allowed to the default v2 values. The third cpuset_init_smp() call sets them back to v1 values. For systems with cgroup v2 setup, cpuset_bind() is called once. As a result, cpu and memory node hot add may fail to update the cpu and node masks of the top cpuset to include the newly added cpu or node in a cgroup v2 environment. For systems with cgroup v1 setup, cpuset_bind() is called again by rebind_subsystem() when the v1 cpuset filesystem is mounted as shown in the dmesg log below with an instrumented kernel. [ 2.609781] cpuset_bind() called - v2 = 1 [ 3.079473] cpuset_init_smp() called [ 7.103710] cpuset_bind() called - v2 = 0 smp_init() is called after the first two init functions. So we don't have a complete list of active cpus and memory nodes until later in cpuset_init_smp() which is the right time to set up effective_cpus and effective_mems. To fix this cgroup v2 mask setup problem, the potentially incorrect cpus_allowed & mems_allowed setting in cpuset_init_smp() are removed. For cgroup v2 systems, the initial cpuset_bind() call will set the masks correctly. For cgroup v1 systems, the second call to cpuset_bind() will do the right setup. cc: [email protected] Signed-off-by: Waiman Long <[email protected]> Tested-by: Feng Tang <[email protected]> Reviewed-by: Michal Koutný <[email protected]> Signed-off-by: Tejun Heo <[email protected]>
2022-05-05Merge tag 's390-5.18-4' of ↵Linus Torvalds3-1/+27
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Heiko Carstens: - Disable -Warray-bounds warning for gcc12, since the only known way to workaround false positive warnings on lowcore accesses would result in worse code on fast paths. - Avoid lockdep_assert_held() warning in kvm vm memop code. - Reduce overhead within gmap_rmap code to get rid of long latencies when e.g. shutting down 2nd level guests. * tag 's390-5.18-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: KVM: s390: vsie/gmap: reduce gmap_rmap overhead KVM: s390: Fix lockdep issue in vm memop s390: disable -Warray-bounds
2022-05-05Merge tag 'mips-fixes_5.18_1' of ↵Linus Torvalds2-12/+7
git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux Pull MIPS fix from Thomas Bogendoerfer: "Extend R4000/R4400 CPU erratum workaround to all revisions" * tag 'mips-fixes_5.18_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: MIPS: Fix CP0 counter erratum detection for R4k CPUs
2022-05-05Merge tag 'net-5.18-rc6' of ↵Linus Torvalds76-386/+724
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from can, rxrpc and wireguard. Previous releases - regressions: - igmp: respect RCU rules in ip_mc_source() and ip_mc_msfilter() - mld: respect RCU rules in ip6_mc_source() and ip6_mc_msfilter() - rds: acquire netns refcount on TCP sockets - rxrpc: enable IPv6 checksums on transport socket - nic: hinic: fix bug of wq out of bound access - nic: thunder: don't use pci_irq_vector() in atomic context - nic: bnxt_en: fix possible bnxt_open() failure caused by wrong RFS flag - nic: mlx5e: - lag, fix use-after-free in fib event handler - fix deadlock in sync reset flow Previous releases - always broken: - tcp: fix insufficient TCP source port randomness - can: grcan: grcan_close(): fix deadlock - nfc: reorder destructive operations in to avoid bugs Misc: - wireguard: improve selftests reliability" * tag 'net-5.18-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (63 commits) NFC: netlink: fix sleep in atomic bug when firmware download timeout selftests: ocelot: tc_flower_chains: specify conform-exceed action for policer tcp: drop the hash_32() part from the index calculation tcp: increase source port perturb table to 2^16 tcp: dynamically allocate the perturb table used by source ports tcp: add small random increments to the source port tcp: resalt the secret every 10 seconds tcp: use different parts of the port_offset for index and offset secure_seq: use the 64 bits of the siphash for port offset calculation wireguard: selftests: set panic_on_warn=1 from cmdline wireguard: selftests: bump package deps wireguard: selftests: restore support for ccache wireguard: selftests: use newer toolchains to fill out architectures wireguard: selftests: limit parallelism to $(nproc) tests at once wireguard: selftests: make routing loop test non-fatal net/mlx5: Fix matching on inner TTC net/mlx5: Avoid double clear or set of sync reset requested net/mlx5: Fix deadlock in sync reset flow net/mlx5e: Fix trust state reset in reload net/mlx5e: Avoid checking offload capability in post_parse action ...
2022-05-05USB: serial: qcserial: add support for Sierra Wireless EM7590Ethan Yang1-0/+2
Add support for Sierra Wireless EM7590 0xc080/0xc081 compositions. Signed-off-by: Ethan Yang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] Signed-off-by: Johan Hovold <[email protected]>
2022-05-05gpio: visconti: Fix fwnode of GPIO IRQNobuhiro Iwamatsu1-5/+2
The fwnode of GPIO IRQ must be set to its own fwnode, not the fwnode of the parent IRQ. Therefore, this sets own fwnode instead of the parent IRQ fwnode to GPIO IRQ's. Fixes: 2ad74f40dacc ("gpio: visconti: Add Toshiba Visconti GPIO support") Signed-off-by: Nobuhiro Iwamatsu <[email protected]> Reviewed-by: Linus Walleij <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-05-05USB: serial: option: add Fibocom MA510 modemSven Schwermer1-0/+2
The MA510 modem has 3 USB configurations that are configurable via the AT command AT+GTUSBMODE={30,31,32} which make the modem enumerate with the following interfaces, respectively: 30: Diag + QDSS + Modem + RMNET 31: Diag + Modem + AT + ECM 32: Modem + AT + ECM The first configuration (30) reuses u-blox R410M's VID/PID with identical interface configuration. A detailed description of the USB configuration for each mode follows: +GTUSBMODE: 30 -------------- T: Bus=03 Lev=01 Prnt=01 Port=06 Cnt=04 Dev#= 19 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=05c6 ProdID=90b2 Rev= 0.00 S: Manufacturer=Fibocom MA510 Modem S: Product=Fibocom MA510 Modem S: SerialNumber=55e2695b C:* #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=(none) E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=2ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=2ms E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms +GTUSBMODE: 31 -------------- T: Bus=03 Lev=01 Prnt=01 Port=06 Cnt=04 Dev#= 99 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=2cb7 ProdID=0106 Rev= 0.00 S: Manufacturer=Fibocom MA510 Modem S: Product=Fibocom MA510 Modem S: SerialNumber=55e2695b C:* #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA A: FirstIf#= 3 IfCount= 2 Cls=02(comm.) Sub=00 Prot=00 I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=82(I) Atr=03(Int.) MxPS= 64 Ivl=2ms E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option E: Ad=84(I) Atr=03(Int.) MxPS= 64 Ivl=2ms E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver=cdc_ether E: Ad=86(I) Atr=03(Int.) MxPS= 64 Ivl=2ms I: If#= 4 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether I:* If#= 4 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms +GTUSBMODE: 32 -------------- T: Bus=03 Lev=01 Prnt=01 Port=06 Cnt=04 Dev#=100 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=2cb7 ProdID=010a Rev= 0.00 S: Manufacturer=Fibocom MA510 Modem S: Product=Fibocom MA510 Modem S: SerialNumber=55e2695b C:* #Ifs= 4 Cfg#= 1 Atr=e0 MxPwr=500mA A: FirstIf#= 2 IfCount= 2 Cls=02(comm.) Sub=00 Prot=00 I:* If#= 0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option E: Ad=81(I) Atr=03(Int.) MxPS= 64 Ivl=2ms E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=fe Prot=ff Driver=option E: Ad=83(I) Atr=03(Int.) MxPS= 64 Ivl=2ms E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver=cdc_ether E: Ad=85(I) Atr=03(Int.) MxPS= 64 Ivl=2ms I: If#= 3 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether I:* If#= 3 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Sven Schwermer <[email protected]> Cc: [email protected] Signed-off-by: Johan Hovold <[email protected]>
2022-05-05USB: serial: option: add Fibocom L610 modemSven Schwermer1-0/+2
The L610 modem has 3 USB configurations that are configurable via the AT command AT+GTUSBMODE={31,32,33} which make the modem enumerate with the following interfaces, respectively: 31: Modem + NV + MOS + Diag + LOG + AT + AT 32: ECM + Modem + NV + MOS + Diag + LOG + AT + AT 33: RNDIS + Modem + NV + MOS + Diag + LOG + AT + AT A detailed description of the USB configuration for each mode follows: +GTUSBMODE: 31 -------------- T: Bus=03 Lev=01 Prnt=01 Port=06 Cnt=04 Dev#=124 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1782 ProdID=4d10 Rev= 0.00 S: Manufacturer=FIBOCOM S: Product=L610 C:* #Ifs= 7 Cfg#= 1 Atr=e0 MxPwr=400mA I:* If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms +GTUSBMODE: 32 -------------- T: Bus=03 Lev=01 Prnt=01 Port=06 Cnt=04 Dev#=122 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1782 ProdID=4d11 Rev= 0.00 S: Manufacturer=FIBOCOM S: Product=L610 C:* #Ifs= 9 Cfg#= 1 Atr=e0 MxPwr=400mA A: FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=06 Prot=00 I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver=cdc_ether E: Ad=81(I) Atr=03(Int.) MxPS= 16 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether I:* If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether E: Ad=82(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 7 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 8 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=08(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms +GTUSBMODE: 33 -------------- T: Bus=03 Lev=01 Prnt=01 Port=06 Cnt=04 Dev#=126 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1782 ProdID=4d11 Rev= 0.00 S: Manufacturer=FIBOCOM S: Product=L610 C:* #Ifs= 9 Cfg#= 1 Atr=e0 MxPwr=400mA A: FirstIf#= 0 IfCount= 2 Cls=e0(wlcon) Sub=01 Prot=03 I:* If#= 0 Alt= 0 #EPs= 1 Cls=e0(wlcon) Sub=01 Prot=03 Driver=rndis_host E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=4096ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 7 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=88(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=07(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 8 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=89(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=08(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms Signed-off-by: Sven Schwermer <[email protected]> Cc: [email protected] Signed-off-by: Johan Hovold <[email protected]>
2022-05-05USB: serial: pl2303: add device id for HP LM930 DisplayScott Chen2-0/+2
Add the device id for the HPLM930Display which is a PL2303GC based device. Signed-off-by: Scott Chen <[email protected]> Cc: [email protected] Signed-off-by: Johan Hovold <[email protected]>
2022-05-05genirq: Synchronize interrupt thread startupThomas Pfaff3-10/+33
A kernel hang can be observed when running setserial in a loop on a kernel with force threaded interrupts. The sequence of events is: setserial open("/dev/ttyXXX") request_irq() do_stuff() -> serial interrupt -> wake(irq_thread) desc->threads_active++; close() free_irq() kthread_stop(irq_thread) synchronize_irq() <- hangs because desc->threads_active != 0 The thread is created in request_irq() and woken up, but does not get on a CPU to reach the actual thread function, which would handle the pending wake-up. kthread_stop() sets the should stop condition which makes the thread immediately exit, which in turn leaves the stale threads_active count around. This problem was introduced with commit 519cc8652b3a, which addressed a interrupt sharing issue in the PCIe code. Before that commit free_irq() invoked synchronize_irq(), which waits for the hard interrupt handler and also for associated threads to complete. To address the PCIe issue synchronize_irq() was replaced with __synchronize_hardirq(), which only waits for the hard interrupt handler to complete, but not for threaded handlers. This was done under the assumption, that the interrupt thread already reached the thread function and waits for a wake-up, which is guaranteed to be handled before acting on the stop condition. The problematic case, that the thread would not reach the thread function, was obviously overlooked. Make sure that the interrupt thread is really started and reaches thread_fn() before returning from __setup_irq(). This utilizes the existing wait queue in the interrupt descriptor. The wait queue is unused for non-shared interrupts. For shared interrupts the usage might cause a spurious wake-up of a waiter in synchronize_irq() or the completion of a threaded handler might cause a spurious wake-up of the waiter for the ready flag. Both are harmless and have no functional impact. [ tglx: Amended changelog ] Fixes: 519cc8652b3a ("genirq: Synchronize only with single thread on free_irq()") Signed-off-by: Thomas Pfaff <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Marc Zyngier <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2022-05-05MAINTAINERS: update the GPIO git tree entryBartosz Golaszewski1-1/+1
My git tree has become the de facto main GPIO tree. Update the MAINTAINERS file to reflect that. Signed-off-by: Bartosz Golaszewski <[email protected]> Reported-by: Baruch Siach <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Linus Walleij <[email protected]>
2022-05-05NFC: netlink: fix sleep in atomic bug when firmware download timeoutDuoming Zhou1-2/+2
There are sleep in atomic bug that could cause kernel panic during firmware download process. The root cause is that nlmsg_new with GFP_KERNEL parameter is called in fw_dnld_timeout which is a timer handler. The call trace is shown below: BUG: sleeping function called from invalid context at include/linux/sched/mm.h:265 Call Trace: kmem_cache_alloc_node __alloc_skb nfc_genl_fw_download_done call_timer_fn __run_timers.part.0 run_timer_softirq __do_softirq ... The nlmsg_new with GFP_KERNEL parameter may sleep during memory allocation process, and the timer handler is run as the result of a "software interrupt" that should not call any other function that could sleep. This patch changes allocation mode of netlink message from GFP_KERNEL to GFP_ATOMIC in order to prevent sleep in atomic bug. The GFP_ATOMIC flag makes memory allocation operation could be used in atomic context. Fixes: 9674da8759df ("NFC: Add firmware upload netlink command") Fixes: 9ea7187c53f6 ("NFC: netlink: Rename CMD_FW_UPLOAD to CMD_FW_DOWNLOAD") Signed-off-by: Duoming Zhou <[email protected]> Reviewed-by: Krzysztof Kozlowski <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2022-05-05mm/readahead: Fix readahead with large foliosMatthew Wilcox (Oracle)1-6/+9
Reading 100KB chunks from a big file (eg dd bs=100K) leads to poor readahead behaviour. Studying the traces in detail, I noticed two problems. The first is that we were setting the readahead flag on the folio which contains the last byte read from the block. This is wrong because we will trigger readahead at the end of the read without waiting to see if a subsequent read is going to use the pages we just read. Instead, we need to set the readahead flag on the first folio _after_ the one which contains the last byte that we're reading. The second is that we were looking for the index of the folio with the readahead flag set to exactly match the start + size - async_size. If we've rounded this, either down (as previously) or up (as now), we'll think we hit a folio marked as readahead by a different read, and try to read the wrong pages. So round the expected index to the order of the folio we hit. Reported-by: Guo Xuenan <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2022-05-05block: Do not call folio_next() on an unreferenced folioMatthew Wilcox (Oracle)1-1/+4
It is unsafe to call folio_next() on a folio unless you hold a reference on it that prevents it from being split or freed. After returning from the iterator, iomap calls folio_end_writeback() which may drop the last reference to the page, or allow the page to be split. If that happens, the iterator will not advance far enough through the bio_vec, leading to assertion failures like the BUG() in folio_end_writeback() that checks we're not trying to end writeback on a page not currently under writeback. Other assertion failures were also seen, but they're all explained by this one bug. Fix the bug by remembering where the next folio starts before returning from the iterator. There are other ways of fixing this bug, but this seems the simplest. Reported-by: Darrick J. Wong <[email protected]> Tested-by: Darrick J. Wong <[email protected]> Reported-by: Brian Foster <[email protected]> Tested-by: Brian Foster <[email protected]> Signed-off-by: Matthew Wilcox (Oracle) <[email protected]>
2022-05-04selftests: ocelot: tc_flower_chains: specify conform-exceed action for policerVladimir Oltean1-1/+1
As discussed here with Ido Schimmel: https://patchwork.kernel.org/project/netdevbpf/patch/[email protected]/ the default conform-exceed action is "reclassify", for a reason we don't really understand. The point is that hardware can't offload that police action, so not specifying "conform-exceed" was always wrong, even though the command used to work in hardware (but not in software) until the kernel started adding validation for it. Fix the command used by the selftest by making the policer drop on exceed, and pass the packet to the next action (goto) on conform. Fixes: 8cd6b020b644 ("selftests: ocelot: add some example VCAP IS1, IS2 and ES0 tc offloads") Signed-off-by: Vladimir Oltean <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04Merge branch 'insufficient-tcp-source-port-randomness'Jakub Kicinski5-25/+43
Willy Tarreau says: ==================== insufficient TCP source port randomness In a not-yet published paper, Moshe Kol, Amit Klein, and Yossi Gilad report being able to accurately identify a client by forcing it to emit only 40 times more connections than the number of entries in the table_perturb[] table, which is indexed by hashing the connection tuple. The current 2^8 setting allows them to perform that attack with only 10k connections, which is not hard to achieve in a few seconds. Eric, Amit and I have been working on this for a few weeks now imagining, testing and eliminating a number of approaches that Amit and his team were still able to break or that were found to be too risky or too expensive, and ended up with the simple improvements in this series that resists to the attack, doesn't degrade the performance, and preserves a reliable port selection algorithm to avoid connection failures, including the odd/even port selection preference that allows bind() to always find a port quickly even under strong connect() stress. The approach relies on several factors: - resalting the hash secret that's used to choose the table_perturb[] entry every 10 seconds to eliminate slow attacks and force the attacker to forget everything that was learned after this delay. This already eliminates most of the problem because if a client stays silent for more than 10 seconds there's no link between the previous and the next patterns, and 10s isn't yet frequent enough to cause too frequent repetition of a same port that may induce a connection failure ; - adding small random increments to the source port. Previously, a random 0 or 1 was added every 16 ports. Now a random 0 to 7 is added after each port. This means that with the default 32768-60999 range, a worst case rollover happens after 1764 connections, and an average of 3137. This doesn't stop statistical attacks but requires significantly more iterations of the same attack to confirm a guess. - increasing the table_perturb[] size from 2^8 to 2^16, which Amit says will require 2.6 million connections to be attacked with the changes above, making it pointless to get a fingerprint that will only last 10 seconds. Due to the size, the table was made dynamic. - a few minor improvements on the bits used from the hash, to eliminate some unfortunate correlations that may possibly have been exploited to design future attack models. These changes were tested under the most extreme conditions, up to 1.1 million connections per second to one and a few targets, showing no performance regression, and only 2 connection failures within 13 billion, which is less than 2^-32 and perfectly within usual values. The series is split into small reviewable changes and was already reviewed by Amit and Eric. ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04tcp: drop the hash_32() part from the index calculationWilly Tarreau1-1/+1
In commit 190cc82489f4 ("tcp: change source port randomizarion at connect() time"), the table_perturb[] array was introduced and an index was taken from the port_offset via hash_32(). But it turns out that hash_32() performs a multiplication while the input here comes from the output of SipHash in secure_seq, that is well distributed enough to avoid the need for yet another hash. Suggested-by: Amit Klein <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04tcp: increase source port perturb table to 2^16Willy Tarreau1-4/+5
Moshe Kol, Amit Klein, and Yossi Gilad reported being able to accurately identify a client by forcing it to emit only 40 times more connections than there are entries in the table_perturb[] table. The previous two improvements consisting in resalting the secret every 10s and adding randomness to each port selection only slightly improved the situation, and the current value of 2^8 was too small as it's not very difficult to make a client emit 10k connections in less than 10 seconds. Thus we're increasing the perturb table from 2^8 to 2^16 so that the same precision now requires 2.6M connections, which is more difficult in this time frame and harder to hide as a background activity. The impact is that the table now uses 256 kB instead of 1 kB, which could mostly affect devices making frequent outgoing connections. However such components usually target a small set of destinations (load balancers, database clients, perf assessment tools), and in practice only a few entries will be visited, like before. A live test at 1 million connections per second showed no performance difference from the previous value. Reported-by: Moshe Kol <[email protected]> Reported-by: Yossi Gilad <[email protected]> Reported-by: Amit Klein <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04tcp: dynamically allocate the perturb table used by source portsWilly Tarreau1-2/+10
We'll need to further increase the size of this table and it's likely that at some point its size will not be suitable anymore for a static table. Let's allocate it on boot from inet_hashinfo2_init(), which is called from tcp_init(). Cc: Moshe Kol <[email protected]> Cc: Yossi Gilad <[email protected]> Cc: Amit Klein <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04tcp: add small random increments to the source portWilly Tarreau1-4/+5
Here we're randomly adding between 0 and 7 random increments to the selected source port in order to add some noise in the source port selection that will make the next port less predictable. With the default port range of 32768-60999 this means a worst case reuse scenario of 14116/8=1764 connections between two consecutive uses of the same port, with an average of 14116/4.5=3137. This code was stressed at more than 800000 connections per second to a fixed target with all connections closed by the client using RSTs (worst condition) and only 2 connections failed among 13 billion, despite the hash being reseeded every 10 seconds, indicating a perfectly safe situation. Cc: Moshe Kol <[email protected]> Cc: Yossi Gilad <[email protected]> Cc: Amit Klein <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04tcp: resalt the secret every 10 secondsEric Dumazet1-3/+9
In order to limit the ability for an observer to recognize the source ports sequence used to contact a set of destinations, we should periodically shuffle the secret. 10 seconds looks effective enough without causing particular issues. Cc: Moshe Kol <[email protected]> Cc: Yossi Gilad <[email protected]> Cc: Amit Klein <[email protected]> Cc: Jason A. Donenfeld <[email protected]> Tested-by: Willy Tarreau <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04tcp: use different parts of the port_offset for index and offsetWilly Tarreau1-1/+1
Amit Klein suggests that we use different parts of port_offset for the table's index and the port offset so that there is no direct relation between them. Cc: Jason A. Donenfeld <[email protected]> Cc: Moshe Kol <[email protected]> Cc: Yossi Gilad <[email protected]> Cc: Amit Klein <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04secure_seq: use the 64 bits of the siphash for port offset calculationWilly Tarreau5-11/+13
SipHash replaced MD5 in secure_ipv{4,6}_port_ephemeral() via commit 7cd23e5300c1 ("secure_seq: use SipHash in place of MD5"), but the output remained truncated to 32-bit only. In order to exploit more bits from the hash, let's make the functions return the full 64-bit of siphash_3u32(). We also make sure the port offset calculation in __inet_hash_connect() remains done on 32-bit to avoid the need for div_u64_rem() and an extra cost on 32-bit systems. Cc: Jason A. Donenfeld <[email protected]> Cc: Moshe Kol <[email protected]> Cc: Yossi Gilad <[email protected]> Cc: Amit Klein <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Signed-off-by: Willy Tarreau <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04Merge branch 'wireguard-patches-for-5-18-rc6'Jakub Kicinski21-99/+228
Jason A. Donenfeld says: ==================== wireguard patches for 5.18-rc6 In working on some other problems, I wound up leaning on the WireGuard CI more than usual and uncovered a few small issues with reliability. These are fairly low key changes, since they don't impact kernel code itself. One change does stick out in particular, though, which is the "make routing loop test non-fatal" commit. I'm not thrilled about doing this, but currently [1] remains unsolved, and I'm still working on a real solution to that (hopefully for 5.19 or 5.20 if I can come up with a good idea...), so for now that test just prints a big red warning instead. [1] https://lore.kernel.org/netdev/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04wireguard: selftests: set panic_on_warn=1 from cmdlineJason A. Donenfeld18-23/+17
Rather than setting this once init is running, set panic_on_warn from the kernel command line, so that it catches splats from WireGuard initialization code and the various crypto selftests. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04wireguard: selftests: bump package depsJason A. Donenfeld1-9/+9
Use newer, more reliable package dependencies. These should hopefully reduce flakes. However, we keep the old iputils package, as it accumulated bugs after resulting in flakes on slow machines. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04wireguard: selftests: restore support for ccacheJason A. Donenfeld2-1/+18
When moving to non-system toolchains, we inadvertantly killed the ability to use ccache. So instead, build ccache support into the test harness directly. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04wireguard: selftests: use newer toolchains to fill out architecturesJason A. Donenfeld9-63/+169
Rather than relying on the system to have cross toolchains available, simply download musl.cc's ones and use that libc.so, and then we use it to fill in a few missing platforms, such as riscv64, riscv64, powerpc64, and s390x. Since riscv doesn't have a second serial port in its device description, we have to use virtio's vport. This is actually the same situation on ARM, but we were previously hacking QEMU up to work around this, which required a custom QEMU. Instead just do the vport trick on ARM too. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04wireguard: selftests: limit parallelism to $(nproc) tests at onceJason A. Donenfeld1-10/+10
The parallel tests were added to catch queueing issues from multiple cores. But what happens in reality when testing tons of processes is that these separate threads wind up fighting with the scheduler, and we wind up with contention in places we don't care about that decrease the chances of hitting a bug. So just do a test with the number of CPU cores, rather than trying to scale up arbitrarily. Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-04wireguard: selftests: make routing loop test non-fatalJason A. Donenfeld1-1/+13
I hate to do this, but I still do not have a good solution to actually fix this bug across architectures. So just disable it for now, so that the CI can still deliver actionable results. This commit adds a large red warning, so that at least the failure isn't lost forever, and hopefully this can be revisited down the line. Link: https://lore.kernel.org/netdev/CAHmME9pv1x6C4TNdL6648HydD8r+txpV4hTUXOBVkrapBXH4QQ@mail.gmail.com/ Link: https://lore.kernel.org/netdev/[email protected]/ Link: https://lore.kernel.org/wireguard/CAHmME9rNnBiNvBstb7MPwK-7AmAN0sOfnhdR=eeLrowWcKxaaQ@mail.gmail.com/ Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-05-05x86/fpu: Prevent FPU state corruptionThomas Gleixner1-41/+26
The FPU usage related to task FPU management is either protected by disabling interrupts (switch_to, return to user) or via fpregs_lock() which is a wrapper around local_bh_disable(). When kernel code wants to use the FPU then it has to check whether it is possible by calling irq_fpu_usable(). But the condition in irq_fpu_usable() is wrong. It allows FPU to be used when: !in_interrupt() || interrupted_user_mode() || interrupted_kernel_fpu_idle() The latter is checking whether some other context already uses FPU in the kernel, but if that's not the case then it allows FPU to be used unconditionally even if the calling context interrupted a fpregs_lock() critical region. If that happens then the FPU state of the interrupted context becomes corrupted. Allow in kernel FPU usage only when no other context has in kernel FPU usage and either the calling context is not hard interrupt context or the hard interrupt did not interrupt a local bottomhalf disabled region. It's hard to find a proper Fixes tag as the condition was broken in one way or the other for a very long time and the eager/lazy FPU changes caused a lot of churn. Picked something remotely connected from the history. This survived undetected for quite some time as FPU usage in interrupt context is rare, but the recent changes to the random code unearthed it at least on a kernel which had FPU debugging enabled. There is probably a higher rate of silent corruption as not all issues can be detected by the FPU debugging code. This will be addressed in a subsequent change. Fixes: 5d2bd7009f30 ("x86, fpu: decouple non-lazy/eager fpu restore from xsave") Reported-by: Filipe Manana <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Tested-by: Filipe Manana <[email protected]> Reviewed-by: Borislav Petkov <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2022-05-04RDMA/rxe: Change mcg_lock to a _bh lockBob Pearson1-21/+15
rxe_mcast.c currently uses _irqsave spinlocks for rxe->mcg_lock while rxe_recv.c uses _bh spinlocks for the same lock. As there is no case where the mcg_lock can be taken from an IRQ, change these all to bh locks so we don't have confusing mismatched lock types on the same spinlock. Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2022-05-04RDMA/rxe: Do not call dev_mc_add/del() under a spinlockBob Pearson1-28/+23
These routines were not intended to be called under a spinlock and will throw debugging warnings: raw_local_irq_restore() called with IRQs enabled WARNING: CPU: 13 PID: 3107 at kernel/locking/irqflag-debug.c:10 warn_bogus_irq_restore+0x2f/0x50 CPU: 13 PID: 3107 Comm: python3 Tainted: G E 5.18.0-rc1+ #7 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 RIP: 0010:warn_bogus_irq_restore+0x2f/0x50 Call Trace: <TASK> _raw_spin_unlock_irqrestore+0x75/0x80 rxe_attach_mcast+0x304/0x480 [rdma_rxe] ib_attach_mcast+0x88/0xa0 [ib_core] ib_uverbs_attach_mcast+0x186/0x1e0 [ib_uverbs] ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0xcd/0x140 [ib_uverbs] ib_uverbs_cmd_verbs+0xdb0/0xea0 [ib_uverbs] ib_uverbs_ioctl+0xd2/0x160 [ib_uverbs] do_syscall_64+0x5c/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Move them out of the spinlock, it is OK if there is some races setting up the MC reception at the ethernet layer with rbtree lookups. Fixes: 6090a0c4c7c6 ("RDMA/rxe: Cleanup rxe_mcast.c") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Bob Pearson <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2022-05-04RDMA/siw: Fix a condition race issue in MPA request processingCheng Xu1-3/+4
The calling of siw_cm_upcall and detaching new_cep with its listen_cep should be atomistic semantics. Otherwise siw_reject may be called in a temporary state, e,g, siw_cm_upcall is called but the new_cep->listen_cep has not being cleared. This fixes a WARN: WARNING: CPU: 7 PID: 201 at drivers/infiniband/sw/siw/siw_cm.c:255 siw_cep_put+0x125/0x130 [siw] CPU: 2 PID: 201 Comm: kworker/u16:22 Kdump: loaded Tainted: G E 5.17.0-rc7 #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Workqueue: iw_cm_wq cm_work_handler [iw_cm] RIP: 0010:siw_cep_put+0x125/0x130 [siw] Call Trace: <TASK> siw_reject+0xac/0x180 [siw] iw_cm_reject+0x68/0xc0 [iw_cm] cm_work_handler+0x59d/0xe20 [iw_cm] process_one_work+0x1e2/0x3b0 worker_thread+0x50/0x3a0 ? rescuer_thread+0x390/0x390 kthread+0xe5/0x110 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: 6c52fdc244b5 ("rdma/siw: connection management") Link: https://lore.kernel.org/r/d528d83466c44687f3872eadcb8c184528b2e2d4.1650526554.git.chengyou@linux.alibaba.com Reported-by: Luis Chamberlain <[email protected]> Reviewed-by: Bernard Metzler <[email protected]> Signed-off-by: Cheng Xu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2022-05-04dt-bindings: pci: apple,pcie: Drop max-link-speed from exampleHector Martin1-3/+0
We no longer use these since 111659c2a570 (and they never worked anyway); drop them from the example to avoid confusion. Fixes: 111659c2a570 ("arm64: dts: apple: t8103: Remove PCIe max-link-speed properties") Signed-off-by: Hector Martin <[email protected]> Reviewed-by: Alyssa Rosenzweig <[email protected]> Signed-off-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-05-04dt-bindings: Drop redundant 'maxItems/minItems' in if/then schemasRob Herring13-68/+5
Another round of removing redundant minItems/maxItems when 'items' list is specified. This time it is in if/then schemas as the meta-schema was failing to check this case. If a property has an 'items' list, then a 'minItems' or 'maxItems' with the same size as the list is redundant and can be dropped. Note that is DT schema specific behavior and not standard json-schema behavior. The tooling will fixup the final schema adding any unspecified minItems/maxItems. Signed-off-by: Rob Herring <[email protected]> Acked-By: Vinod Koul <[email protected]> Acked-by: Marc Kleine-Budde <[email protected]> Acked-by: Mark Brown <[email protected]> Acked-by: Ulf Hansson <[email protected]> # For MMC Acked-by: Jonathan Cameron <[email protected]> #for IIO Link: https://lore.kernel.org/r/[email protected]
2022-05-04dt-bindings: pinctrl: Allow values for drive-push-pull and drive-open-drainRob Herring1-2/+10
A few platforms, at91 and tegra, use drive-push-pull and drive-open-drain with a 0 or 1 value. There's not really a need for values as '1' should be equivalent to no value (it wasn't treated that way) and drive-push-pull disabled is equivalent to drive-open-drain. So dropping the value can't be done without breaking existing OSs. As we don't want new cases, mark the case with values as deprecated. Cc: Arnd Bergmann <[email protected]> Cc: Thierry Reding <[email protected]> Cc: Jonathan Hunter <[email protected]> Cc: Nicolas Ferre <[email protected]> Cc: Claudiu Beznea <[email protected]> Signed-off-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected]