aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-04netfilter: nf_tables: discard table flag update with pending basechain deletionPablo Neira Ayuso1-4/+5
Hook unregistration is deferred to the commit phase, same occurs with hook updates triggered by the table dormant flag. When both commands are combined, this results in deleting a basechain while leaving its hook still registered in the core. Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-04-04netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()Ziyang Xuan1-2/+7
nft_unregister_flowtable_type() within nf_flow_inet_module_exit() can concurrent with __nft_flowtable_type_get() within nf_tables_newflowtable(). And thhere is not any protection when iterate over nf_tables_flowtables list in __nft_flowtable_type_get(). Therefore, there is pertential data-race of nf_tables_flowtables list entry. Use list_for_each_entry_rcu() to iterate over nf_tables_flowtables list in __nft_flowtable_type_get(), and use rcu_read_lock() in the caller nft_flowtable_type_get() to protect the entire type query process. Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend") Signed-off-by: Ziyang Xuan <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-04-04netfilter: nf_tables: reject new basechain after table flag updatePablo Neira Ayuso1-0/+3
When dormant flag is toggled, hooks are disabled in the commit phase by iterating over current chains in table (existing and new). The following configuration allows for an inconsistent state: add table x add chain x y { type filter hook input priority 0; } add table x { flags dormant; } add chain x w { type filter hook input priority 1; } which triggers the following warning when trying to unregister chain w which is already unregistered. [ 127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:50 1 __nf_unregister_net_hook+0x21a/0x260 [...] [ 127.322519] Call Trace: [ 127.322521] <TASK> [ 127.322524] ? __warn+0x9f/0x1a0 [ 127.322531] ? __nf_unregister_net_hook+0x21a/0x260 [ 127.322537] ? report_bug+0x1b1/0x1e0 [ 127.322545] ? handle_bug+0x3c/0x70 [ 127.322552] ? exc_invalid_op+0x17/0x40 [ 127.322556] ? asm_exc_invalid_op+0x1a/0x20 [ 127.322563] ? kasan_save_free_info+0x3b/0x60 [ 127.322570] ? __nf_unregister_net_hook+0x6a/0x260 [ 127.322577] ? __nf_unregister_net_hook+0x21a/0x260 [ 127.322583] ? __nf_unregister_net_hook+0x6a/0x260 [ 127.322590] ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables] [ 127.322655] nft_table_disable+0x75/0xf0 [nf_tables] [ 127.322717] nf_tables_commit+0x2571/0x2620 [nf_tables] Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-04-04netfilter: nf_tables: flush pending destroy work before exit_net releasePablo Neira Ayuso1-0/+1
Similar to 2c9f0293280e ("netfilter: nf_tables: flush pending destroy work before netlink notifier") to address a race between exit_net and the destroy workqueue. The trace below shows an element to be released via destroy workqueue while exit_net path (triggered via module removal) has already released the set that is used in such transaction. [ 1360.547789] BUG: KASAN: slab-use-after-free in nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.547861] Read of size 8 at addr ffff888140500cc0 by task kworker/4:1/152465 [ 1360.547870] CPU: 4 PID: 152465 Comm: kworker/4:1 Not tainted 6.8.0+ #359 [ 1360.547882] Workqueue: events nf_tables_trans_destroy_work [nf_tables] [ 1360.547984] Call Trace: [ 1360.547991] <TASK> [ 1360.547998] dump_stack_lvl+0x53/0x70 [ 1360.548014] print_report+0xc4/0x610 [ 1360.548026] ? __virt_addr_valid+0xba/0x160 [ 1360.548040] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 1360.548054] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.548176] kasan_report+0xae/0xe0 [ 1360.548189] ? nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.548312] nf_tables_trans_destroy_work+0x3f5/0x590 [nf_tables] [ 1360.548447] ? __pfx_nf_tables_trans_destroy_work+0x10/0x10 [nf_tables] [ 1360.548577] ? _raw_spin_unlock_irq+0x18/0x30 [ 1360.548591] process_one_work+0x2f1/0x670 [ 1360.548610] worker_thread+0x4d3/0x760 [ 1360.548627] ? __pfx_worker_thread+0x10/0x10 [ 1360.548640] kthread+0x16b/0x1b0 [ 1360.548653] ? __pfx_kthread+0x10/0x10 [ 1360.548665] ret_from_fork+0x2f/0x50 [ 1360.548679] ? __pfx_kthread+0x10/0x10 [ 1360.548690] ret_from_fork_asm+0x1a/0x30 [ 1360.548707] </TASK> [ 1360.548719] Allocated by task 192061: [ 1360.548726] kasan_save_stack+0x20/0x40 [ 1360.548739] kasan_save_track+0x14/0x30 [ 1360.548750] __kasan_kmalloc+0x8f/0xa0 [ 1360.548760] __kmalloc_node+0x1f1/0x450 [ 1360.548771] nf_tables_newset+0x10c7/0x1b50 [nf_tables] [ 1360.548883] nfnetlink_rcv_batch+0xbc4/0xdc0 [nfnetlink] [ 1360.548909] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink] [ 1360.548927] netlink_unicast+0x367/0x4f0 [ 1360.548935] netlink_sendmsg+0x34b/0x610 [ 1360.548944] ____sys_sendmsg+0x4d4/0x510 [ 1360.548953] ___sys_sendmsg+0xc9/0x120 [ 1360.548961] __sys_sendmsg+0xbe/0x140 [ 1360.548971] do_syscall_64+0x55/0x120 [ 1360.548982] entry_SYSCALL_64_after_hwframe+0x55/0x5d [ 1360.548994] Freed by task 192222: [ 1360.548999] kasan_save_stack+0x20/0x40 [ 1360.549009] kasan_save_track+0x14/0x30 [ 1360.549019] kasan_save_free_info+0x3b/0x60 [ 1360.549028] poison_slab_object+0x100/0x180 [ 1360.549036] __kasan_slab_free+0x14/0x30 [ 1360.549042] kfree+0xb6/0x260 [ 1360.549049] __nft_release_table+0x473/0x6a0 [nf_tables] [ 1360.549131] nf_tables_exit_net+0x170/0x240 [nf_tables] [ 1360.549221] ops_exit_list+0x50/0xa0 [ 1360.549229] free_exit_list+0x101/0x140 [ 1360.549236] unregister_pernet_operations+0x107/0x160 [ 1360.549245] unregister_pernet_subsys+0x1c/0x30 [ 1360.549254] nf_tables_module_exit+0x43/0x80 [nf_tables] [ 1360.549345] __do_sys_delete_module+0x253/0x370 [ 1360.549352] do_syscall_64+0x55/0x120 [ 1360.549360] entry_SYSCALL_64_after_hwframe+0x55/0x5d (gdb) list *__nft_release_table+0x473 0x1e033 is in __nft_release_table (net/netfilter/nf_tables_api.c:11354). 11349 list_for_each_entry_safe(flowtable, nf, &table->flowtables, list) { 11350 list_del(&flowtable->list); 11351 nft_use_dec(&table->use); 11352 nf_tables_flowtable_destroy(flowtable); 11353 } 11354 list_for_each_entry_safe(set, ns, &table->sets, list) { 11355 list_del(&set->list); 11356 nft_use_dec(&table->use); 11357 if (set->flags & (NFT_SET_MAP | NFT_SET_OBJECT)) 11358 nft_map_deactivate(&ctx, set); (gdb) [ 1360.549372] Last potentially related work creation: [ 1360.549376] kasan_save_stack+0x20/0x40 [ 1360.549384] __kasan_record_aux_stack+0x9b/0xb0 [ 1360.549392] __queue_work+0x3fb/0x780 [ 1360.549399] queue_work_on+0x4f/0x60 [ 1360.549407] nft_rhash_remove+0x33b/0x340 [nf_tables] [ 1360.549516] nf_tables_commit+0x1c6a/0x2620 [nf_tables] [ 1360.549625] nfnetlink_rcv_batch+0x728/0xdc0 [nfnetlink] [ 1360.549647] nfnetlink_rcv+0x1a8/0x1e0 [nfnetlink] [ 1360.549671] netlink_unicast+0x367/0x4f0 [ 1360.549680] netlink_sendmsg+0x34b/0x610 [ 1360.549690] ____sys_sendmsg+0x4d4/0x510 [ 1360.549697] ___sys_sendmsg+0xc9/0x120 [ 1360.549706] __sys_sendmsg+0xbe/0x140 [ 1360.549715] do_syscall_64+0x55/0x120 [ 1360.549725] entry_SYSCALL_64_after_hwframe+0x55/0x5d Fixes: 0935d5588400 ("netfilter: nf_tables: asynchronous release") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-04-04netfilter: nf_tables: release mutex after nft_gc_seq_end from abort pathPablo Neira Ayuso1-5/+8
The commit mutex should not be released during the critical section between nft_gc_seq_begin() and nft_gc_seq_end(), otherwise, async GC worker could collect expired objects and get the released commit lock within the same GC sequence. nf_tables_module_autoload() temporarily releases the mutex to load module dependencies, then it goes back to replay the transaction again. Move it at the end of the abort phase after nft_gc_seq_end() is called. Cc: [email protected] Fixes: 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path") Reported-by: Kuan-Ting Chen <[email protected]> Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-04-04netfilter: nf_tables: release batch on table validation from abort pathPablo Neira Ayuso1-5/+10
Unlike early commit path stage which triggers a call to abort, an explicit release of the batch is required on abort, otherwise mutex is released and commit_list remains in place. Add WARN_ON_ONCE to ensure commit_list is empty from the abort path before releasing the mutex. After this patch, commit_list is always assumed to be empty before grabbing the mutex, therefore 03c1f1ef1584 ("netfilter: Cleanup nft_net->module_list from nf_tables_exit_net()") only needs to release the pending modules for registration. Cc: [email protected] Fixes: c0391b6ab810 ("netfilter: nf_tables: missing validation from the abort path") Signed-off-by: Pablo Neira Ayuso <[email protected]>
2024-04-04Revert "tg3: Remove residual error handling in tg3_suspend"Paolo Abeni1-4/+26
This reverts commit 9ab4ad295622a3481818856762471c1f8c830e18. I went out of coffee and applied it to the wrong tree. Blame on me. Signed-off-by: Paolo Abeni <[email protected]>
2024-04-04x86/CPU/AMD: Track SNP host status with cc_platform_*()Borislav Petkov (AMD)8-39/+49
The host SNP worthiness can determined later, after alternatives have been patched, in snp_rmptable_init() depending on cmdline options like iommu=pt which is incompatible with SNP, for example. Which means that one cannot use X86_FEATURE_SEV_SNP and will need to have a special flag for that control. Use that newly added CC_ATTR_HOST_SEV_SNP in the appropriate places. Move kdump_sev_callback() to its rightful place, while at it. Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Tested-by: Srikanth Aithal <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-04x86/cc: Add cc_platform_set/_clear() helpersBorislav Petkov (AMD)2-0/+64
Add functionality to set and/or clear different attributes of the machine as a confidential computing platform. Add the first one too: whether the machine is running as a host for SEV-SNP guests. Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Tested-by: Srikanth Aithal <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-04x86/kvm/Kconfig: Have KVM_AMD_SEV select ARCH_HAS_CC_PLATFORMBorislav Petkov (AMD)1-0/+1
The functionality to load SEV-SNP guests by the host will soon rely on cc_platform* helpers because the cpu_feature* API with the early patching is insufficient when SNP support needs to be disabled late. Therefore, pull that functionality in. Fixes: 216d106c7ff7 ("x86/sev: Add SEV-SNP host initialization support") Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Tom Lendacky <[email protected]> Tested-by: Srikanth Aithal <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-04x86/coco: Require seeding RNG with RDRAND on CoCo systemsJason A. Donenfeld3-0/+45
There are few uses of CoCo that don't rely on working cryptography and hence a working RNG. Unfortunately, the CoCo threat model means that the VM host cannot be trusted and may actively work against guests to extract secrets or manipulate computation. Since a malicious host can modify or observe nearly all inputs to guests, the only remaining source of entropy for CoCo guests is RDRAND. If RDRAND is broken -- due to CPU hardware fault -- the RNG as a whole is meant to gracefully continue on gathering entropy from other sources, but since there aren't other sources on CoCo, this is catastrophic. This is mostly a concern at boot time when initially seeding the RNG, as after that the consequences of a broken RDRAND are much more theoretical. So, try at boot to seed the RNG using 256 bits of RDRAND output. If this fails, panic(). This will also trigger if the system is booted without RDRAND, as RDRAND is essential for a safe CoCo boot. Add this deliberately to be "just a CoCo x86 driver feature" and not part of the RNG itself. Many device drivers and platforms have some desire to contribute something to the RNG, and add_device_randomness() is specifically meant for this purpose. Any driver can call it with seed data of any quality, or even garbage quality, and it can only possibly make the quality of the RNG better or have no effect, but can never make it worse. Rather than trying to build something into the core of the RNG, consider the particular CoCo issue just a CoCo issue, and therefore separate it all out into driver (well, arch/platform) code. [ bp: Massage commit message. ] Signed-off-by: Jason A. Donenfeld <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Reviewed-by: Elena Reshetova <[email protected]> Reviewed-by: Kirill A. Shutemov <[email protected]> Reviewed-by: Theodore Ts'o <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2024-04-04tg3: Remove residual error handling in tg3_suspendNikita Kiryushin1-26/+4
As of now, tg3_power_down_prepare always ends with success, but the error handling code from former tg3_set_power_state call is still here. This code became unreachable in commit c866b7eac073 ("tg3: Do not use legacy PCI power management"). Remove (now unreachable) error handling code for simplification and change tg3_power_down_prepare to a void function as its result is no more checked. Signed-off-by: Nikita Kiryushin <[email protected]> Reviewed-by: Michael Chan <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2024-04-04memblock tests: fix undefined reference to `BIT'Wei Yang1-0/+2
commit 772dd0342727 ("mm: enumerate all gfp flags") define gfp flags with the help of BIT, while gfp_types.h doesn't include header file for the definition. This through an error on building memblock tests. Let's include linux/bits.h to fix it. Signed-off-by: Wei Yang <[email protected]> CC: Suren Baghdasaryan <[email protected]> CC: Michal Hocko <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]>
2024-04-04memblock tests: fix undefined reference to `panic'Wei Yang2-0/+20
commit e96c6b8f212a ("memblock: report failures when memblock_can_resize is not set") introduced the usage of panic, which is not defined in memblock test. Let's define it directly in panic.h to fix it. Signed-off-by: Wei Yang <[email protected]> CC: Song Shuai <[email protected]> CC: Mike Rapoport <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]>
2024-04-04memblock tests: fix undefined reference to `early_pfn_to_nid'Wei Yang1-0/+5
commit 6a9531c3a880 ("memblock: fix crash when reserved memory is not added to memory") introduce the usage of early_pfn_to_nid, which is not defined in memblock tests. The original definition of early_pfn_to_nid is defined in mm.h, so let add this in the corresponding mm.h. Signed-off-by: Wei Yang <[email protected]> CC: Yajun Deng <[email protected]> CC: Mike Rapoport <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mike Rapoport (IBM) <[email protected]>
2024-04-04x86/numa/32: Include missing <asm/pgtable_areas.h>Arnd Bergmann1-0/+1
The __vmalloc_start_set declaration is in a header that is not included in numa_32.c in current linux-next: arch/x86/mm/numa_32.c: In function 'initmem_init': arch/x86/mm/numa_32.c:57:9: error: '__vmalloc_start_set' undeclared (first use in this function) 57 | __vmalloc_start_set = true; | ^~~~~~~~~~~~~~~~~~~ arch/x86/mm/numa_32.c:57:9: note: each undeclared identifier is reported only once for each function it appears in Add an explicit #include. Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2024-04-04ata: sata_gemini: Check clk_enable() resultChen Ni1-1/+4
The call to clk_enable() in gemini_sata_start_bridge() can fail. Add a check to detect such failure. Signed-off-by: Chen Ni <[email protected]> Signed-off-by: Damien Le Moal <[email protected]>
2024-04-04ata: sata_mv: Fix PCI device ID table declaration compilation warningArnd Bergmann1-32/+31
Building with W=1 shows a warning for an unused variable when CONFIG_PCI is diabled: drivers/ata/sata_mv.c:790:35: error: unused variable 'mv_pci_tbl' [-Werror,-Wunused-const-variable] static const struct pci_device_id mv_pci_tbl[] = { Move the table into the same block that containsn the pci_driver definition. Fixes: 7bb3c5290ca0 ("sata_mv: Remove PCI dependency") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Damien Le Moal <[email protected]>
2024-04-03net: mana: Fix Rx DMA datasize and skb_over_panicHaiyang Zhang2-2/+1
mana_get_rxbuf_cfg() aligns the RX buffer's DMA datasize to be multiple of 64. So a packet slightly bigger than mtu+14, say 1536, can be received and cause skb_over_panic. Sample dmesg: [ 5325.237162] skbuff: skb_over_panic: text:ffffffffc043277a len:1536 put:1536 head:ff1100018b517000 data:ff1100018b517100 tail:0x700 end:0x6ea dev:<NULL> [ 5325.243689] ------------[ cut here ]------------ [ 5325.245748] kernel BUG at net/core/skbuff.c:192! [ 5325.247838] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 5325.258374] RIP: 0010:skb_panic+0x4f/0x60 [ 5325.302941] Call Trace: [ 5325.304389] <IRQ> [ 5325.315794] ? skb_panic+0x4f/0x60 [ 5325.317457] ? asm_exc_invalid_op+0x1f/0x30 [ 5325.319490] ? skb_panic+0x4f/0x60 [ 5325.321161] skb_put+0x4e/0x50 [ 5325.322670] mana_poll+0x6fa/0xb50 [mana] [ 5325.324578] __napi_poll+0x33/0x1e0 [ 5325.326328] net_rx_action+0x12e/0x280 As discussed internally, this alignment is not necessary. To fix this bug, remove it from the code. So oversized packets will be marked as CQE_RX_TRUNCATED by NIC, and dropped. Cc: [email protected] Fixes: 2fbbd712baf1 ("net: mana: Enable RX path to handle various MTU sizes") Signed-off-by: Haiyang Zhang <[email protected]> Reviewed-by: Dexuan Cui <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-03net/sched: fix lockdep splat in qdisc_tree_reduce_backlog()Eric Dumazet1-1/+1
qdisc_tree_reduce_backlog() is called with the qdisc lock held, not RTNL. We must use qdisc_lookup_rcu() instead of qdisc_lookup() syzbot reported: WARNING: suspicious RCU usage 6.1.74-syzkaller #0 Not tainted ----------------------------- net/sched/sch_api.c:305 suspicious rcu_dereference_protected() usage! other info that might help us debug this: rcu_scheduler_active = 2, debug_locks = 1 3 locks held by udevd/1142: #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:306 [inline] #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline] #0: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: net_tx_action+0x64a/0x970 net/core/dev.c:5282 #1: ffff888171861108 (&sch->q.lock){+.-.}-{2:2}, at: spin_lock include/linux/spinlock.h:350 [inline] #1: ffff888171861108 (&sch->q.lock){+.-.}-{2:2}, at: net_tx_action+0x754/0x970 net/core/dev.c:5297 #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_lock_acquire include/linux/rcupdate.h:306 [inline] #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: rcu_read_lock include/linux/rcupdate.h:747 [inline] #2: ffffffff87c729a0 (rcu_read_lock){....}-{1:2}, at: qdisc_tree_reduce_backlog+0x84/0x580 net/sched/sch_api.c:792 stack backtrace: CPU: 1 PID: 1142 Comm: udevd Not tainted 6.1.74-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call Trace: <TASK> [<ffffffff85b85f14>] __dump_stack lib/dump_stack.c:88 [inline] [<ffffffff85b85f14>] dump_stack_lvl+0x1b1/0x28f lib/dump_stack.c:106 [<ffffffff85b86007>] dump_stack+0x15/0x1e lib/dump_stack.c:113 [<ffffffff81802299>] lockdep_rcu_suspicious+0x1b9/0x260 kernel/locking/lockdep.c:6592 [<ffffffff84f0054c>] qdisc_lookup+0xac/0x6f0 net/sched/sch_api.c:305 [<ffffffff84f037c3>] qdisc_tree_reduce_backlog+0x243/0x580 net/sched/sch_api.c:811 [<ffffffff84f5b78c>] pfifo_tail_enqueue+0x32c/0x4b0 net/sched/sch_fifo.c:51 [<ffffffff84fbcf63>] qdisc_enqueue include/net/sch_generic.h:833 [inline] [<ffffffff84fbcf63>] netem_dequeue+0xeb3/0x15d0 net/sched/sch_netem.c:723 [<ffffffff84eecab9>] dequeue_skb net/sched/sch_generic.c:292 [inline] [<ffffffff84eecab9>] qdisc_restart net/sched/sch_generic.c:397 [inline] [<ffffffff84eecab9>] __qdisc_run+0x249/0x1e60 net/sched/sch_generic.c:415 [<ffffffff84d7aa96>] qdisc_run+0xd6/0x260 include/net/pkt_sched.h:125 [<ffffffff84d85d29>] net_tx_action+0x7c9/0x970 net/core/dev.c:5313 [<ffffffff85e002bd>] __do_softirq+0x2bd/0x9bd kernel/softirq.c:616 [<ffffffff81568bca>] invoke_softirq kernel/softirq.c:447 [inline] [<ffffffff81568bca>] __irq_exit_rcu+0xca/0x230 kernel/softirq.c:700 [<ffffffff81568ae9>] irq_exit_rcu+0x9/0x20 kernel/softirq.c:712 [<ffffffff85b89f52>] sysvec_apic_timer_interrupt+0x42/0x90 arch/x86/kernel/apic/apic.c:1107 [<ffffffff85c00ccb>] asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:656 Fixes: d636fc5dd692 ("net: sched: add rcu annotations around qdisc->qdisc_sleeping") Reported-by: syzbot <[email protected]> Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: Jiri Pirko <[email protected]> Acked-by: Jamal Hadi Salim <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-03net: phy: micrel: lan8814: Fix when enabling/disabling 1-step timestampingHoratiu Vultur1-2/+8
There are 2 issues with the blamed commit. 1. When the phy is initialized, it would enable the disabled of UDPv4 checksums. The UDPv6 checksum is already enabled by default. So when 1-step is configured then it would clear these flags. 2. After the 1-step is configured, then if 2-step is configured then the 1-step would be still configured because it is not clearing the flag. So the sync frames will still have origin timestamps set. Fix this by reading first the value of the register and then just change bit 12 as this one determines if the timestamp needs to be inserted in the frame, without changing any other bits. Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Horatiu Vultur <[email protected]> Reviewed-by: Divya Koppera <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-03net: stmmac: fix rx queue priority assignmentPiotr Wejman2-16/+62
The driver should ensure that same priority is not mapped to multiple rx queues. From DesignWare Cores Ethernet Quality-of-Service Databook, section 17.1.29 MAC_RxQ_Ctrl2: "[...]The software must ensure that the content of this field is mutually exclusive to the PSRQ fields for other queues, that is, the same priority is not mapped to multiple Rx queues[...]" Previously rx_queue_priority() function was: - clearing all priorities from a queue - adding new priorities to that queue After this patch it will: - first assign new priorities to a queue - then remove those priorities from all other queues - keep other priorities previously assigned to that queue Fixes: a8f5102af2a7 ("net: stmmac: TX and RX queue priority configuration") Fixes: 2142754f8b9c ("net: stmmac: Add MAC related callbacks for XGMAC2") Signed-off-by: Piotr Wejman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-03net: txgbe: fix i2c dev name cannot match clkdevDuanqiang Wen1-3/+5
txgbe clkdev shortened clk_name, so i2c_dev info_name also need to shorten. Otherwise, i2c_dev cannot initialize clock. Fixes: e30cef001da2 ("net: txgbe: fix clk_name exceed MAX_DEV_ID limits") Signed-off-by: Duanqiang Wen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-03Merge branch 'net-fec-fix-to-suspend-resume-with-mac_managed_pm'Jakub Kicinski1-2/+9
John Ernberg says: ==================== net: fec: Fix to suspend / resume with mac_managed_pm Since the introduction of mac_managed_pm in the FEC driver there were some discrepancies regarding power management of the PHY. This failed on our board that has a permanently powered Microchip LAN8700R attached to the FEC. Although the root cause of the failure can be traced back to f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO with polled IO") and probably even before that, we only started noticing the problem going from 5.10 to 6.1. Since 557d5dc83f68 ("net: fec: use mac-managed PHY PM") is actually a fix to most of the power management sequencing problems that came with power managing the MDIO bus which for the FEC meant adding a race with FEC resume (and phy_start() if netif was running) and PHY resume. That it worked before for us was probably just luck... Thanks to Wei's response to my report at [1] I was able to pick up his patch and start honing in on the remaining missing details. [1]: https://lore.kernel.org/netdev/[email protected]/ v3: https://lore.kernel.org/netdev/[email protected]/ v2: https://lore.kernel.org/netdev/[email protected]/ v1: https://lore.kernel.org/netdev/[email protected]/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-03net: fec: Set mac_managed_pm during probeWei Fang1-2/+9
Setting mac_managed_pm during interface up is too late. In situations where the link is not brought up yet and the system suspends the regular PHY power management will run. Since the FEC ETHEREN control bit is cleared (automatically) on suspend the controller is off in resume. When the regular PHY power management resume path runs in this context it will write to the MII_DATA register but nothing will be transmitted on the MDIO bus. This can be observed by the following log: fec 5b040000.ethernet eth0: MDIO read timeout Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110 Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110 The data written will however remain in the MII_DATA register. When the link later is set to administrative up it will trigger a call to fec_restart() which will restore the MII_SPEED register. This triggers the quirk explained in f166f890c8f0 ("net: ethernet: fec: Replace interrupt driven MDIO with polled IO") causing an extra MII_EVENT. This extra event desynchronizes all the MDIO register reads, causing them to complete too early. Leading all reads to read as 0 because fec_enet_mdio_wait() returns too early. When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads causes the PHY to be initialized incorrectly and the PHY will not transmit any ethernet signal in this state. It cannot be brought out of this state without a power cycle of the PHY. Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM") Closes: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Wei Fang <[email protected]> [jernberg: commit message] Signed-off-by: John Ernberg <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-03bcachefs: create debugfs dir for each btreeThomas Bertschinger1-15/+15
This creates a subdirectory for each individual btree under the btrees/ debugfs directory. Directory structure, before: /sys/kernel/debug/bcachefs/$FS_ID/btrees/ ├── alloc ├── alloc-bfloat-failed ├── alloc-formats ├── backpointers ├── backpointers-bfloat-failed ├── backpointers-formats ... Directory structure, after: /sys/kernel/debug/bcachefs/$FS_ID/btrees/ ├── alloc │   ├── bfloat-failed │   ├── formats │   └── keys ├── backpointers │   ├── bfloat-failed │   ├── formats │   └── keys ... Signed-off-by: Thomas Bertschinger <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-04-03riscv: Fix vector state restore in rt_sigreturn()Björn Töpel1-7/+8
The RISC-V Vector specification states in "Appendix D: Calling Convention for Vector State" [1] that "Executing a system call causes all caller-saved vector registers (v0-v31, vl, vtype) and vstart to become unspecified.". In the RISC-V kernel this is called "discarding the vstate". Returning from a signal handler via the rt_sigreturn() syscall, vector discard is also performed. However, this is not an issue since the vector state should be restored from the sigcontext, and therefore not care about the vector discard. The "live state" is the actual vector register in the running context, and the "vstate" is the vector state of the task. A dirty live state, means that the vstate and live state are not in synch. When vectorized user_from_copy() was introduced, an bug sneaked in at the restoration code, related to the discard of the live state. An example when this go wrong: 1. A userland application is executing vector code 2. The application receives a signal, and the signal handler is entered. 3. The application returns from the signal handler, using the rt_sigreturn() syscall. 4. The live vector state is discarded upon entering the rt_sigreturn(), and the live state is marked as "dirty", indicating that the live state need to be synchronized with the current vstate. 5. rt_sigreturn() restores the vstate, except the Vector registers, from the sigcontext 6. rt_sigreturn() restores the Vector registers, from the sigcontext, and now the vectorized user_from_copy() is used. The dirty live state from the discard is saved to the vstate, making the vstate corrupt. 7. rt_sigreturn() returns to the application, which crashes due to corrupted vstate. Note that the vectorized user_from_copy() is invoked depending on the value of CONFIG_RISCV_ISA_V_UCOPY_THRESHOLD. Default is 768, which means that vlen has to be larger than 128b for this bug to trigger. The fix is simply to mark the live state as non-dirty/clean prior performing the vstate restore. Link: https://github.com/riscv/riscv-isa-manual/releases/download/riscv-isa-release-8abdb41-2024-03-26/unpriv-isa-asciidoc.pdf # [1] Reported-by: Charlie Jenkins <[email protected]> Reported-by: Vineet Gupta <[email protected]> Fixes: c2a658d41924 ("riscv: lib: vectorize copy_to_user/copy_from_user") Signed-off-by: Björn Töpel <[email protected]> Reviewed-by: Andy Chiu <[email protected]> Tested-by: Vineet Gupta <[email protected]> Link: https://lore.kernel.org/r/[email protected] Cc: [email protected] Signed-off-by: Palmer Dabbelt <[email protected]>
2024-04-04i2c: pxa: hide unused icr_bits[] variableArnd Bergmann1-1/+1
The function using this is hidden in an #ifdef, so the variable needs the same one for a clean W=1 build: drivers/i2c/busses/i2c-pxa.c:327:26: error: 'icr_bits' defined but not used [-Werror=unused-const-variable=] Fixes: d6a7b5f84b5c ("[ARM] 4827/1: fix two warnings in drivers/i2c/busses/i2c-pxa.c") Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Andi Shyti <[email protected]>
2024-04-03randomize_kstack: Improve entropy diffusionKees Cook1-1/+1
The kstack_offset variable was really only ever using the low bits for kernel stack offset entropy. Add a ror32() to increase bit diffusion. Suggested-by: Arnd Bergmann <[email protected]> Fixes: 39218ff4c625 ("stack: Optionally randomize kernel stack offset each syscall") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-04-03ubsan: fix unused variable warning in test moduleArnd Bergmann1-1/+1
This is one of the drivers with an unused variable that is marked 'const'. Adding a __used annotation here avoids the warning and lets us enable the option by default: lib/test_ubsan.c:137:28: error: unused variable 'skip_ubsan_array' [-Werror,-Wunused-const-variable] Fixes: 4a26f49b7b3d ("ubsan: expand tests and reporting") Signed-off-by: Arnd Bergmann <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-04-03gcc-plugins/stackleak: Avoid .head.text sectionArd Biesheuvel1-0/+2
The .head.text section carries the startup code that runs with the MMU off or with a translation of memory that deviates from the ordinary one. So avoid instrumentation with the stackleak plugin, which already avoids .init.text and .noinstr.text entirely. Fixes: 48204aba801f1b51 ("x86/sme: Move early SME kernel encryption handling into .head.text") Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-lkp/[email protected] Signed-off-by: Ard Biesheuvel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
2024-04-03cxl/core: Fix initialization of mbox_cmd.size_out in get eventKwangjin Ko1-1/+2
Since mbox_cmd.size_out is overwritten with the actual output size in the function below, it needs to be initialized every time. cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd Problem scenario: 1) The size_out variable is initially set to the size of the mailbox. 2) Read an event. - size_out is set to 160 bytes(header 32B + one event 128B). - Two event are created while reading. 3) Read the new *two* events. - size_out is still set to 160 bytes. - Although the value of out_len is 288 bytes, only 160 bytes are copied from the mailbox register to the local variable. - record_count is set to 2. - Accessing records[1] will result in reading incorrect data. Fixes: 6ebe28f9ec72 ("cxl/mem: Read, trace, and clear events on driver load") Tested-by: Ira Weiny <[email protected]> Reviewed-by: Ira Weiny <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Signed-off-by: Kwangjin Ko <[email protected]> Signed-off-by: Dave Jiang <[email protected]>
2024-04-03idpf: fix kernel panic on unknown packet typesJoshua Hay1-2/+2
In the very rare case where a packet type is unknown to the driver, idpf_rx_process_skb_fields would return early without calling eth_type_trans to set the skb protocol / the network layer handler. This is especially problematic if tcpdump is running when such a packet is received, i.e. it would cause a kernel panic. Instead, call eth_type_trans for every single packet, even when the packet type is unknown. Fixes: 3a8845af66ed ("idpf: add RX splitq napi poll support") Reported-by: Balazs Nemeth <[email protected]> Signed-off-by: Joshua Hay <[email protected]> Reviewed-by: Jesse Brandeburg <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Tested-by: Salvatore Daniele <[email protected]> Signed-off-by: Pavan Kumar Linga <[email protected]> Tested-by: Krishneil Singh <[email protected]> Signed-off-by: Tony Nguyen <[email protected]>
2024-04-03vdso: Use CONFIG_PAGE_SHIFT in vdso/datapage.hArnd Bergmann2-9/+2
Both the vdso rework and the CONFIG_PAGE_SHIFT changes were merged during the v6.9 merge window, so it is now possible to use CONFIG_PAGE_SHIFT instead of including asm/page.h in the vdso. This avoids the workaround for arm64 - commit 8b3843ae3634 ("vdso/datapage: Quick fix - use asm/page-def.h for ARM64") and addresses a build warning for powerpc64: In file included from <built-in>:4: In file included from /home/arnd/arm-soc/arm-soc/lib/vdso/gettimeofday.c:5: In file included from ../include/vdso/datapage.h:25: arch/powerpc/include/asm/page.h:230:9: error: result of comparison of constant 13835058055282163712 with expression of type 'unsigned long' is always true [-Werror,-Wtautological-constant-out-of-range-compare] 230 | return __pa(kaddr) >> PAGE_SHIFT; | ^~~~~~~~~~~ arch/powerpc/include/asm/page.h:217:37: note: expanded from macro '__pa' 217 | VIRTUAL_WARN_ON((unsigned long)(x) < PAGE_OFFSET); \ | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~ arch/powerpc/include/asm/page.h:202:73: note: expanded from macro 'VIRTUAL_WARN_ON' 202 | #define VIRTUAL_WARN_ON(x) WARN_ON(IS_ENABLED(CONFIG_DEBUG_VIRTUAL) && (x)) | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~ arch/powerpc/include/asm/bug.h:88:25: note: expanded from macro 'WARN_ON' 88 | int __ret_warn_on = !!(x); \ | ^ Signed-off-by: Arnd Bergmann <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Kees Cook <[email protected]> Acked-by: Michael Ellerman <[email protected]> (powerpc) Link: https://lore.kernel.org/r/[email protected]
2024-04-03smb: client: fix potential UAF in cifs_signal_cifsd_for_reconnect()Paulo Alcantara1-0/+2
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF. Cc: [email protected] Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-03smb: client: fix potential UAF in smb2_is_network_name_deleted()Paulo Alcantara1-0/+2
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF. Cc: [email protected] Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-03smb: client: fix potential UAF in is_valid_oplock_break()Paulo Alcantara1-0/+2
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF. Cc: [email protected] Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-03smb: client: fix potential UAF in smb2_is_valid_oplock_break()Paulo Alcantara1-0/+2
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF. Cc: [email protected] Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-03smb: client: fix potential UAF in smb2_is_valid_lease_break()Paulo Alcantara1-0/+2
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF. Cc: [email protected] Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-03smb: client: fix potential UAF in cifs_stats_proc_show()Paulo Alcantara1-0/+2
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF. Cc: [email protected] Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-03smb: client: fix potential UAF in cifs_stats_proc_write()Paulo Alcantara1-0/+2
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF. Cc: [email protected] Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-03smb: client: fix potential UAF in cifs_dump_full_key()Paulo Alcantara1-1/+5
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF. Cc: [email protected] Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-03smb: client: fix potential UAF in cifs_debug_files_proc_show()Paulo Alcantara2-0/+12
Skip sessions that are being teared down (status == SES_EXITING) to avoid UAF. Cc: [email protected] Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-03smb3: retrying on failed server closeRitvik Budhiraja7-17/+85
In the current implementation, CIFS close sends a close to the server and does not check for the success of the server close. This patch adds functionality to check for server close return status and retries in case of an EBUSY or EAGAIN error. This can help avoid handle leaks Cc: [email protected] Signed-off-by: Ritvik Budhiraja <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-03nios2: Only use built-in devicetree blob if configured to do soGuenter Roeck1-1/+5
Starting with commit 7b937cc243e5 ("of: Create of_root if no dtb provided by firmware"), attempts to boot nios2 images with an external devicetree blob result in a crash. Kernel panic - not syncing: early_init_dt_alloc_memory_arch: Failed to allocate 72 bytes align=0x40 For nios2, a built-in devicetree blob always overrides devicetree blobs provided by ROMMON/BIOS. This includes the new dummy devicetree blob. Result is that the dummy devicetree blob is used even if an external devicetree blob is provided. Since the dummy devicetree blob does not include any memory information, memory allocations fail, resulting in the crash. To fix the problem, only use the built-in devicetree blob if CONFIG_NIOS2_DTB_SOURCE_BOOL is enabled. Fixes: 7b937cc243e5 ("of: Create of_root if no dtb provided by firmware") Cc: Frank Rowand <[email protected]> Cc: Stephen Boyd <[email protected]> Cc: Rob Herring <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Reviewed-by: Rob Herring <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Rob Herring <[email protected]>
2024-04-03bcachefs: reconstruct_inode()Kent Overstreet1-2/+50
If an inode is missing, but corresponding extents and dirent still exist, it's well worth recreating it - this does so. Signed-off-by: Kent Overstreet <[email protected]>
2024-04-03bcachefs: Subvolume reconstructionKent Overstreet1-19/+148
We can now recreate missing subvolumes from dirents and/or inodes. Signed-off-by: Kent Overstreet <[email protected]>
2024-04-03bcachefs: Check for extents that point to same spaceKent Overstreet2-8/+168
In backpointer repair, if we get a missing backpointer - but there's already a backpointer that points to an existing extent - we've got multiple extents that point to the same space and need to decide which to keep. Signed-off-by: Kent Overstreet <[email protected]>
2024-04-03bcachefs: Reconstruct missing snapshot nodesKent Overstreet6-6/+199
When the snapshots btree is going, we'll have to delete huge amounts of data - unless we can reconstruct it by looking at the keys that refer to it. Signed-off-by: Kent Overstreet <[email protected]>
2024-04-03bcachefs: Flag btrees with missing dataKent Overstreet6-5/+44
We need this to know when we should attempt to reconstruct the snapshots btree Signed-off-by: Kent Overstreet <[email protected]>