aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-04-03net: bcmgenet: Reset RBUF on first openPhil Elwell1-4/+12
If the RBUF logic is not reset when the kernel starts then there may be some data left over from any network boot loader. If the 64-byte packet headers are enabled then this can be fatal. Extend bcmgenet_dma_disable to do perform the reset, but not when called from bcmgenet_resume in order to preserve a wake packet. N.B. This different handling of resume is just based on a hunch - why else wouldn't one reset the RBUF as well as the TBUF? If this isn't the case then it's easy to change the patch to make the RBUF reset unconditional. See: https://github.com/raspberrypi/linux/issues/3850 See: https://github.com/raspberrypi/firmware/issues/1882 Signed-off-by: Phil Elwell <[email protected]> Signed-off-by: Maarten Vanraes <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-04-03spi: mchp-pci1xxx: Fix a possible null pointer dereference in pci1xxx_spi_probeHuai-Yuan Liu1-0/+2
In function pci1xxxx_spi_probe, there is a potential null pointer that may be caused by a failed memory allocation by the function devm_kzalloc. Hence, a null pointer check needs to be added to prevent null pointer dereferencing later in the code. To fix this issue, spi_bus->spi_int[iter] should be checked. The memory allocated by devm_kzalloc will be automatically released, so just directly return -ENOMEM without worrying about memory leaks. Fixes: 1cc0cbea7167 ("spi: microchip: pci1xxxx: Add driver for SPI controller of PCI1XXXX PCIe switch") Signed-off-by: Huai-Yuan Liu <[email protected]> Link: https://msgid.link/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2024-04-03spi: spi-fsl-lpspi: remove redundant spi_controller_put callCarlos Song1-8/+6
devm_spi_alloc_controller will allocate an SPI controller and automatically release a reference on it when dev is unbound from its driver. It doesn't need to call spi_controller_put explicitly to put the reference when lpspi driver failed initialization. Fixes: 2ae0ab0143fc ("spi: lpspi: Avoid potential use-after-free in probe()") Signed-off-by: Carlos Song <[email protected]> Reviewed-by: Alexander Sverdlin <[email protected]> Link: https://msgid.link/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2024-04-03octeontx2-af: Add array index checkAleksandr Mishin1-0/+2
In rvu_map_cgx_lmac_pf() the 'iter', which is used as an array index, can reach value (up to 14) that exceed the size (MAX_LMAC_COUNT = 8) of the array. Fix this bug by adding 'iter' value check. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support") Signed-off-by: Aleksandr Mishin <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-04-03perf/x86/intel/ds: Don't clear ->pebs_data_cfg for the last PEBS eventKan Liang1-4/+4
The MSR_PEBS_DATA_CFG MSR register is used to configure which data groups should be generated into a PEBS record, and it's shared among all counters. If there are different configurations among counters, perf combines all the configurations. The first perf command as below requires a complete PEBS record (including memory info, GPRs, XMMs, and LBRs). The second perf command only requires a basic group. However, after the second perf command is running, the MSR_PEBS_DATA_CFG register is cleared. Only a basic group is generated in a PEBS record, which is wrong. The required information for the first perf command is missed. $ perf record --intr-regs=AX,SP,XMM0 -a -C 8 -b -W -d -c 100000003 -o /dev/null -e cpu/event=0xd0,umask=0x81/upp & $ sleep 5 $ perf record --per-thread -c 1 -e cycles:pp --no-timestamp --no-tid taskset -c 8 ./noploop 1000 The first PEBS event is a system-wide PEBS event. The second PEBS event is a per-thread event. When the thread is scheduled out, the intel_pmu_pebs_del() function is invoked to update the PEBS state. Since the system-wide event is still available, the cpuc->n_pebs is 1. The cpuc->pebs_data_cfg is cleared. The data configuration for the system-wide PEBS event is lost. The (cpuc->n_pebs == 1) check was introduced in commit: b6a32f023fcc ("perf/x86: Fix PEBS threshold initialization") At that time, it indeed didn't hurt whether the state was updated during the removal, because only the threshold is updated. The calculation of the threshold takes the last PEBS event into account. However, since commit: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG") we delay the threshold update, and clear the PEBS data config, which triggers the bug. The PEBS data config update scope should not be shrunk during removal. [ mingo: Improved the changelog & comments. ] Fixes: b752ea0c28e3 ("perf/x86/intel/ds: Flush PEBS DS when changing PEBS_DATA_CFG") Reported-by: Stephane Eranian <[email protected]> Signed-off-by: Kan Liang <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Cc: [email protected] Link: https://lore.kernel.org/r/[email protected]
2024-04-03x86/resctrl: Fix uninitialized memory read when last CPU of domain goes offlineReinette Chatre1-1/+2
Tony encountered this OOPS when the last CPU of a domain goes offline while running a kernel built with CONFIG_NO_HZ_FULL: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI ... RIP: 0010:__find_nth_andnot_bit+0x66/0x110 ... Call Trace: <TASK> ? __die() ? page_fault_oops() ? exc_page_fault() ? asm_exc_page_fault() cpumask_any_housekeeping() mbm_setup_overflow_handler() resctrl_offline_cpu() resctrl_arch_offline_cpu() cpuhp_invoke_callback() cpuhp_thread_fun() smpboot_thread_fn() kthread() ret_from_fork() ret_from_fork_asm() </TASK> The NULL pointer dereference is encountered while searching for another online CPU in the domain (of which there are none) that can be used to run the MBM overflow handler. Because the kernel is configured with CONFIG_NO_HZ_FULL the search for another CPU (in its effort to prefer those CPUs that aren't marked nohz_full) consults the mask representing the nohz_full CPUs, tick_nohz_full_mask. On a kernel with CONFIG_CPUMASK_OFFSTACK=y tick_nohz_full_mask is not allocated unless the kernel is booted with the "nohz_full=" parameter and because of that any access to tick_nohz_full_mask needs to be guarded with tick_nohz_full_enabled(). Replace the IS_ENABLED(CONFIG_NO_HZ_FULL) with tick_nohz_full_enabled(). The latter ensures tick_nohz_full_mask can be accessed safely and can be used whether kernel is built with CONFIG_NO_HZ_FULL enabled or not. [ Use Ingo's suggestion that combines the two NO_HZ checks into one. ] Fixes: a4846aaf3945 ("x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow") Reported-by: Tony Luck <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Signed-off-by: Ingo Molnar <[email protected]> Reviewed-by: Babu Moger <[email protected]> Link: https://lore.kernel.org/r/ff8dfc8d3dcb04b236d523d1e0de13d2ef585223.1711993956.git.reinette.chatre@intel.com Closes: https://lore.kernel.org/lkml/ZgIFT5gZgIQ9A9G7@agluck-desk3/
2024-04-02Merge tag 'selinux-pr-20240402' of ↵Linus Torvalds1-5/+7
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux fix from Paul Moore: "A single patch for SELinux to fix a problem where we could potentially dereference an error pointer if we failed to successfully mount selinuxfs" * tag 'selinux-pr-20240402' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: avoid dereference of garbage after mount failure
2024-04-02MAINTAINERS: mlx5: Add Tariq ToukanTariq Toukan1-0/+2
Add myself as mlx5 core and EN maintainer. Signed-off-by: Tariq Toukan <[email protected]> Reviewed-by: Gal Pressman <[email protected]> Acked-by: Saeed Mahameed <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-02ipv6: Fix infinite recursion in fib6_dump_done().Kuniyuki Iwashima1-7/+7
syzkaller reported infinite recursive calls of fib6_dump_done() during netlink socket destruction. [1] From the log, syzkaller sent an AF_UNSPEC RTM_GETROUTE message, and then the response was generated. The following recvmmsg() resumed the dump for IPv6, but the first call of inet6_dump_fib() failed at kzalloc() due to the fault injection. [0] 12:01:34 executing program 3: r0 = socket$nl_route(0x10, 0x3, 0x0) sendmsg$nl_route(r0, ... snip ...) recvmmsg(r0, ... snip ...) (fail_nth: 8) Here, fib6_dump_done() was set to nlk_sk(sk)->cb.done, and the next call of inet6_dump_fib() set it to nlk_sk(sk)->cb.args[3]. syzkaller stopped receiving the response halfway through, and finally netlink_sock_destruct() called nlk_sk(sk)->cb.done(). fib6_dump_done() calls fib6_dump_end() and nlk_sk(sk)->cb.done() if it is still not NULL. fib6_dump_end() rewrites nlk_sk(sk)->cb.done() by nlk_sk(sk)->cb.args[3], but it has the same function, not NULL, calling itself recursively and hitting the stack guard page. To avoid the issue, let's set the destructor after kzalloc(). [0]: FAULT_INJECTION: forcing a failure. name failslab, interval 1, probability 0, space 0, times 0 CPU: 1 PID: 432110 Comm: syz-executor.3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:117) should_fail_ex (lib/fault-inject.c:52 lib/fault-inject.c:153) should_failslab (mm/slub.c:3733) kmalloc_trace (mm/slub.c:3748 mm/slub.c:3827 mm/slub.c:3992) inet6_dump_fib (./include/linux/slab.h:628 ./include/linux/slab.h:749 net/ipv6/ip6_fib.c:662) rtnl_dump_all (net/core/rtnetlink.c:4029) netlink_dump (net/netlink/af_netlink.c:2269) netlink_recvmsg (net/netlink/af_netlink.c:1988) ____sys_recvmsg (net/socket.c:1046 net/socket.c:2801) ___sys_recvmsg (net/socket.c:2846) do_recvmmsg (net/socket.c:2943) __x64_sys_recvmmsg (net/socket.c:3041 net/socket.c:3034 net/socket.c:3034) [1]: BUG: TASK stack guard page was hit at 00000000f2fa9af1 (stack is 00000000b7912430..000000009a436beb) stack guard page: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 223719 Comm: kworker/1:3 Not tainted 6.8.0-12821-g537c2e91d354-dirty #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Workqueue: events netlink_sock_destruct_work RIP: 0010:fib6_dump_done (net/ipv6/ip6_fib.c:570) Code: 3c 24 e8 f3 e9 51 fd e9 28 fd ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 41 57 41 56 41 55 41 54 55 48 89 fd <53> 48 8d 5d 60 e8 b6 4d 07 fd 48 89 da 48 b8 00 00 00 00 00 fc ff RSP: 0018:ffffc9000d980000 EFLAGS: 00010293 RAX: 0000000000000000 RBX: ffffffff84405990 RCX: ffffffff844059d3 RDX: ffff8881028e0000 RSI: ffffffff84405ac2 RDI: ffff88810c02f358 RBP: ffff88810c02f358 R08: 0000000000000007 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000224 R12: 0000000000000000 R13: ffff888007c82c78 R14: ffff888007c82c68 R15: ffff888007c82c68 FS: 0000000000000000(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffc9000d97fff8 CR3: 0000000102309002 CR4: 0000000000770ef0 PKRU: 55555554 Call Trace: <#DF> </#DF> <TASK> fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) ... fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) fib6_dump_done (net/ipv6/ip6_fib.c:572 (discriminator 1)) netlink_sock_destruct (net/netlink/af_netlink.c:401) __sk_destruct (net/core/sock.c:2177 (discriminator 2)) sk_destruct (net/core/sock.c:2224) __sk_free (net/core/sock.c:2235) sk_free (net/core/sock.c:2246) process_one_work (kernel/workqueue.c:3259) worker_thread (kernel/workqueue.c:3329 kernel/workqueue.c:3416) kthread (kernel/kthread.c:388) ret_from_fork (arch/x86/kernel/process.c:153) ret_from_fork_asm (arch/x86/entry/entry_64.S:256) Modules linked in: Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzkaller <[email protected]> Signed-off-by: Kuniyuki Iwashima <[email protected]> Reviewed-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-02r8169: fix issue caused by buggy BIOS on certain boards with RTL8168dHeiner Kallweit1-0/+9
On some boards with this chip version the BIOS is buggy and misses to reset the PHY page selector. This results in the PHY ID read accessing registers on a different page, returning a more or less random value. Fix this by resetting the page selector first. Fixes: f1e911d5d0df ("r8169: add basic phylib support") Cc: [email protected] Signed-off-by: Heiner Kallweit <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-02io_uring/kbuf: hold io_buffer_list reference over mmapJens Axboe3-14/+36
If we look up the kbuf, ensure that it doesn't get unregistered until after we're done with it. Since we're inside mmap, we cannot safely use the io_uring lock. Rely on the fact that we can lookup the buffer list under RCU now and grab a reference to it, preventing it from being unregistered until we're done with it. The lookup returns the io_buffer_list directly with it referenced. Cc: [email protected] # v6.4+ Fixes: 5cf4f52e6d8a ("io_uring: free io_buffer_list entries via RCU") Signed-off-by: Jens Axboe <[email protected]>
2024-04-02io_uring/kbuf: protect io_buffer_list teardown with a referenceJens Axboe2-4/+13
No functional changes in this patch, just in preparation for being able to keep the buffer list alive outside of the ctx->uring_lock. Cc: [email protected] # v6.4+ Signed-off-by: Jens Axboe <[email protected]>
2024-04-02io_uring/kbuf: get rid of bl->is_readyJens Axboe2-10/+0
Now that xarray is being exclusively used for the buffer_list lookup, this check is no longer needed. Get rid of it and the is_ready member. Cc: [email protected] # v6.4+ Signed-off-by: Jens Axboe <[email protected]>
2024-04-02io_uring/kbuf: get rid of lower BGID listsJens Axboe3-65/+8
Just rely on the xarray for any kind of bgid. This simplifies things, and it really doesn't bring us much, if anything. Cc: [email protected] # v6.4+ Signed-off-by: Jens Axboe <[email protected]>
2024-04-02vsock/virtio: fix packet delivery to tap deviceMarco Pinna1-1/+2
Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added virtio_transport_deliver_tap_pkt() for handing packets to the vsockmon device. However, in virtio_transport_send_pkt_work(), the function is called before actually sending the packet (i.e. before placing it in the virtqueue with virtqueue_add_sgs() and checking whether it returned successfully). Queuing the packet in the virtqueue can fail even multiple times. However, in virtio_transport_deliver_tap_pkt() we deliver the packet to the monitoring tap interface only the first time we call it. This certainly avoids seeing the same packet replicated multiple times in the monitoring interface, but it can show the packet sent with the wrong timestamp or even before we succeed to queue it in the virtqueue. Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs() and making sure it returned successfully. Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") Cc: [email protected] Signed-off-by: Marco Pinna <[email protected]> Reviewed-by: Stefano Garzarella <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-02ax25: fix use-after-free bugs caused by ax25_ds_del_timerDuoming Zhou1-1/+1
When the ax25 device is detaching, the ax25_dev_device_down() calls ax25_ds_del_timer() to cleanup the slave_timer. When the timer handler is running, the ax25_ds_del_timer() that calls del_timer() in it will return directly. As a result, the use-after-free bugs could happen, one of the scenarios is shown below: (Thread 1) | (Thread 2) | ax25_ds_timeout() ax25_dev_device_down() | ax25_ds_del_timer() | del_timer() | ax25_dev_put() //FREE | | ax25_dev-> //USE In order to mitigate bugs, when the device is detaching, use timer_shutdown_sync() to stop the timer. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Duoming Zhou <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-04-03ARM: dts: imx7s-warp: Pass OV2680 link-frequenciesFabio Estevam1-0/+1
Since commit 63b0cd30b78e ("media: ov2680: Add bus-cfg / endpoint property verification") the ov2680 no longer probes on a imx7s-warp7: ov2680 1-0036: error -EINVAL: supported link freq 330000000 not found ov2680 1-0036: probe with driver ov2680 failed with error -22 Fix it by passing the required 'link-frequencies' property as recommended by: https://www.kernel.org/doc/html/v6.9-rc1/driver-api/media/camera-sensor.html#handling-clocks Cc: [email protected] Fixes: 63b0cd30b78e ("media: ov2680: Add bus-cfg / endpoint property verification") Signed-off-by: Fabio Estevam <[email protected]> Signed-off-by: Shawn Guo <[email protected]>
2024-04-02bcachefs: ratelimit informational fsck errorsKent Overstreet1-4/+4
Signed-off-by: Kent Overstreet <[email protected]>
2024-04-02bcachefs: Check for bad needs_discard before doing discardKent Overstreet1-21/+26
In the discard worker, we were failing to validate the bucket state - meaning a corrupt needs_discard btree could cause us to discard a bucket that we shouldn't. If check_alloc_info hasn't run yet we just want to bail out, otherwise it's a filesystem inconsistent error. Signed-off-by: Kent Overstreet <[email protected]>
2024-04-02bcachefs: Improve bch2_btree_update_to_text()Kent Overstreet2-22/+43
Print out the mode as a string, and also print out the btree and watermark. Signed-off-by: Kent Overstreet <[email protected]>
2024-04-02ASoC: codecs: ES8326: solve some hp issues andMark Brown2-17/+22
Merge series from Zhang Yi <[email protected]>: We solved some issues related to headphone detection.And for using the same configuration in different power conditions,we modified the clock table
2024-04-02Merge tag 'docs-6.9-fixes' of git://git.lwn.net/linuxLinus Torvalds4-4/+6
Pull documentation fixes from Jonathan Corbet: "Four small documentation fixes" * tag 'docs-6.9-fixes' of git://git.lwn.net/linux: docs: zswap: fix shell command format tracing: Fix documentation on tp_printk cmdline option docs: Fix bitfield handling in kernel-doc Documentation: dev-tools: Add link to RV docs
2024-04-02ACPI: thermal: Register thermal zones without valid trip pointsStephen Horvath1-12/+10
Some laptops where the thermal control is handled by the EC may provide trip points that fail the kernels new validation, but still have working temperature sensors. An example of this is the Framework 13 AMD. This patch allows the thermal zone to still be registered without trip points if the trip points fail validation, allowing the temperature sensor to be viewed and used by the user. Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218586 Fixes: 9c8647224e9f ("ACPI: thermal: Use library functions to obtain trip point temperature values") Signed-off-by: Stephen Horvath <[email protected]> [ rjw: Subject edits, remove redundant braces ] Signed-off-by: Rafael J. Wysocki <[email protected]>
2024-04-02Merge tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefsLinus Torvalds48-646/+814
Pull bcachefs fixes from Kent Overstreet: "Lots of fixes for situations with extreme filesystem damage. One fix ("Fix journal pins in btree write buffer") applicable to normal usage; also a dio performance fix. New repair/construction code is in the final stages, should be ready in about a week. Anyone that lost btree interior nodes (or a variety of other damage) as a result of the splitbrain bug will be able to repair then" * tag 'bcachefs-2024-04-01' of https://evilpiepirate.org/git/bcachefs: (32 commits) bcachefs: On emergency shutdown, print out current journal sequence number bcachefs: Fix overlapping extent repair bcachefs: Fix remove_dirent() bcachefs: Logged op errors should be ignored bcachefs: Improve -o norecovery; opts.recovery_pass_limit bcachefs: bch2_run_explicit_recovery_pass_persistent() bcachefs: Ensure bch_sb_field_ext always exists bcachefs: Flush journal immediately after replay if we did early repair bcachefs: Resume logged ops after fsck bcachefs: Add error messages to logged ops fns bcachefs: Split out recovery_passes.c bcachefs: fix backpointer for missing alloc key msg bcachefs: Fix bch2_btree_increase_depth() bcachefs: Kill bch2_bkey_ptr_data_type() bcachefs: Fix use after free in check_root_trans() bcachefs: Fix repair path for missing indirect extents bcachefs: Fix use after free in bch2_check_fix_ptrs() bcachefs: Fix btree node keys accounting in topology repair path bcachefs: Check btree ptr min_key in .invalid bcachefs: add REQ_SYNC and REQ_IDLE in write dio ...
2024-04-02mean_and_variance: Drop always failing testsGuenter Roeck1-27/+1
mean_and_variance_test_2 and mean_and_variance_test_4 always fail. The input parameters to those tests are identical to the input parameters to tests 1 and 3, yet the expected result for tests 2 and 4 is different for the mean and stddev tests. That will always fail. Expected mean_and_variance_get_mean(mv) == mean[i], but mean_and_variance_get_mean(mv) == 22 (0x16) mean[i] == 10 (0xa) Drop the bad tests. Fixes: 65bc41090720 ("mean and variance: More tests") Closes: https://lore.kernel.org/lkml/[email protected]/ Cc: Kent Overstreet <[email protected]> Signed-off-by: Guenter Roeck <[email protected]> Signed-off-by: Kent Overstreet <[email protected]>
2024-04-02btrfs: always clear PERTRANS metadata during commitBoris Burkov1-1/+1
It is possible to clear a root's IN_TRANS tag from the radix tree, but not clear its PERTRANS, if there is some error in between. Eliminate that possibility by moving the free up to where we clear the tag. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-04-02btrfs: make btrfs_clear_delalloc_extent() free delalloc reserveBoris Burkov1-1/+1
Currently, this call site in btrfs_clear_delalloc_extent() only converts the reservation. We are marking it not delalloc, so I don't think it makes sense to keep the rsv around. This is a path where we are not sure to join a transaction, so it leads to incorrect free-ing during umount. Helps with the pass rate of generic/269 and generic/475. Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-04-02btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_transBoris Burkov1-9/+8
The transaction is only able to free PERTRANS reservations for a root once that root has been recorded with the TRANS tag on the roots radix tree. Therefore, until we are sure that this root will get tagged, it isn't safe to convert. Generally, this is not an issue as *some* transaction will likely tag the root before long and this reservation will get freed in that transaction, but technically it could stick around until unmount and result in a warning about leaked metadata reservation space. This path is most exercised by running the generic/269 fstest with CONFIG_BTRFS_DEBUG. Fixes: a6496849671a ("btrfs: fix start transaction qgroup rsv double free") CC: [email protected] # 6.6+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-04-02btrfs: record delayed inode root in transactionBoris Burkov1-0/+3
When running delayed inode updates, we do not record the inode's root in the transaction, but we do allocate PREALLOC and thus converted PERTRANS space for it. To be sure we free that PERTRANS meta rsv, we must ensure that we record the root in the transaction. Fixes: 4f5427ccce5d ("btrfs: delayed-inode: Use new qgroup meta rsv for delayed inode and item") CC: [email protected] # 6.1+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-04-02btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operationsBoris Burkov4-22/+40
Create subvolume, create snapshot and delete subvolume all use btrfs_subvolume_reserve_metadata() to reserve metadata for the changes done to the parent subvolume's fs tree, which cannot be mediated in the normal way via start_transaction. When quota groups (squota or qgroups) are enabled, this reserves qgroup metadata of type PREALLOC. Once the operation is associated to a transaction, we convert PREALLOC to PERTRANS, which gets cleared in bulk at the end of the transaction. However, the error paths of these three operations were not implementing this lifecycle correctly. They unconditionally converted the PREALLOC to PERTRANS in a generic cleanup step regardless of errors or whether the operation was fully associated to a transaction or not. This resulted in error paths occasionally converting this rsv to PERTRANS without calling record_root_in_trans successfully, which meant that unless that root got recorded in the transaction by some other thread, the end of the transaction would not free that root's PERTRANS, leaking it. Ultimately, this resulted in hitting a WARN in CONFIG_BTRFS_DEBUG builds at unmount for the leaked reservation. The fix is to ensure that every qgroup PREALLOC reservation observes the following properties: 1. any failure before record_root_in_trans is called successfully results in freeing the PREALLOC reservation. 2. after record_root_in_trans, we convert to PERTRANS, and now the transaction owns freeing the reservation. This patch enforces those properties on the three operations. Without it, generic/269 with squotas enabled at mkfs time would fail in ~5-10 runs on my system. With this patch, it ran successfully 1000 times in a row. Fixes: e85fde5162bf ("btrfs: qgroup: fix qgroup meta rsv leak for subvolume operations") CC: [email protected] # 6.1+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-04-02btrfs: qgroup: correctly model root qgroup rsv in convertBoris Burkov1-0/+2
We use add_root_meta_rsv and sub_root_meta_rsv to track prealloc and pertrans reservations for subvolumes when quotas are enabled. The convert function does not properly increment pertrans after decrementing prealloc, so the count is not accurate. Note: we check that the fs is not read-only to mirror the logic in qgroup_convert_meta, which checks that before adding to the pertrans rsv. Fixes: 8287475a2055 ("btrfs: qgroup: Use root::qgroup_meta_rsv_* to record qgroup meta reserved space") CC: [email protected] # 6.1+ Reviewed-by: Qu Wenruo <[email protected]> Signed-off-by: Boris Burkov <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-04-02tools/power turbostat: Add proper re-initialization for perf file descriptorsPatryk Wlazlyn1-0/+26
Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-02tools/power turbostat: Clear added counters when in no-msr modePatryk Wlazlyn1-1/+46
If user request --no-msr or is not able to access the MSRs, turbostat should clear all the counters added with --add. Because MSR access permission checks are done after the cmdline is parsed, the decision has to be defered up until the transition into no-msr mode happen. Signed-off-by: Len Brown <[email protected]>
2024-04-02tools/power turbostat: add early exits for permission checksPatryk Wlazlyn1-5/+61
Checking early if the permissions are even needed gets rid of the warnings about some of them missing. Earlier we issued a warning in case of missing MSR and/or perf permissions, even when user never asked for counters that require those. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-02tools/power turbostat: detect and disable unavailable BICs at runtimePatryk Wlazlyn1-63/+125
To allow unprivileged user to run turbostat seamlessly. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-02tools/power turbostat: Add reading aperf and mperf via perf APIPatryk Wlazlyn1-73/+301
By using the perf API we spend less time in between the reads of the counters, resulting in more accurate calculations of the dependent metrics. Using perf API is also usually faster overall, although cache miss, if we get one, is more costly when using perf vs MSR driver. We would fallback to the msr reads if the sysfs isn't there or when in --no-perf mode. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-02tools/power turbostat: Add --no-perf optionPatryk Wlazlyn2-3/+24
Add the --no-perf option to allow users to run turbostat without accessing perf. Signed-off-by: Patryk Wlazlyn <[email protected]> Reviewed-by: Len Brown <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-02tools/power turbostat: Add --no-msr optionPatryk Wlazlyn2-56/+151
Add --no-msr option to allow users to run turbostat without accessing MSRs via the MSR driver. Signed-off-by: Patryk Wlazlyn <[email protected]> Reviewed-by: Len Brown <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-02tools/power turbostat: enhance -D (debug counter dump) outputLen Brown1-5/+11
Eliminate redundant debug output for core and package scope counters. Include name and path for all "ADDED" counters. Signed-off-by: Len Brown <[email protected]>
2024-04-02tools/power turbostat: Fix warning upon failed /dev/cpu_dma_latency readLen Brown1-1/+2
Previously a failed read of /dev/cpu_dma_latency erroneously complained turbostat: capget(CAP_SYS_ADMIN) failed, try "# setcap cap_sys_admin=ep ./turbostat This went unnoticed because this file is typically visible to root, and turbostat was typically run as root. Going forward, when a non-root user can run turbostat... Complain about failed read access to this file only if --debug is used. Signed-off-by: Len Brown <[email protected]>
2024-04-02tools/power turbostat: Read base_hz and bclk from CPUID.16H if availablePatryk Wlazlyn1-0/+9
If MSRs cannot be read, values can be obtained from cpuid. Signed-off-by: Patryk Wlazlyn <[email protected]> Signed-off-by: Len Brown <[email protected]>
2024-04-02Merge tag 'kvm-riscv-fixes-6.9-1' of https://github.com/kvm-riscv/linux into ↵Paolo Bonzini4-9/+34
HEAD KVM/riscv fixes for 6.9, take #1 - Fix spelling mistake in arch_timer selftest - Remove redundant semicolon in num_isa_ext_regs() - Fix APLIC setipnum_le/be write emulation - Fix APLIC in_clrip[x] read emulation
2024-04-02Merge tag 'kvmarm-fixes-6.9-1' of ↵Paolo Bonzini1689-24000/+50736
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.9, part #1 - Ensure perf events programmed to count during guest execution are actually enabled before entering the guest in the nVHE configuration. - Restore out-of-range handler for stage-2 translation faults. - Several fixes to stage-2 TLB invalidations to avoid stale translations, possibly including partial walk caches. - Fix early handling of architectural VHE-only systems to ensure E2H is appropriately set. - Correct a format specifier warning in the arch_timer selftest. - Make the KVM banner message correctly handle all of the possible configurations.
2024-04-02nvme: split nvme_update_zone_infoChristoph Hellwig3-23/+41
nvme_update_zone_info does (admin queue) I/O to the device and can fail. We fail to abort the queue limits update if that happen, but really should avoid with the frozen I/O queue as much as possible anyway. Split the logic into a helper to query the information that can be called on an unfrozen queue and one to apply it to the queue limits. Fixes: 9b130d681443 ("nvme: use the atomic queue limits update API") Reported-by: Kanchan Joshi <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Kanchan Joshi <[email protected]> Signed-off-by: Keith Busch <[email protected]>
2024-04-02smb: client: serialise cifs_construct_tcon() with cifs_mount_mutexPaulo Alcantara3-4/+27
Serialise cifs_construct_tcon() with cifs_mount_mutex to handle parallel mounts that may end up reusing the session and tcon created by it. Cc: [email protected] # 6.4+ Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-02smb: client: handle DFS tcons in cifs_construct_tcon()Paulo Alcantara1-0/+30
The tcons created by cifs_construct_tcon() on multiuser mounts must also be able to failover and refresh DFS referrals, so set the appropriate fields in order to get a full DFS tcon. They could be shared among different superblocks later, too. Cc: [email protected] # 6.4+ Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-02smb: client: refresh referral without acquiring refpath_lockPaulo Alcantara1-20/+24
Avoid refreshing DFS referral with refpath_lock acquired as the I/O could block for a while due to a potentially disconnected or slow DFS root server and then making other threads - that use same @server and don't require a DFS root server - unable to make any progress. Cc: [email protected] # 6.4+ Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-02smb: client: guarantee refcounted children from parent sessionPaulo Alcantara7-72/+76
Avoid potential use-after-free bugs when walking DFS referrals, mounting and performing DFS failover by ensuring that all children from parent @tcon->ses are also refcounted. They're all needed across the entire DFS mount. Get rid of @tcon->dfs_ses_list while we're at it, too. Cc: [email protected] # 6.4+ Reported-by: kernel test robot <[email protected]> Closes: https://lore.kernel.org/oe-kbuild-all/[email protected]/ Signed-off-by: Paulo Alcantara (Red Hat) <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-04-02swiotlb: do not set total_used to 0 in swiotlb_create_debugfs_files()Dexuan Cui1-4/+0
Sometimes the readout of /sys/kernel/debug/swiotlb/io_tlb_used and io_tlb_used_hiwater can be a huge number (e.g. 18446744073709551615), which is actually a negative number if we use "%ld" to print the number. When swiotlb_create_default_debugfs() is running from late_initcall, mem->total_used may already be non-zero, because the storage driver may have already started to perform I/O operations: if the storage driver is built-in, its probe() callback is called before late_initcall. swiotlb_create_debugfs_files() should not blindly set mem->total_used and mem->used_hiwater to 0; actually it doesn't have to initialize the fields at all, because the fields, as part of the global struct io_tlb_default_mem, have been implicitly initialized to zero. Also don't explicitly set mem->transient_nslabs to 0. Fixes: 8b0977ecc8b3 ("swiotlb: track and report io_tlb_used high water marks in debugfs") Fixes: 02e765697038 ("swiotlb: add debugfs to track swiotlb transient pool usage") Signed-off-by: Dexuan Cui <[email protected]> Reviewed-by: Michael Kelley <[email protected]> Reviewed-by: ZhangPeng <[email protected]> Reviewed-by: Petr Tesarik <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2024-04-02swiotlb: fix swiotlb_bounce() to do partial sync's correctlyMichael Kelley1-17/+13
In current code, swiotlb_bounce() may do partial sync's correctly in some circumstances, but may incorrectly fail in other circumstances. The failure cases require both of these to be true: 1) swiotlb_align_offset() returns a non-zero "offset" value 2) the tlb_addr of the partial sync area points into the first "offset" bytes of the _second_ or subsequent swiotlb slot allocated for the mapping Code added in commit 868c9ddc182b ("swiotlb: add overflow checks to swiotlb_bounce") attempts to WARN on the invalid case where tlb_addr points into the first "offset" bytes of the _first_ allocated slot. But there's no way for swiotlb_bounce() to distinguish the first slot from the second and subsequent slots, so the WARN can be triggered incorrectly when #2 above is true. Related, current code calculates an adjustment to the orig_addr stored in the swiotlb slot. The adjustment compensates for the difference in the tlb_addr used for the partial sync vs. the tlb_addr for the full mapping. The adjustment is stored in the local variable tlb_offset. But when #1 and #2 above are true, it's valid for this adjustment to be negative. In such case the arithmetic to adjust orig_addr produces the wrong result due to tlb_offset being declared as unsigned. Fix these problems by removing the over-constraining validations added in 868c9ddc182b. Change the declaration of tlb_offset to be signed instead of unsigned so the adjustment arithmetic works correctly. Tested with a test-only hack to how swiotlb_tbl_map_single() calls swiotlb_bounce(). Instead of calling swiotlb_bounce() just once for the entire mapped area, do a loop with each iteration doing only a 128 byte partial sync until the entire mapped area is sync'ed. Then with swiotlb=force on the kernel boot line, run a variety of raw disk writes followed by read and verification of all bytes of the written data. The storage device has DMA min_align_mask set, and the writes are done with a variety of original buffer memory address alignments and overall buffer sizes. For many of the combinations, current code triggers the WARN statements, or the data verification fails. With the fixes, no WARNs occur and all verifications pass. Fixes: 5f89468e2f06 ("swiotlb: manipulate orig_addr when tlb_addr has offset") Fixes: 868c9ddc182b ("swiotlb: add overflow checks to swiotlb_bounce") Signed-off-by: Michael Kelley <[email protected]> Dominique Martinet <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>