aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2023-06-05xfs: fix double xfs_perag_rele() in xfs_filestream_pick_ag()Dave Chinner1-1/+0
xfs_bmap_longest_free_extent() can return an error when accessing the AGF fails. In this case, the behaviour of xfs_filestream_pick_ag() is conditional on the error. We may continue the loop, or break out of it. The error handling after the loop cleans up the perag reference held when the break occurs. If we continue, the next loop iteration handles cleaning up the perag reference. EIther way, we don't need to release the active perag reference when xfs_bmap_longest_free_extent() fails. Doing so means we do a double decrement on the active reference count, and this causes tha active reference count to fall to zero. At this point, new active references will fail. This leads to unmount hanging because it tries to grab active references to that perag, only for it to fail. This happens inside a loop that retries until a inode tree radix tree tag is cleared, which cannot happen because we can't get an active reference to the perag. The unmount livelocks in this path: xfs_reclaim_inodes+0x80/0xc0 xfs_unmount_flush_inodes+0x5b/0x70 xfs_unmountfs+0x5b/0x1a0 xfs_fs_put_super+0x49/0x110 generic_shutdown_super+0x7c/0x1a0 kill_block_super+0x27/0x50 deactivate_locked_super+0x30/0x90 deactivate_super+0x3c/0x50 cleanup_mnt+0xc2/0x160 __cleanup_mnt+0x12/0x20 task_work_run+0x5e/0xa0 exit_to_user_mode_prepare+0x1bc/0x1c0 syscall_exit_to_user_mode+0x16/0x40 do_syscall_64+0x40/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd Reported-by: Pengfei Xu <[email protected]> Fixes: eb70aa2d8ed9 ("xfs: use for_each_perag_wrap in xfs_filestream_pick_ag") Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2023-06-05xfs: fix broken logic when detecting mergeable bmap recordsDarrick J. Wong1-12/+13
Commit 6bc6c99a944c was a well-intentioned effort to initiate consolidation of adjacent bmbt mapping records by setting the PREEN flag. Consolidation can only happen if the length of the combined record doesn't overflow the 21-bit blockcount field of the bmbt recordset. Unfortunately, the length test is inverted, leading to it triggering on data forks like these: EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL 0: [0..16777207]: 76110848..92888055 0 (76110848..92888055) 16777208 1: [16777208..20639743]: 92888056..96750591 0 (92888056..96750591) 3862536 Note that record 0 has a length of 16777208 512b blocks. This corresponds to 2097151 4k fsblocks, which is the maximum. Hence the two records cannot be merged. However, the logic is still wrong even if we change the in-loop comparison, because the scope of our examination isn't broad enough inside the loop to detect mappings like this: 0: [0..9]: 76110838..76110847 0 (76110838..76110847) 10 1: [10..16777217]: 76110848..92888055 0 (76110848..92888055) 16777208 2: [16777218..20639753]: 92888056..96750591 0 (92888056..96750591) 3862536 These three records could be merged into two, but one cannot determine this purely from looking at records 0-1 or 1-2 in isolation. Hoist the mergability detection outside the loop, and base its decision making on whether or not a merged mapping could be expressed in fewer bmbt records. While we're at it, fix the incorrect return type of the iter function. Fixes: 336642f79283 ("xfs: alert the user about data/attr fork mappings that could be merged") Signed-off-by: Darrick J. Wong <[email protected]> Reviewed-by: Dave Chinner <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2023-06-05xfs: Fix undefined behavior of shift into sign bitGeert Uytterhoeven1-4/+4
With gcc-5: In file included from ./include/trace/define_trace.h:102:0, from ./fs/xfs/scrub/trace.h:988, from fs/xfs/scrub/trace.c:40: ./fs/xfs/./scrub/trace.h: In function ‘trace_raw_output_xchk_fsgate_class’: ./fs/xfs/scrub/scrub.h:111:28: error: initializer element is not constant #define XREP_ALREADY_FIXED (1 << 31) /* checking our repair work */ ^ Shifting the (signed) value 1 into the sign bit is undefined behavior. Fix this for all definitions in the file by shifting "1U" instead of "1". This was exposed by the first user added in commit 466c525d6d35e691 ("xfs: minimize overhead of drain wakeups by using jump labels"). Fixes: 160b5a784525e8a4 ("xfs: hoist the already_fixed variable to the scrub context") Signed-off-by: Geert Uytterhoeven <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2023-06-05xfs: fix AGF vs inode cluster buffer deadlockDave Chinner4-106/+166
Lock order in XFS is AGI -> AGF, hence for operations involving inode unlinked list operations we always lock the AGI first. Inode unlinked list operations operate on the inode cluster buffer, so the lock order there is AGI -> inode cluster buffer. For O_TMPFILE operations, this now means the lock order set down in xfs_rename and xfs_link is AGI -> inode cluster buffer -> AGF as the unlinked ops are done before the directory modifications that may allocate space and lock the AGF. Unfortunately, we also now lock the inode cluster buffer when logging an inode so that we can attach the inode to the cluster buffer and pin it in memory. This creates a lock order of AGF -> inode cluster buffer in directory operations as we have to log the inode after we've allocated new space for it. This creates a lock inversion between the AGF and the inode cluster buffer. Because the inode cluster buffer is shared across multiple inodes, the inversion is not specific to individual inodes but can occur when inodes in the same cluster buffer are accessed in different orders. To fix this we need move all the inode log item cluster buffer interactions to the end of the current transaction. Unfortunately, xfs_trans_log_inode() calls are littered throughout the transactions with no thought to ordering against other items or locking. This makes it difficult to do anything that involves changing the call sites of xfs_trans_log_inode() to change locking orders. However, we do now have a mechanism that allows is to postpone dirty item processing to just before we commit the transaction: the ->iop_precommit method. This will be called after all the modifications are done and high level objects like AGI and AGF buffers have been locked and modified, thereby providing a mechanism that guarantees we don't lock the inode cluster buffer before those high level objects are locked. This change is largely moving the guts of xfs_trans_log_inode() to xfs_inode_item_precommit() and providing an extra flag context in the inode log item to track the dirty state of the inode in the current transaction. This also means we do a lot less repeated work in xfs_trans_log_inode() by only doing it once per transaction when all the work is done. Fixes: 298f7bec503f ("xfs: pin inode backing buffer to the inode log item") Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2023-06-05xfs: defered work could create precommitsDave Chinner1-0/+5
To fix a AGI-AGF-inode cluster buffer deadlock, we need to move inode cluster buffer operations to the ->iop_precommit() method. However, this means that deferred operations can require precommits to be run on the final transaction that the deferred ops pass back to xfs_trans_commit() context. This will be exposed by attribute handling, in that the last changes to the inode in the attr set state machine "disappear" because the precommit operation is not run. Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2023-06-05xfs: restore allocation trylock iterationDave Chinner1-6/+7
It was accidentally dropped when refactoring the allocation code, resulting in the AG iteration always doing blocking AG iteration. This results in a small performance regression for a specific fsmark test that runs more user data writer threads than there are AGs. Reported-by: kernel test robot <[email protected]> Fixes: 2edf06a50f5b ("xfs: factor xfs_alloc_vextent_this_ag() for _iterate_ags()") Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2023-06-05xfs: buffer pins need to hold a buffer referenceDave Chinner1-23/+65
When a buffer is unpinned by xfs_buf_item_unpin(), we need to access the buffer after we've dropped the buffer log item reference count. This opens a window where we can have two racing unpins for the buffer item (e.g. shutdown checkpoint context callback processing racing with journal IO iclog completion processing) and both attempt to access the buffer after dropping the BLI reference count. If we are unlucky, the "BLI freed" context wins the race and frees the buffer before the "BLI still active" case checks the buffer pin count. This results in a use after free that can only be triggered in active filesystem shutdown situations. To fix this, we need to ensure that buffer existence extends beyond the BLI reference count checks and until the unpin processing is complete. This implies that a buffer pin operation must also take a buffer reference to ensure that the buffer cannot be freed until the buffer unpin processing is complete. Reported-by: yangerkun <[email protected]> Signed-off-by: Dave Chinner <[email protected]> Reviewed-by: Darrick J. Wong <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Signed-off-by: Dave Chinner <[email protected]>
2023-06-04Linux 6.4-rc5Linus Torvalds1-1/+1
2023-06-04Merge tag 'irq_urgent_for_v6.4_rc5' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq fix from Borislav Petkov: - Fix open firmware quirks validation so that they don't get applied wrongly * tag 'irq_urgent_for_v6.4_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irqchip/gic: Correctly validate OF quirk descriptors
2023-06-04net: sched: wrap tc_skip_wrapper with CONFIG_RETPOLINEMin-Hua Chen1-0/+2
This patch fixes the following sparse warning: net/sched/sch_api.c:2305:1: sparse: warning: symbol 'tc_skip_wrapper' was not declared. Should it be static? No functional change intended. Signed-off-by: Min-Hua Chen <[email protected]> Acked-by: Pedro Tammela <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-04Merge branch 'enetc-fixes'David S. Miller1-1/+15
Wei Fang says: ==================== net: enetc: correct the statistics of rx bytes The purpose of this patch set is to fix the issue of rx bytes statistics. The first patch corrects the rx bytes statistics of normal kernel protocol stack path, and the second patch is used to correct the rx bytes statistics of XDP. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-06-04net: enetc: correct rx_bytes statistics of XDPWei Fang1-0/+8
The rx_bytes statistics of XDP are always zero, because rx_byte_cnt is not updated after it is initialized to 0. So fix it. Fixes: d1b15102dd16 ("net: enetc: add support for XDP_DROP and XDP_PASS") Signed-off-by: Wei Fang <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-04net: enetc: correct the statistics of rx bytesWei Fang1-1/+7
The rx_bytes of struct net_device_stats should count the length of ethernet frames excluding the FCS. However, there are two problems with the rx_bytes statistics of the current enetc driver. one is that the length of VLAN header is not counted if the VLAN extraction feature is enabled. The other is that the length of L2 header is not counted, because eth_type_trans() is invoked before updating rx_bytes which will subtract the length of L2 header from skb->len. BTW, the rx_bytes statistics of XDP path also have similar problem, I will fix it in another patch. Fixes: a800abd3ecb9 ("net: enetc: move skb creation into enetc_build_skb") Signed-off-by: Wei Fang <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-04Merge tag 'media/v6.4-4' of ↵Linus Torvalds11-15/+32
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media Pull media fixes from Mauro Carvalho Chehab: "Some driver fixes: - a regression fix for the verisilicon driver - uvcvideo: don't expose unsupported video formats to userspace - camss-video: don't zero subdev format after init - mediatek: some fixes for 4K decoder formats - fix a Sphinx build warning (missing doc for client_caps) - some fixes for imx and atomisp staging drivers And two CEC core fixes: - don't set last_initiator if TX in progress - disable adapter in cec_devnode_unregister" * tag 'media/v6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: media: uvcvideo: Don't expose unsupported formats to userspace media: v4l2-subdev: Fix missing kerneldoc for client_caps media: staging: media: imx: initialize hs_settle to avoid warning media: v4l2-mc: Drop subdev check in v4l2_create_fwnode_links_to_pad() media: staging: media: atomisp: init high & low vars media: cec: core: don't set last_initiator if tx in progress media: cec: core: disable adapter in cec_devnode_unregister media: mediatek: vcodec: Only apply 4K frame sizes on decoder formats media: camss: camss-video: Don't zero subdev format again after initialization media: verisilicon: Additional fix for the crash when opening the driver
2023-06-04Merge tag 'char-misc-6.4-rc5' of ↵Linus Torvalds27-115/+295
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char/misc driver fixes from Greg KH: "Here are a bunch of tiny char/misc/other driver fixes for 6.4-rc5 that resolve a number of reported issues. Included in here are: - iio driver fixes - fpga driver fixes - test_firmware bugfixes - fastrpc driver tiny bugfixes - MAINTAINERS file updates for some subsystems All of these have been in linux-next this past week with no reported issues" * tag 'char-misc-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (34 commits) test_firmware: fix the memory leak of the allocated firmware buffer test_firmware: fix a memory leak with reqs buffer test_firmware: prevent race conditions by a correct implementation of locking firmware_loader: Fix a NULL vs IS_ERR() check MAINTAINERS: Vaibhav Gupta is the new ipack maintainer dt-bindings: fpga: replace Ivan Bornyakov maintainership MAINTAINERS: update Microchip MPF FPGA reviewers misc: fastrpc: reject new invocations during device removal misc: fastrpc: return -EPIPE to invocations on device removal misc: fastrpc: Reassign memory ownership only for remote heap misc: fastrpc: Pass proper scm arguments for secure map request iio: imu: inv_icm42600: fix timestamp reset iio: adc: ad_sigma_delta: Fix IRQ issue by setting IRQ_DISABLE_UNLAZY flag dt-bindings: iio: adc: renesas,rcar-gyroadc: Fix adi,ad7476 compatible value iio: dac: mcp4725: Fix i2c_master_send() return value handling iio: accel: kx022a fix irq getting iio: bu27034: Ensure reset is written iio: dac: build ad5758 driver when AD5758 is selected iio: addac: ad74413: fix resistance input processing iio: light: vcnl4035: fixed chip ID check ...
2023-06-04arm64: dts: imx8mn-beacon: Fix SPI CS pinmuxAdam Ford1-2/+2
The final production baseboard had a different chip select than earlier prototype boards. When the newer board was released, the SPI stopped working because the wrong pin was used in the device tree and conflicted with the UART RTS. Fix the pinmux for production boards. Fixes: 36ca3c8ccb53 ("arm64: dts: imx: Add Beacon i.MX8M Nano development kit") Signed-off-by: Adam Ford <[email protected]> Signed-off-by: Shawn Guo <[email protected]>
2023-06-04Merge tag 'driver-core-6.4-rc5' of ↵Linus Torvalds1-0/+26
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core fixes from Greg KH: "Here are two small driver core cacheinfo fixes for 6.4-rc5 that resolve a number of reported issues with that file. These changes have been in linux-next this past week with no reported problems" * tag 'driver-core-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: drivers: base: cacheinfo: Update cpu_map_populated during CPU Hotplug drivers: base: cacheinfo: Fix shared_cpu_map changes in event of CPU hotplug
2023-06-04Merge tag 'tty-6.4-rc5' of ↵Linus Torvalds6-27/+30
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial driver fixes from Greg KH: "Here are some small tty/serial driver fixes for 6.4-rc5 that have all been in linux-next this past week with no reported problems. Included in here are: - 8250_tegra driver bugfix - fsl uart driver bugfixes - Kconfig fix for dependancy issue - dt-bindings fix for the 8250_omap driver" * tag 'tty-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: dt-bindings: serial: 8250_omap: add rs485-rts-active-high serial: cpm_uart: Fix a COMPILE_TEST dependency soc: fsl: cpm1: Fix TSA and QMC dependencies in case of COMPILE_TEST tty: serial: fsl_lpuart: use UARTCTRL_TXINV to send break instead of UARTCTRL_SBK serial: 8250_tegra: Fix an error handling path in tegra_uart_probe()
2023-06-04Merge tag 'usb-6.4-rc5' of ↵Linus Torvalds12-9/+111
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb Pull USB fixes from Greg KH: "Here are some USB driver and core fixes for 6.4-rc5. Most of these are tiny driver fixes, including: - udc driver bugfix - f_fs gadget driver bugfix - cdns3 driver bugfix - typec bugfixes But the "big" thing in here is a fix yet-again for how the USB buffers are handled from userspace when dealing with DMA issues. The changes were discussed a lot, and tested a lot, on the list, and acked by the relevant mm maintainers and have been in linux-next all this past week with no reported problems" * tag 'usb-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: usb: typec: tps6598x: Fix broken polling mode after system suspend/resume mm: page_table_check: Ensure user pages are not slab pages mm: page_table_check: Make it dependent on EXCLUSIVE_SYSTEM_RAM usb: usbfs: Use consistent mmap functions usb: usbfs: Enforce page requirements for mmap dt-bindings: usb: snps,dwc3: Fix "snps,hsphy_interface" type usb: gadget: udc: fix NULL dereference in remove() usb: gadget: f_fs: Add unbind event before functionfs_unbind usb: cdns3: fix NCM gadget RX speed 20x slow than expection at iMX8QM
2023-06-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds23-95/+248
Pull kvm fixes from Paolo Bonzini: "ARM: - Address some fallout of the locking rework, this time affecting the way the vgic is configured - Fix an issue where the page table walker frees a subtree and then proceeds with walking what it has just freed... - Check that a given PA donated to the guest is actually memory (only affecting pKVM) - Correctly handle MTE CMOs by Set/Way - Fix the reported address of a watchpoint forwarded to userspace - Fix the freeing of the root of stage-2 page tables - Stop creating spurious PMU events to perform detection of the default PMU and use the existing PMU list instead x86: - Fix a memslot lookup bug in the NX recovery thread that could theoretically let userspace bypass the NX hugepage mitigation - Fix a s/BLOCKING/PENDING bug in SVM's vNMI support - Account exit stats for fastpath VM-Exits that never leave the super tight run-loop - Fix an out-of-bounds bug in the optimized APIC map code, and add a regression test for the race" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: selftests: Add test for race in kvm_recalculate_apic_map() KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-bounds KVM: x86: Account fastpath-only VM-Exits in vCPU stats KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASK KVM: x86/mmu: Grab memslot for correct address space in NX recovery worker KVM: arm64: Document default vPMU behavior on heterogeneous systems KVM: arm64: Iterate arm_pmus list to probe for default PMU KVM: arm64: Drop last page ref in kvm_pgtable_stage2_free_removed() KVM: arm64: Populate fault info for watchpoint KVM: arm64: Reload PTE after invoking walker callback on preorder traversal KVM: arm64: Handle trap of tagged Set/Way CMOs arm64: Add missing Set/Way CMO encodings KVM: arm64: Prevent unconditional donation of unmapped regions from the host KVM: arm64: vgic: Fix a comment KVM: arm64: vgic: Fix locking comment KVM: arm64: vgic: Wrap vgic_its_create() with config_lock KVM: arm64: vgic: Fix a circular locking issue
2023-06-04Merge tag 'powerpc-6.4-4' of ↵Linus Torvalds6-24/+33
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Fix link errors in new aes-gcm-p10 code when built-in with other drivers - Limit number of TCEs passed to H_STUFF_TCE hcall as per spec - Use KSYM_NAME_LEN in xmon array size to avoid possible OOB write Thanks to Gaurav Batra and Maninder Singh Vishal Chourasia. * tag 'powerpc-6.4-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/xmon: Use KSYM_NAME_LEN in array size powerpc/iommu: Limit number of TCEs to 512 for H_STUFF_TCE hcall powerpc/crypto: Fix aes-gcm-p10 link errors
2023-06-03blk-mq: fix blk_mq_hw_ctx active request accountingTian Lan1-4/+4
The nr_active counter continues to increase over time which causes the blk_mq_get_tag to hang until the thread is rescheduled to a different core despite there are still tags available. kernel-stack INFO: task inboundIOReacto:3014879 blocked for more than 2 seconds Not tainted 6.1.15-amd64 #1 Debian 6.1.15~debian11 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. task:inboundIOReacto state:D stack:0 pid:3014879 ppid:4557 flags:0x00000000 Call Trace: <TASK> __schedule+0x351/0xa20 scheduler+0x5d/0xe0 io_schedule+0x42/0x70 blk_mq_get_tag+0x11a/0x2a0 ? dequeue_task_stop+0x70/0x70 __blk_mq_alloc_requests+0x191/0x2e0 kprobe output showing RQF_MQ_INFLIGHT bit is not cleared before __blk_mq_free_request being called. 320 320 kworker/29:1H __blk_mq_free_request rq_flags 0x220c0 in-flight 1 b'__blk_mq_free_request+0x1 [kernel]' b'bt_iter+0x50 [kernel]' b'blk_mq_queue_tag_busy_iter+0x318 [kernel]' b'blk_mq_timeout_work+0x7c [kernel]' b'process_one_work+0x1c4 [kernel]' b'worker_thread+0x4d [kernel]' b'kthread+0xe6 [kernel]' b'ret_from_fork+0x1f [kernel]' Signed-off-by: Tian Lan <[email protected]> Fixes: 2e315dc07df0 ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter") Reviewed-by: Ming Lei <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-06-03net/smc: Avoid to access invalid RMBs' MRs in SMCRv1 ADD LINK CONTWen Gu1-2/+2
SMCRv1 has a similar issue to SMCRv2 (see link below) that may access invalid MRs of RMBs when construct LLC ADD LINK CONT messages. BUG: kernel NULL pointer dereference, address: 0000000000000014 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 5 PID: 48 Comm: kworker/5:0 Kdump: loaded Tainted: G W E 6.4.0-rc3+ #49 Workqueue: events smc_llc_add_link_work [smc] RIP: 0010:smc_llc_add_link_cont+0x160/0x270 [smc] RSP: 0018:ffffa737801d3d50 EFLAGS: 00010286 RAX: ffff964f82144000 RBX: ffffa737801d3dd8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff964f81370c30 RBP: ffffa737801d3dd4 R08: ffff964f81370000 R09: ffffa737801d3db0 R10: 0000000000000001 R11: 0000000000000060 R12: ffff964f82e70000 R13: ffff964f81370c38 R14: ffffa737801d3dd3 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff9652bfd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000014 CR3: 000000008fa20004 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> smc_llc_srv_rkey_exchange+0xa7/0x190 [smc] smc_llc_srv_add_link+0x3ae/0x5a0 [smc] smc_llc_add_link_work+0xb8/0x140 [smc] process_one_work+0x1e5/0x3f0 worker_thread+0x4d/0x2f0 ? __pfx_worker_thread+0x10/0x10 kthread+0xe5/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2c/0x50 </TASK> When an alernate RNIC is available in system, SMC will try to add a new link based on the RNIC for resilience. All the RMBs in use will be mapped to the new link. Then the RMBs' MRs corresponding to the new link will be filled into LLC messages. For SMCRv1, they are ADD LINK CONT messages. However smc_llc_add_link_cont() may mistakenly access to unused RMBs which haven't been mapped to the new link and have no valid MRs, thus causing a crash. So this patch fixes it. Fixes: 87f88cda2128 ("net/smc: rkey processing for a new link as SMC client") Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Wen Gu <[email protected]> Reviewed-by: Wenjia Zhang <[email protected]> Reviewed-by: Tony Lu <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-03Fix gitignore for recently added usptream self testsWeihao Gao1-0/+3
This resolves the issue that generated binary is showing up as an untracked git file after every build on the kernel. Signed-off-by: Weihao Gao <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-06-03Merge tag 'kvm-x86-fixes-6.4' of https://github.com/kvm-x86/linux into HEADPaolo Bonzini6-4/+101
KVM x86 fixes for 6.4 - Fix a memslot lookup bug in the NX recovery thread that could theoretically let userspace bypass the NX hugepage mitigation - Fix a s/BLOCKING/PENDING bug in SVM's vNMI support - Account exit stats for fastpath VM-Exits that never leave the super tight run-loop - Fix an out-of-bounds bug in the optimized APIC map code, and add a regression test for the race.
2023-06-03Merge tag 'kvmarm-fixes-6.4-3' of ↵Paolo Bonzini5-37/+35
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.4, take #3 - Fix the reported address of a watchpoint forwarded to userspace - Fix the freeing of the root of stage-2 page tables - Stop creating spurious PMU events to perform detection of the default PMU and use the existing PMU list instead.
2023-06-03Merge tag 'kvmarm-fixes-6.4-2' of ↵Paolo Bonzini13-54/+112
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 6.4, take #2 - Address some fallout of the locking rework, this time affecting the way the vgic is configured - Fix an issue where the page table walker frees a subtree and then proceeds with walking what it has just freed... - Check that a given PA donated to the gues is actually memory (only affecting pKVM) - Correctly handle MTE CMOs by Set/Way
2023-06-03Merge tag 'scsi-fixes' of ↵Linus Torvalds11-95/+120
git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Five fixes, all in drivers. The most extensive is the target change to fix the hang in the login code, which involves changing timers from per login to per connection" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: stex: Fix gcc 13 warnings scsi: qla2xxx: Fix NULL pointer dereference in target mode scsi: target: iscsi: Prevent login threads from racing between each other scsi: target: iscsi: Remove unused transport_timer scsi: target: iscsi: Fix hang in the iSCSI login code
2023-06-03Merge tag 'leds-6.4-rc5' of ↵Linus Torvalds1-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/johan/linux Pull LED fix from Johan Hovold: "Here's a fix for a regression in 6.4-rc1 which broke the backlight on machines such as the Lenovo ThinkPad X13s" Acked-by: Lee Jones <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ * tag 'leds-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/johan/linux: leds: qcom-lpg: Fix PWM period limits
2023-06-03leds: qcom-lpg: Fix PWM period limitsBjorn Andersson1-4/+4
The introduction of high resolution PWM support changed the order of the operations in the calculation of min and max period. The result in both divisions is in most cases a truncation to 0, which limits the period to the range of [0, 0]. Both numerators (and denominators) are within 64 bits, so the whole expression can be put directly into the div64_u64, instead of doing it partially. Fixes: b00d2ed37617 ("leds: rgb: leds-qcom-lpg: Add support for high resolution PWM") Reviewed-by: Caleb Connolly <[email protected]> Tested-by: Steev Klimaszewski <[email protected]> Signed-off-by: Bjorn Andersson <[email protected]> Acked-by: Lee Jones <[email protected]> Tested-by: Johan Hovold <[email protected]> Tested-by: Neil Armstrong <[email protected]> # on SM8550-QRD Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Johan Hovold <[email protected]>
2023-06-03Merge tag 'probes-fixes-6.4-rc4' of ↵Linus Torvalds2-19/+28
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull probes fixes from Masami Hiramatsu: - Return NULL if the trace_probe list on trace_probe_event is empty - selftests/ftrace: Choose testing symbol name for filtering feature from sample data instead of fixed symbol * tag 'probes-fixes-6.4-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: selftests/ftrace: Choose target function for filter test from samples tracing/probe: trace_probe_primary_from_call(): checked list_first_entry
2023-06-02net: phylink: actually fix ksettings_set() ethtool callRussell King (Oracle)1-5/+10
Raju Lakkaraju reported that the below commit caused a regression with Lan743x drivers and a 2.5G SFP. Sadly, this is because the commit was utterly wrong. Let's fix this properly by not moving the linkmode_and(), but instead copying the link ksettings and then modifying the advertising mask before passing the modified link ksettings to phylib. Fixes: df0acdc59b09 ("net: phylink: fix ksettings_set() ethtool call") Signed-off-by: Russell King (Oracle) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-03selftests/ftrace: Choose target function for filter test from samplesMasami Hiramatsu (Google)1-18/+27
Since the event-filter-function.tc expects the 'exit_mmap()' directly calls 'kmem_cache_free()', this is vulnerable to code modifications. Choose the target function for the filter test from the sample event data so that it can keep test running correctly even if the caller function name will be changed. Link: https://lore.kernel.org/linux-trace-kernel/167919441260.1922645.18355804179347364057.stgit@mhiramat.roam.corp.google.com/ Link: https://lore.kernel.org/all/CA+G9fYtF-XEKi9YNGgR=Kf==7iRb2FrmEC7qtwAeQbfyah-UhA@mail.gmail.com/ Reported-by: Linux Kernel Functional Testing <[email protected]> Fixes: 7f09d639b8c4 ("tracing/selftests: Add test for event filtering on function name") Signed-off-by: Masami Hiramatsu (Google) <[email protected]> Acked-by: Steven Rostedt (Google) <[email protected]>
2023-06-02Merge branch 'net-ipv6-skip_notify_on_dev_down-fix'Jakub Kicinski2-3/+3
Eric Dumazet says: ==================== net/ipv6: skip_notify_on_dev_down fix While reviewing Matthieu Baerts recent patch [1], I found it copied/pasted an existing bug around skip_notify_on_dev_down. First patch is a stable candidate, and second one can simply land in net tree. https://lore.kernel.org/lkml/20230601-net-next-skip_print_link_becomes_ready-v1-1-c13e64c14095@tessares.net/ ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02net/ipv6: convert skip_notify_on_dev_down sysctl to u8Eric Dumazet2-3/+3
Save a bit a space, and could help future sysctls to use the same pattern. Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Acked-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02net/ipv6: fix bool/int mismatch for skip_notify_on_dev_downEric Dumazet1-1/+1
skip_notify_on_dev_down ctl table expects this field to be an int (4 bytes), not a bool (1 byte). Because proc_dou8vec_minmax() was added in 5.13, this patch converts skip_notify_on_dev_down to an int. Following patch then converts the field to u8 and use proc_dou8vec_minmax(). Fixes: 7c6bb7d2faaf ("net/ipv6: Add knob to skip DELROUTE message on device down") Signed-off-by: Eric Dumazet <[email protected]> Reviewed-by: David Ahern <[email protected]> Acked-by: Matthieu Baerts <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-06-02KVM: selftests: Add test for race in kvm_recalculate_apic_map()Michal Luczaj2-0/+75
Keep switching between LAPIC_MODE_X2APIC and LAPIC_MODE_DISABLED during APIC map construction to hunt for TOCTOU bugs in KVM. KVM's optimized map recalc makes multiple passes over the list of vCPUs, and the calculations ignore vCPU's whose APIC is hardware-disabled, i.e. there's a window where toggling LAPIC_MODE_DISABLED is quite interesting. Signed-off-by: Michal Luczaj <[email protected]> Co-developed-by: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-02KVM: x86: Bail from kvm_recalculate_phys_map() if x2APIC ID is out-of-boundsSean Christopherson1-2/+18
Bail from kvm_recalculate_phys_map() and disable the optimized map if the target vCPU's x2APIC ID is out-of-bounds, i.e. if the vCPU was added and/or enabled its local APIC after the map was allocated. This fixes an out-of-bounds access bug in the !x2apic_format path where KVM would write beyond the end of phys_map. Check the x2APIC ID regardless of whether or not x2APIC is enabled, as KVM's hardcodes x2APIC ID to be the vCPU ID, i.e. it can't change, and the map allocation in kvm_recalculate_apic_map() doesn't check for x2APIC being enabled, i.e. the check won't get false postivies. Note, this also affects the x2apic_format path, which previously just ignored the "x2apic_id > new->max_apic_id" case. That too is arguably a bug fix, as ignoring the vCPU meant that KVM would not send interrupts to the vCPU until the next map recalculation. In practice, that "bug" is likely benign as a newly present vCPU/APIC would immediately trigger a recalc. But, there's no functional downside to disabling the map, and a future patch will gracefully handle the -E2BIG case by retrying instead of simply disabling the optimized map. Opportunistically add a sanity check on the xAPIC ID size, along with a comment explaining why the xAPIC ID is guaranteed to be "good". Reported-by: Michal Luczaj <[email protected]> Fixes: 5b84b0291702 ("KVM: x86: Honor architectural behavior for aliased 8-bit APIC IDs") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-02Merge branch 'Fix elem_size not being set for inner maps'Martin KaFai Lau3-2/+82
Rhys Rustad-Elliott says: ==================== Commit d937bc3449fa ("bpf: make uniform use of array->elem_size everywhere in arraymap.c") changed array_map_gen_lookup to use array->elem_size instead of round_up(map->value_size, 8) as the element size when generating code to access a value in an array map. array->elem_size, however, is not set by bpf_map_meta_alloc when initializing an BPF_MAP_TYPE_ARRAY_OF_MAPS or BPF_MAP_TYPE_HASH_OF_MAPS. This results in array_map_gen_lookup incorrectly outputting code that always accesses index 0 in the array (as the index will be calculated via a multiplication with the element size, which is incorrectly set to 0). This patchset sets elem_size on the bpf_array object when allocating an array or hash of maps to fix this and adds a selftest that accesses an array map nested within a hash of maps at a nonzero index to prevent regressions. v1: https://lore.kernel.org/bpf/[email protected]/ Changelog: v1 -> v2: Address comments by Martin KaFai Lau: - Directly use inner_array->elem_size instead of using round_up - Move selftests to a new patch - Use ASSERT_* macros instead of CHECK and remove duration - Remove unnecessary usleep - Shorten selftest name ==================== Signed-off-by: Martin KaFai Lau <[email protected]>
2023-06-02selftests/bpf: Add access_inner_map selftestRhys Rustad-Elliott2-0/+76
Add a selftest that accesses a BPF_MAP_TYPE_ARRAY (at a nonzero index) nested within a BPF_MAP_TYPE_HASH_OF_MAPS to flex a previously buggy case. Signed-off-by: Rhys Rustad-Elliott <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-06-02x86/head/64: Switch to KERNEL_CS as soon as new GDT is installedTom Lendacky1-9/+9
The call to startup_64_setup_env() will install a new GDT but does not actually switch to using the KERNEL_CS entry until returning from the function call. Commit bcce82908333 ("x86/sev: Detect/setup SEV/SME features earlier in boot") moved the call to sme_enable() earlier in the boot process and in between the call to startup_64_setup_env() and the switch to KERNEL_CS. An SEV-ES or an SEV-SNP guest will trigger #VC exceptions during the call to sme_enable() and if the CS pushed on the stack as part of the exception and used by IRETQ is not mapped by the new GDT, then problems occur. Today, the current CS when entering startup_64 is the kernel CS value because it was set up by the decompressor code, so no issue is seen. However, a recent patchset that looked to avoid using the legacy decompressor during an EFI boot exposed this bug. At entry to startup_64, the CS value is that of EFI and is not mapped in the new kernel GDT. So when a #VC exception occurs, the CS value used by IRETQ is not valid and the guest boot crashes. Fix this issue by moving the block that switches to the KERNEL_CS value to be done immediately after returning from startup_64_setup_env(). Fixes: bcce82908333 ("x86/sev: Detect/setup SEV/SME features earlier in boot") Signed-off-by: Tom Lendacky <[email protected]> Signed-off-by: Dave Hansen <[email protected]> Reviewed-by: Joerg Roedel <[email protected]> Link: https://lore.kernel.org/all/6ff1f28af2829cc9aea357ebee285825f90a431f.1684340801.git.thomas.lendacky%40amd.com
2023-06-02KVM: x86: Account fastpath-only VM-Exits in vCPU statsSean Christopherson1-0/+3
Increment vcpu->stat.exits when handling a fastpath VM-Exit without going through any part of the "slow" path. Not bumping the exits stat can result in wildly misleading exit counts, e.g. if the primary reason the guest is exiting is to program the TSC deadline timer. Fixes: 404d5d7bff0d ("KVM: X86: Introduce more exit_fastpath_completion enum values") Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-02KVM: SVM: vNMI pending bit is V_NMI_PENDING_MASK not V_NMI_BLOCKING_MASKMaciej S. Szmigiero1-1/+1
While testing Hyper-V enabled Windows Server 2019 guests on Zen4 hardware I noticed that with vCPU count large enough (> 16) they sometimes froze at boot. With vCPU count of 64 they never booted successfully - suggesting some kind of a race condition. Since adding "vnmi=0" module parameter made these guests boot successfully it was clear that the problem is most likely (v)NMI-related. Running kvm-unit-tests quickly showed failing NMI-related tests cases, like "multiple nmi" and "pending nmi" from apic-split, x2apic and xapic tests and the NMI parts of eventinj test. The issue was that once one NMI was being serviced no other NMI was allowed to be set pending (NMI limit = 0), which was traced to svm_is_vnmi_pending() wrongly testing for the "NMI blocked" flag rather than for the "NMI pending" flag. Fix this by testing for the right flag in svm_is_vnmi_pending(). Once this is done, the NMI-related kvm-unit-tests pass successfully and the Windows guest no longer freezes at boot. Fixes: fa4c027a7956 ("KVM: x86: Add support for SVM's Virtual NMI") Signed-off-by: Maciej S. Szmigiero <[email protected]> Reviewed-by: Sean Christopherson <[email protected]> Link: https://lore.kernel.org/r/be4ca192eb0c1e69a210db3009ca984e6a54ae69.1684495380.git.maciej.szmigiero@oracle.com Signed-off-by: Sean Christopherson <[email protected]>
2023-06-02KVM: x86/mmu: Grab memslot for correct address space in NX recovery workerSean Christopherson1-1/+4
Factor in the address space (non-SMM vs. SMM) of the target shadow page when recovering potential NX huge pages, otherwise KVM will retrieve the wrong memslot when zapping shadow pages that were created for SMM. The bug most visibly manifests as a WARN on the memslot being non-NULL, but the worst case scenario is that KVM could unaccount the shadow page without ensuring KVM won't install a huge page, i.e. if the non-SMM slot is being dirty logged, but the SMM slot is not. ------------[ cut here ]------------ WARNING: CPU: 1 PID: 3911 at arch/x86/kvm/mmu/mmu.c:7015 kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm] CPU: 1 PID: 3911 Comm: kvm-nx-lpage-re RIP: 0010:kvm_nx_huge_page_recovery_worker+0x38c/0x3d0 [kvm] RSP: 0018:ffff99b284f0be68 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff99b284edd000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffff9271397024e0 R08: 0000000000000000 R09: ffff927139702450 R10: 0000000000000000 R11: 0000000000000001 R12: ffff99b284f0be98 R13: 0000000000000000 R14: ffff9270991fcd80 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff927f9f640000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f0aacad3ae0 CR3: 000000088fc2c005 CR4: 00000000003726e0 Call Trace: <TASK> __pfx_kvm_nx_huge_page_recovery_worker+0x10/0x10 [kvm] kvm_vm_worker_thread+0x106/0x1c0 [kvm] kthread+0xd9/0x100 ret_from_fork+0x2c/0x50 </TASK> ---[ end trace 0000000000000000 ]--- This bug was exposed by commit edbdb43fc96b ("KVM: x86: Preserve TDP MMU roots until they are explicitly invalidated"), which allowed KVM to retain SMM TDP MMU roots effectively indefinitely. Before commit edbdb43fc96b, KVM would zap all SMM TDP MMU roots and thus all SMM TDP MMU shadow pages once all vCPUs exited SMM, which made the window where this bug (recovering an SMM NX huge page) could be encountered quite tiny. To hit the bug, the NX recovery thread would have to run while at least one vCPU was in SMM. Most VMs typically only use SMM during boot, and so the problematic shadow pages were gone by the time the NX recovery thread ran. Now that KVM preserves TDP MMU roots until they are explicitly invalidated (e.g. by a memslot deletion), the window to trigger the bug is effectively never closed because most VMMs don't delete memslots after boot (except for a handful of special scenarios). Fixes: eb298605705a ("KVM: x86/mmu: Do not recover dirty-tracked NX Huge Pages") Reported-by: Fabio Coatti <[email protected]> Closes: https://lore.kernel.org/all/CADpTngX9LESCdHVu_2mQkNGena_Ng2CphWNwsRGSMxzDsTjU2A@mail.gmail.com Cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Sean Christopherson <[email protected]>
2023-06-02bpf: Fix elem_size not being set for inner mapsRhys Rustad-Elliott1-2/+6
Commit d937bc3449fa ("bpf: make uniform use of array->elem_size everywhere in arraymap.c") changed array_map_gen_lookup to use array->elem_size instead of round_up(map->value_size, 8) as the element size when generating code to access a value in an array map. array->elem_size, however, is not set by bpf_map_meta_alloc when initializing an BPF_MAP_TYPE_ARRAY_OF_MAPS or BPF_MAP_TYPE_HASH_OF_MAPS. This results in array_map_gen_lookup incorrectly outputting code that always accesses index 0 in the array (as the index will be calculated via a multiplication with the element size, which is incorrectly set to 0). Set elem_size on the bpf_array object when allocating an array or hash of maps to fix this. Fixes: d937bc3449fa ("bpf: make uniform use of array->elem_size everywhere in arraymap.c") Signed-off-by: Rhys Rustad-Elliott <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2023-06-02tpm, tpm_tis: correct tpm_tis_flags enumeration valuesLino Sanfilippo1-4/+4
With commit 858e8b792d06 ("tpm, tpm_tis: Avoid cache incoherency in test for interrupts") bit accessor functions are used to access flags in tpm_tis_data->flags. However these functions expect bit numbers, while the flags are defined as bit masks in enum tpm_tis_flag. Fix this inconsistency by using numbers instead of masks also for the flags in the enum. Reported-by: Pavel Machek <[email protected]> Fixes: 858e8b792d06 ("tpm, tpm_tis: Avoid cache incoherency in test for interrupts") Signed-off-by: Lino Sanfilippo <[email protected]> Cc: [email protected] Reviewed-by: Pavel Machek <[email protected]> Signed-off-by: Linus Torvalds <[email protected]>
2023-06-02Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds1-1/+15
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 fix from Ted Ts'o: "Fix an ext4 regression which landed during the 6.4 merge window" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"
2023-06-02Merge tag 'for-6.4-rc4-tag' of ↵Linus Torvalds2-20/+32
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One regression fix. The rewrite of scrub code in 6.4 broke device replace in zoned mode, some of the writes could happen out of order so this had to be adjusted for all cases" * tag 'for-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix dev-replace after the scrub rework
2023-06-02Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ↵Ojaswin Mujoo1-1/+15
ext4_mb_check_limits" This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1. The reverted commit was intended to remove a dead check however it was observed that this check was actually being used to exit early instead of looping sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than the goal extent. Due to this, a my performance tests (fsmark, parallel file writes in a highly fragmented FS) were seeing a 2x-3x regression. Example, the default value of the following variables is: sbi->s_mb_max_to_scan = 200 sbi->s_mb_min_to_scan = 10 In ext4_mb_check_limits() if we find an extent smaller than goal, then we return early and try again. This loop will go on until we have processed sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and just use whatever we have even if it is smaller than goal extent. Now, the regression comes when we find an extent bigger than goal. Earlier, in this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use the bigger extent. However with commit 32c08693 that check was removed and hence we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough free extent to satisfy the request. The only time we would exit early would be when the free extent is *exactly* the size of our goal, which is pretty uncommon occurrence and so we would almost always end up looping 200 times. Hence, revert the commit by adding the check back to fix the regression. Also add a comment to outline this policy. Fixes: 32c086937019 ("ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits") Signed-off-by: Ojaswin Mujoo <[email protected]> Reviewed-by: Ritesh Harjani (IBM) <[email protected]> Reviewed-by: Kemeng Shi <[email protected]> Link: https://lore.kernel.org/r/ddcae9658e46880dfec2fb0aa61d01fb3353d202.1685449706.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <[email protected]>
2023-06-02media: uvcvideo: Don't expose unsupported formats to userspaceLaurent Pinchart1-5/+11
When the uvcvideo driver encounters a format descriptor with an unknown format GUID, it creates a corresponding struct uvc_format instance with the fcc field set to 0. Since commit 50459f103edf ("media: uvcvideo: Remove format descriptions"), the driver relies on the V4L2 core to provide the format description string, which the V4L2 core can't do without a valid 4CC. This triggers a WARN_ON. As a format with a zero 4CC can't be selected, it is unusable for applications. Ignore the format completely without creating a uvc_format instance, which fixes the warning. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217252 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2180107 Fixes: 50459f103edf ("media: uvcvideo: Remove format descriptions") Signed-off-by: Laurent Pinchart <[email protected]> Reviewed-by: Ricardo Ribalda <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>