blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-05-24	powercap: intel_rapl: Support per Interface primitive information	Zhang Rui	1	-0/+2
	RAPL primitive information is Interface specific. Although current MSR and MMIO Interface share the same RAPL primitives, new Interface like TPMI has its own RAPL primitive information. Save the primitive information in the Interface private structure. Plus, using variant name "rp" for struct rapl_primitive_info is confusing because "rp" is also used for struct rapl_package. Use "rpi" as the variant name for struct rapl_primitive_info, and rename the previous rpi[] array to avoid conflict. No functional change. Signed-off-by: Zhang Rui <[email protected]> Tested-by: Wang Wendy <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-05-24	powercap: intel_rapl: Support per Interface rapl_defaults	Zhang Rui	1	-0/+2
	rapl_defaults is Interface specific. Although current MSR and MMIO Interface share the same rapl_defaults, new Interface like TPMI need its own rapl_defaults callbacks. Save the rapl_defaults information in the Interface private structure. No functional change. Signed-off-by: Zhang Rui <[email protected]> Tested-by: Wang Wendy <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-05-24	powercap: intel_rapl: Remove unused field in struct rapl_if_priv	Zhang Rui	1	-1/+0
	After commit f1e8d7560d30 ("powercap/intel_rapl: enumerate Psys RAPL domain together with package RAPL domain"), the platform_rapl_domain field is not used anymore. Remove it from rapl_if_priv structure. Fixes: f1e8d7560d30 ("powercap/intel_rapl: enumerate Psys RAPL domain together with package RAPL domain") Signed-off-by: Zhang Rui <[email protected]> Signed-off-by: Rafael J. Wysocki <[email protected]>
2023-05-24	net: phylink: add function to resolve clause 73 negotiation	Russell King (Oracle)	1	-0/+2
	Add a function to resolve clause 73 negotiation according to the priority resolution function described in clause 73.3.6. Signed-off-by: Russell King (Oracle) <[email protected]> Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-24	net: mdio: add clause 73 to ethtool conversion helper	Russell King (Oracle)	1	-0/+39
	Add a helper to convert a clause 73 advertisement to an ethtool bitmap. Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Russell King (Oracle) <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-24	x86/pci/xen: populate MSI sysfs entries	Maximilian Heyne	1	-1/+8
	Commit bf5e758f02fc ("genirq/msi: Simplify sysfs handling") reworked the creation of sysfs entries for MSI IRQs. The creation used to be in msi_domain_alloc_irqs_descs_locked after calling ops->domain_alloc_irqs. Then it moved into __msi_domain_alloc_irqs which is an implementation of domain_alloc_irqs. However, Xen comes with the only other implementation of domain_alloc_irqs and hence doesn't run the sysfs population code anymore. Commit 6c796996ee70 ("x86/pci/xen: Fixup fallout from the PCI/MSI overhaul") set the flag MSI_FLAG_DEV_SYSFS for the xen msi_domain_info but that doesn't actually have an effect because Xen uses it's own domain_alloc_irqs implementation. Fix this by making use of the fallback functions for sysfs population. Fixes: bf5e758f02fc ("genirq/msi: Simplify sysfs handling") Signed-off-by: Maximilian Heyne <[email protected]> Reviewed-by: Juergen Gross <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Juergen Gross <[email protected]>
2023-05-24	block: Add BIO_PAGE_PINNED and associated infrastructure	David Howells	2	-1/+3
	Add BIO_PAGE_PINNED to indicate that the pages in a bio are pinned (FOLL_PIN) and that the pin will need removing. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: John Hubbard <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Jan Kara <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Logan Gunthorpe <[email protected]> cc: [email protected] Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24	block: Replace BIO_NO_PAGE_REF with BIO_PAGE_REFFED with inverted logic	Christoph Hellwig	2	-2/+2
	Replace BIO_NO_PAGE_REF with a BIO_PAGE_REFFED flag that has the inverted meaning is only set when a page reference has been acquired that needs to be released by bio_release_pages(). Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: John Hubbard <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Jan Kara <[email protected]> cc: Matthew Wilcox <[email protected]> cc: Logan Gunthorpe <[email protected]> cc: [email protected] Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24	block: Fix bio_flagged() so that gcc can better optimise it	David Howells	1	-1/+1
	Fix bio_flagged() so that multiple instances of it, such as: if (bio_flagged(bio, BIO_PAGE_REFFED) \|\| bio_flagged(bio, BIO_PAGE_PINNED)) can be combined by the gcc optimiser into a single test in assembly (arguably, this is a compiler optimisation issue[1]). The missed optimisation stems from bio_flagged() comparing the result of the bitwise-AND to zero. This results in an out-of-line bio_release_page() being compiled to something like: <+0>: mov 0x14(%rdi),%eax <+3>: test $0x1,%al <+5>: jne 0xffffffff816dac53 <bio_release_pages+11> <+7>: test $0x2,%al <+9>: je 0xffffffff816dac5c <bio_release_pages+20> <+11>: movzbl %sil,%esi <+15>: jmp 0xffffffff816daba1 <__bio_release_pages> <+20>: jmp 0xffffffff81d0b800 <__x86_return_thunk> However, the test is superfluous as the return type is bool. Removing it results in: <+0>: testb $0x3,0x14(%rdi) <+4>: je 0xffffffff816e4af4 <bio_release_pages+15> <+6>: movzbl %sil,%esi <+10>: jmp 0xffffffff816dab7c <__bio_release_pages> <+15>: jmp 0xffffffff81d0b7c0 <__x86_return_thunk> instead. Also, the MOVZBL instruction looks unnecessary[2] - I think it's just 're-booling' the mark_dirty parameter. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: John Hubbard <[email protected]> cc: Jens Axboe <[email protected]> cc: [email protected] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108370 [1] Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108371 [2] Link: https://lore.kernel.org/r/167391056756.2311931.356007731815807265.stgit@warthog.procyon.org.uk/ # v6 Reviewed-by: Christian Brauner <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24	Merge branch 'for-6.5/splice' into for-6.5/block	Jens Axboe	3	-19/+6
	Merge splice bits as subsequent block cleanups and improvements for DIO depend on them. * for-6.5/splice: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
2023-05-24	iov_iter: Kill ITER_PIPE	David Howells	1	-14/+0
	The ITER_PIPE-type iterator was only used by generic_file_splice_read() and that has been replaced and removed. This leaves ITER_PIPE unused - so remove it too. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: David Hildenbrand <[email protected]> cc: John Hubbard <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24	splice: Remove generic_file_splice_read()	David Howells	1	-2/+0
	Remove generic_file_splice_read() as it has been replaced with calls to filemap_splice_read() and copy_splice_read(). With this, ITER_PIPE is no longer used. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Jens Axboe <[email protected]> cc: Steve French <[email protected]> cc: Al Viro <[email protected]> cc: David Hildenbrand <[email protected]> cc: John Hubbard <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24	splice: Make do_splice_to() generic and export it	David Howells	1	-0/+3
	Rename do_splice_to() to vfs_splice_read() and export it so that it can be used as a helper when calling down to a lower layer filesystem as it performs all the necessary checks[1]. Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Miklos Szeredi <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: John Hubbard <[email protected]> cc: David Hildenbrand <[email protected]> cc: Matthew Wilcox <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/CAJfpeguGksS3sCigmRi9hJdUec8qtM9f+_9jC1rJhsXT+dV01w@mail.gmail.com/ [1] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24	splice: Rename direct_splice_read() to copy_splice_read()	David Howells	1	-3/+3
	Rename direct_splice_read() to copy_splice_read() to better reflect as to what it does. Suggested-by: Christoph Hellwig <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Christian Brauner <[email protected]> cc: Steve French <[email protected]> cc: Jens Axboe <[email protected]> cc: Al Viro <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-24	ARM/musb: omap2: Remove global GPIO numbers from TUSB6010	Linus Walleij	1	-13/+0
	The TUSB6010 (MUSB) device is picking up some GPIO lines hardcoded by number and passing on to the TUSB6010 device when registering it. Instead of nasty workarounds, provide a GPIO descriptor table and then make the TUSB6010 MUSB glue driver pick up the GPIO lines directly, convert it to an IRQ and pass down to the MUSB driver. OMAP2 is the only system using the TUSB6010. Stash the GPIO descriptors in the glue layer and use then to power up and down the TUSB6010 on-demand, instead of using boardfile callbacks. Since the OMAP2 boards are the only boards using the .set_power() and .board_set_power() callbacks, we can just delete them as the power is now handled directly in the TUSB6010 glue code. Cc: Bin Liu <[email protected]> Cc: [email protected] Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Acked-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
2023-05-24	ARM/gpio: Push OMAP2 quirk down into TWL4030 driver	Linus Walleij	1	-3/+0
	The TWL4030 GPIO driver has a custom platform data .set_up() callback to call back into the platform and do misc stuff such as hog and export a GPIO for WLAN PWR on a specific OMAP3 board. Avoid all the kludgery in the platform data and the boardfile and just put the quirks right into the driver. Make it conditional on OMAP3. I think the exported GPIO is used by some kind of userspace so ordinary DTS hogs will probably not work. Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Signed-off-by: Linus Walleij <[email protected]>
2023-05-24	ARM/mmc: Convert old mmci-omap to GPIO descriptors	Linus Walleij	1	-2/+0
	A recent change to the OMAP driver making it use a dynamic GPIO base created problems with some old OMAP1 board files, among them Nokia 770, SX1 and also the OMAP2 Nokia n8x0. Fix up all instances of GPIOs being used for the MMC driver by pushing the handling of power, slot selection and MMC "cover" into the driver as optional GPIOs. This is maybe not the most perfect solution as the MMC framework have some central handlers for some of the stuff, but it at least makes the situtation better and solves the immediate issue. Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Acked-by: Ulf Hansson <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
2023-05-24	Input: ads7846 - Convert to use software nodes	Linus Walleij	2	-4/+0
	The Nokia 770 is using GPIOs from the global numberspace on the CBUS node to pass down to the LCD controller. This regresses when we let the OMAP GPIO driver use dynamic GPIO base. The Nokia 770 now has dynamic allocation of IRQ numbers, so this needs to be fixed for it to work. As this is the only user of LCD MIPID we can easily augment the driver to use a GPIO descriptor instead and resolve the issue. The platform data .shutdown() callback wasn't even used in the code, but we encode a shutdown asserting RESET in the remove() callback for completeness sake. The CBUS also has the ADS7846 touchscreen attached. Populate the devices on the Nokia 770 CBUS I2C using software nodes instead of platform data quirks. This includes the LCD and the ADS7846 touchscreen so the conversion just brings the LCD along with it as software nodes is an all-or-nothing design pattern. The ADS7846 has some limited support for using GPIO descriptors, let's convert it over completely to using device properties and then fix all remaining boardfile users to provide all platform data using software nodes. Dump the of includes and of_match_ptr() in the ADS7846 driver as part of the job. Since we have to move ADS7846 over to obtaining the GPIOs it is using exclusively from descriptors, we provide descriptor tables for the two remaining in-kernel boardfiles using ADS7846: - PXA Spitz - MIPS Alchemy DB1000 development board It was too hard for me to include software node conversion of these two remaining users at this time: the spitz is using a hscync callback in the platform data that would require further GPIO descriptor conversion of the Spitz, and moving the hsync callback down into the driver: it will just become too big of a job, but it can be done separately. The MIPS Alchemy DB1000 is simply something I cannot test, so take the easier approach of just providing some GPIO descriptors in this case as I don't want the patch to grow too intrusive. As we see that several device trees have incorrect polarity flags and just expect to bypass the gpiolib polarity handling, fix up all device trees too, in a separate patch. Suggested-by: Dmitry Torokhov <[email protected]> Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Acked-by: Dmitry Torokhov <[email protected]> Reviewed-by: Dmitry Torokhov <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
2023-05-24	ARM/mfd/gpio: Fixup TPS65010 regression on OMAP1 OSK1	Linus Walleij	1	-7/+4
	Aaro reports problems on the OSK1 board after we altered the dynamic base for GPIO allocations. It appears this happens because the OMAP driver now allocates GPIO numbers dynamically, so all that is references by number is a bit up in the air. Let's bite the bullet and try to just move the gpio_chip in the tps65010 MFD driver over to using dynamic allocations. Alter everything in the OSK1 board file to use a GPIO descriptor table and lookups. Utilize the NULL device to define some board-specific GPIO lookups and use these to immediately look up the same GPIOs, convert to IRQ numbers and pass as resources to the devices. This is ugly but should work. The .setup() callback for tps65010 was used for some GPIO hogging, but since the OSK1 is the only user in the entire kernel we can alter the signatures to something that is helpful and make a clean transition. Fixes: 92bf78b33b0b ("gpio: omap: use dynamic allocation of base") Cc: Christophe Leroy <[email protected]> Cc: [email protected] Cc: Andreas Kemnade <[email protected]> Acked-by: Lee Jones <[email protected]> Reviewed-by: Lee Jones <[email protected]> Reported-by: Aaro Koskinen <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Linus Walleij <[email protected]>
2023-05-24	spi: Merge up v6.4-rc3	Mark Brown	13	-18/+43
	Merge up v6.4-rc3 in order to get fixes to improve the stability of my CI.
2023-05-24	regulator: Merge up v6.4-rc3	Mark Brown	13	-18/+43
	Merge up v6.4-rc3 in order to get fixes to improve the stability of my CI.
2023-05-24	genirq: Use hlist for managing resend handlers	Shanker Donthineni	1	-0/+3
	The current implementation utilizes a bitmap for managing interrupt resend handlers, which is allocated based on the SPARSE_IRQ/NR_IRQS macros. However, this method may not efficiently utilize memory during runtime, particularly when IRQ_BITMAP_BITS is large. Address this issue by using an hlist to manage interrupt resend handlers instead of relying on a static bitmap memory allocation. Additionally, a new function, clear_irq_resend(), is introduced and called from irq_shutdown to ensure a graceful teardown of the interrupt. Signed-off-by: Shanker Donthineni <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-05-24	net: phy: avoid kernel warning dump when stopping an errored PHY	Russell King (Oracle)	1	-2/+5
	When taking a network interface down (or removing a SFP module) after the PHY has encountered an error, phy_stop() complains incorrectly that it was called from HALTED state. The reason this is incorrect is that the network driver will have called phy_start() when the interface was brought up, and the fact that the PHY has a problem bears no relationship to the administrative state of the interface. Taking the interface administratively down (which calls phy_stop()) is always the right thing to do after a successful phy_start() call, whether or not the PHY has encountered an error. Signed-off-by: Russell King (Oracle) <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-24	dmaengine: dw-edma: Add support for native HDMA	Cai Huoqing	1	-1/+2
	Add support for HDMA NATIVE, as long the IP design has set the compatible register map parameter-HDMA_NATIVE, which allows compatibility for native HDMA register configuration. The HDMA Hyper-DMA IP is an enhancement of the eDMA embedded-DMA IP. And the native HDMA registers are different from eDMA, so this patch add support for HDMA NATIVE mode. HDMA write and read channels operate independently to maximize the performance of the HDMA read and write data transfer over the link When you configure the HDMA with multiple read channels, then it uses a round robin (RR) arbitration scheme to select the next read channel to be serviced.The same applies when you have multiple write channels. The native HDMA driver also supports a maximum of 16 independent channels (8 write + 8 read), which can run simultaneously. Both SAR (Source Address Register) and DAR (Destination Address Register) are aligned to byte. Signed-off-by: Cai Huoqing <[email protected]> Reviewed-by: Serge Semin <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]> Tested-by: Serge Semin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>
2023-05-24	dmaengine: dw-edma: Rename dw_edma_core_ops structure to dw_edma_plat_ops	Cai Huoqing	1	-2/+2
	The dw_edma_core_ops structure contains a set of the operations: device IRQ numbers getter, CPU/PCI address translation. Based on the functions semantics the structure name "dw_edma_plat_ops" looks more descriptive since indeed the operations are platform-specific. The "dw_edma_core_ops" name shall be used for a structure with the IP-core specific set of callbacks in order to abstract out DW eDMA and DW HDMA setups. Such structure will be added in one of the next commit in the framework of the set of changes adding the DW HDMA device support. Anyway the renaming was necessary to distinguish two types of the implementation callbacks: 1. DW eDMA/hDMA IP-core specific operations: device-specific CSR setups in one or another aspect of the DMA-engine initialization. 2. DW eDMA/hDMA platform specific operations: the DMA device environment configs like IRQs, address translation, etc. Signed-off-by: Cai Huoqing <[email protected]> Reviewed-by: Serge Semin <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]> Tested-by: Serge Semin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>
2023-05-23	sysctl: Refactor base paths registrations	Joel Granados	1	-23/+0
	This is part of the general push to deprecate register_sysctl_paths and register_sysctl_table. The old way of doing this through register_sysctl_base and DECLARE_SYSCTL_BASE macro is replaced with a call to register_sysctl_init. The 5 base paths affected are: "kernel", "vm", "debug", "dev" and "fs". We remove the register_sysctl_base function and the DECLARE_SYSCTL_BASE macro since they are no longer needed. In order to quickly acertain that the paths did not actually change I executed `find /proc/sys/ \| sha1sum` and made sure that the sha was the same before and after the commit. We end up saving 563 bytes with this change: ./scripts/bloat-o-meter vmlinux.0.base vmlinux.1.refactor-base-paths add/remove: 0/5 grow/shrink: 2/0 up/down: 77/-640 (-563) Function old new delta sysctl_init_bases 55 111 +56 init_fs_sysctls 12 33 +21 vm_base_table 128 - -128 kernel_base_table 128 - -128 fs_base_table 128 - -128 dev_base_table 128 - -128 debug_base_table 128 - -128 Total: Before=21258215, After=21257652, chg -0.00% [mcgrof: modified to use register_sysctl_init() over register_sysctl() and add bloat-o-meter stats] Signed-off-by: Joel Granados <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Tested-by: Stephen Rothwell <[email protected]> Acked-by: Christian Brauner <[email protected]>
2023-05-23	sysctl: stop exporting register_sysctl_table	Joel Granados	1	-7/+1
	We make register_sysctl_table static because the only function calling it is in fs/proc/proc_sysctl.c (__register_sysctl_base). We remove it from the sysctl.h header and modify the documentation in both the header and proc_sysctl.c files to mention "register_sysctl" instead of "register_sysctl_table". This plus the commits that remove register_sysctl_table from parport save 217 bytes: ./scripts/bloat-o-meter .bsysctl/vmlinux.old .bsysctl/vmlinux.new add/remove: 0/1 grow/shrink: 5/1 up/down: 458/-675 (-217) Function old new delta __register_sysctl_base 8 286 +278 parport_proc_register 268 379 +111 parport_device_proc_register 195 247 +52 kzalloc.constprop 598 608 +10 parport_default_proc_register 62 69 +7 register_sysctl_table 291 - -291 parport_sysctl_template 1288 904 -384 Total: Before=8603076, After=8602859, chg -0.00% Signed-off-by: Joel Granados <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-05-23	parport: Move magic number "15" to a define	Joel Granados	1	-0/+2
	Put the size of a parport name behind a define so we can use it in other files. This is a preparation patch to be able to use this size in parport/procfs.c. Signed-off-by: Joel Granados <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-05-23	net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES	David Howells	1	-0/+3
	Add a function to handle MSG_SPLICE_PAGES being passed internally to sendmsg(). Pages are spliced into the given socket buffer if possible and copied in if not (e.g. they're slab pages or have a zero refcount). Signed-off-by: David Howells <[email protected]> cc: David Ahern <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-23	net: Pass max frags into skb_append_pagefrags()	David Howells	1	-1/+1
	Pass the maximum number of fragments into skb_append_pagefrags() rather than using MAX_SKB_FRAGS so that it can be used from code that wants to specify sysctl_max_skb_frags. Signed-off-by: David Howells <[email protected]> cc: David Ahern <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-23	net: Declare MSG_SPLICE_PAGES internal sendmsg() flag	David Howells	1	-0/+3
	Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a network protocol that it should splice pages from the source iterator rather than copying the data if it can. This flag is added to a list that is cleared by sendmsg syscalls on entry. This is intended as a replacement for the ->sendpage() op, allowing a way to splice in several multipage folios in one go. Signed-off-by: David Howells <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-23	tracing: Rename stacktrace field to common_stacktrace	Steven Rostedt (Google)	1	-0/+1
	The histogram and synthetic events can use a pseudo event called "stacktrace" that will create a stacktrace at the time of the event and use it just like it was a normal field. We have other pseudo events such as "common_cpu" and "common_timestamp". To stay consistent with that, convert "stacktrace" to "common_stacktrace". As this was used in older kernels, to keep backward compatibility, this will act just like "common_cpu" did with "cpu". That is, "cpu" will be the same as "common_cpu" unless the event has a "cpu" field. In which case, the event's field is used. The same is true with "stacktrace". Also update the documentation to reflect this change. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-05-23	tracing/user_events: Document user_event_mm one-shot list usage	Beau Belgrave	1	-0/+1
	During 6.4 development it became clear that the one-shot list used by the user_event_mm's next field was confusing to others. It is not clear how this list is protected or what the next field usage is for unless you are familiar with the code. Add comments into the user_event_mm struct indicating lock requirement and usage. Also document how and why this approach was used via comments in both user_event_enabler_update() and user_event_mm_get_all() and the rules to properly use it. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wicngggxVpbnrYHjRTwGE0WYscPRM+L2HO2BF8ia1EXgQ@mail.gmail.com/ Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-05-23	tracing/user_events: Rename link fields for clarity	Beau Belgrave	1	-1/+1
	Currently most list_head fields of various structs within user_events are simply named link. This causes folks to keep additional context in their head when working with the code, which can be confusing. Instead of using link, describe what the actual link is, for example: list_del_rcu(&mm->link); Changes into: list_del_rcu(&mm->mms_link); The reader now is given a hint the link is to the mms global list instead of having to remember or spot check within the code. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wicngggxVpbnrYHjRTwGE0WYscPRM+L2HO2BF8ia1EXgQ@mail.gmail.com/ Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-05-23	regmap: Merge up v6.4-rc3	Mark Brown	13	-18/+43
	Merge up v6.4-rc3 to get fixes which make my CI more stable.
2023-05-23	vfio/pci: Probe and store ability to support dynamic MSI-X	Reinette Chatre	1	-0/+1
	Not all MSI-X devices support dynamic MSI-X allocation. Whether a device supports dynamic MSI-X should be queried using pci_msix_can_alloc_dyn(). Instead of scattering code with pci_msix_can_alloc_dyn(), probe this ability once and store it as a property of the virtual device. Suggested-by: Alex Williamson <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/f1ae022c060ecb7e527f4f53c8ccafe80768da47.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <[email protected]>
2023-05-23	vfio/pci: Use bitfield for struct vfio_pci_core_device flags	Reinette Chatre	1	-11/+11
	struct vfio_pci_core_device contains eleven boolean flags. Boolean flags clearly indicate their usage but space usage starts to be a concern when there are many. An upcoming change adds another boolean flag to struct vfio_pci_core_device, thereby increasing the concern that the boolean flags are consuming unnecessary space. Transition the boolean flags to use bitfields. On a system that uses one byte per boolean this reduces the space consumed by existing flags from 11 bytes to 2 bytes with room for a few more flags without increasing the structure's size. Suggested-by: Jason Gunthorpe <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/cf34bf0499c889554a8105eeb18cc0ab673005be.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <[email protected]>
2023-05-23	vfio/pci: Remove interrupt context counter	Reinette Chatre	1	-1/+0
	struct vfio_pci_core_device::num_ctx counts how many interrupt contexts have been allocated. When all interrupt contexts are allocated simultaneously num_ctx provides the upper bound of all vectors that can be used as indices into the interrupt context array. With the upcoming support for dynamic MSI-X the number of interrupt contexts does not necessarily span the range of allocated interrupts. Consequently, num_ctx is no longer a trusted upper bound for valid indices. Stop using num_ctx to determine if a provided vector is valid. Use the existence of allocated interrupt. This changes behavior on the error path when user space provides an invalid vector range. Behavior changes from early exit without any modifications to possible modifications to valid vectors within the invalid range. This is acceptable considering that an invalid range is not a valid scenario, see link to discussion. The checks that ensure that user space provides a range of vectors that is valid for the device are untouched. Signed-off-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Reviewed-by: Kevin Tian <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/e27d350f02a65b8cbacd409b4321f5ce35b3186d.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <[email protected]>
2023-05-23	vfio/pci: Use xarray for interrupt context storage	Reinette Chatre	1	-1/+1
	Interrupt context is statically allocated at the time interrupts are allocated. Following allocation, the context is managed by directly accessing the elements of the array using the vector as index. The storage is released when interrupts are disabled. It is possible to dynamically allocate a single MSI-X interrupt after MSI-X is enabled. A dynamic storage for interrupt context is needed to support this. Replace the interrupt context array with an xarray (similar to what the core uses as store for MSI descriptors) that can support the dynamic expansion while maintaining the custom that uses the vector as index. With a dynamic storage it is no longer required to pre-allocate interrupt contexts at the time the interrupts are allocated. MSI and MSI-X interrupt contexts are only used when interrupts are enabled. Their allocation can thus be delayed until interrupt enabling. Only enabled interrupts will have associated interrupt contexts. Whether an interrupt has been allocated (a Linux irq number exists for it) becomes the criteria for whether an interrupt can be enabled. Signed-off-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Reviewed-by: Kevin Tian <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/40e235f38d427aff79ae35eda0ced42502aa0937.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <[email protected]>
2023-05-23	bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands	Andrii Nakryiko	1	-2/+2
	Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall forces users to specify pinning location as a string-based absolute or relative (to current working directory) path. This has various implications related to security (e.g., symlink-based attacks), forces BPF FS to be exposed in the file system, which can cause races with other applications. One of the feedbacks we got from folks working with containers heavily was that inability to use purely FD-based location specification was an unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET commands. This patch closes this oversight, adding path_fd field to BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by *at() syscalls for dirfd + pathname combinations. This now allows interesting possibilities like working with detached BPF FS mount (e.g., to perform multiple pinnings without running a risk of someone interfering with them), and generally making pinning/getting more secure and not prone to any races and/or security attacks. This is demonstrated by a selftest added in subsequent patch that takes advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate creating detached BPF FS mount, pinning, and then getting BPF map out of it, all while never exposing this private instance of BPF FS to outside worlds. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-05-23	iio: trigger: Add simple trigger_validation helper	Matti Vaittinen	1	-0/+1
	Some triggers can only be attached to the IIO device that corresponds to the same physical device. Implement generic helper which can be used as a validate_trigger callback for such devices. Suggested-by: Jonathan Cameron <[email protected]> Signed-off-by: Matti Vaittinen <[email protected]> Link: https://lore.kernel.org/r/51cd3e3e74a6addf8d333f4a109fb9c5a11086ee.1683541225.git.mazziesaccount@gmail.com Signed-off-by: Jonathan Cameron <[email protected]>
2023-05-23	rcuwait: Support timeouts	Davidlohr Bueso	1	-3/+20
	The rcuwait utility provides an efficient and safe single wait/wake mechanism. It is used in situations where queued wait is the wrong semantics, and often too bulky. For example, cases where the wait is already done under a lock. In the past, rcuwait has been extended to support beyond only uninterruptible sleep, and similarly, there are users that can benefit for the addition of timeouts. As such, tntroduce rcuwait_wait_event_timeout(), with semantics equivalent to calls for queued wait counterparts. Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Davidlohr Bueso <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2023-05-23	regulator: expose regulator_find_closest_bigger	Sebastian Reichel	1	-0/+2
	Expose and document the table lookup logic used by regulator_set_ramp_delay_regmap, so that it can be reused for devices that cannot be configured via regulator_set_ramp_delay_regmap. Tested-by: Diederik de Haas <[email protected]> # Rock64, Quartz64 Model A + B Tested-by: Vincent Legoll <[email protected]> # Pine64 QuartzPro64 Signed-off-by: Sebastian Reichel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-05-23	block/rq_qos: protect rq_qos apis with a new lock	Yu Kuai	1	-0/+1
	commit 50e34d78815e ("block: disable the elevator int del_gendisk") move rq_qos_exit() from disk_release() to del_gendisk(), this will introduce some problems: 1) If rq_qos_add() is triggered by enabling iocost/iolatency through cgroupfs, then it can concurrent with del_gendisk(), it's not safe to write 'q->rq_qos' concurrently. 2) Activate cgroup policy that is relied on rq_qos will call rq_qos_add() and blkcg_activate_policy(), and if rq_qos_exit() is called in the middle, null-ptr-dereference will be triggered in blkcg_activate_policy(). 3) blkg_conf_open_bdev() can call blkdev_get_no_open() first to find the disk, then if rq_qos_exit() from del_gendisk() is done before rq_qos_add(), then memory will be leaked. This patch add a new disk level mutex 'rq_qos_mutex': 1) The lock will protect rq_qos_exit() directly. 2) For wbt that doesn't relied on blk-cgroup, rq_qos_add() can only be called from disk initialization for now because wbt can't be destructed until rq_qos_exit(), so it's safe not to protect wbt for now. Hoever, in case that rq_qos dynamically destruction is supported in the furture, this patch also protect rq_qos_add() from wbt_init() directly, this is enough because blk-sysfs already synchronize writers with disk removal. 3) For iocost and iolatency, in order to synchronize disk removal and cgroup configuration, the lock is held after blkdev_get_no_open() from blkg_conf_open_bdev(), and is released in blkg_conf_exit(). In order to fix the above memory leak, disk_live() is checked after holding the new lock. Fixes: 50e34d78815e ("block: disable the elevator int del_gendisk") Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-23	block: remove redundant req_op in blk_rq_is_passthrough	Li Nan	1	-1/+1
	op &= REQ_OP_MASK in blk_op_is_passthrough() is exactly what req_op() do. Therefore, it is redundant to call req_op() for blk_op_is_passthrough(). Signed-off-by: Li Nan <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-23	bpf, sockmap: Improved check for empty queue	John Fastabend	1	-1/+0
	We noticed some rare sk_buffs were stepping past the queue when system was under memory pressure. The general theory is to skip enqueueing sk_buffs when its not necessary which is the normal case with a system that is properly provisioned for the task, no memory pressure and enough cpu assigned. But, if we can't allocate memory due to an ENOMEM error when enqueueing the sk_buff into the sockmap receive queue we push it onto a delayed workqueue to retry later. When a new sk_buff is received we then check if that queue is empty. However, there is a problem with simply checking the queue length. When a sk_buff is being processed from the ingress queue but not yet on the sockmap msg receive queue its possible to also recv a sk_buff through normal path. It will check the ingress queue which is zero and then skip ahead of the pkt being processed. Previously we used sock lock from both contexts which made the problem harder to hit, but not impossible. To fix instead of popping the skb from the queue entirely we peek the skb from the queue and do the copy there. This ensures checks to the queue length are non-zero while skb is being processed. Then finally when the entire skb has been copied to user space queue or another socket we pop it off the queue. This way the queue length check allows bypassing the queue only after the list has been completely processed. To reproduce issue we run NGINX compliance test with sockmap running and observe some flakes in our testing that we attributed to this issue. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Suggested-by: Jakub Sitnicki <[email protected]> Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: William Findlay <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-05-23	bpf, sockmap: Convert schedule_work into delayed_work	John Fastabend	1	-1/+1
	Sk_buffs are fed into sockmap verdict programs either from a strparser (when the user might want to decide how framing of skb is done by attaching another parser program) or directly through tcp_read_sock. The tcp_read_sock is the preferred method for performance when the BPF logic is a stream parser. The flow for Cilium's common use case with a stream parser is, tcp_read_sock() sk_psock_verdict_recv ret = bpf_prog_run_pin_on_cpu() sk_psock_verdict_apply(sock, skb, ret) // if system is under memory pressure or app is slow we may // need to queue skb. Do this queuing through ingress_skb and // then kick timer to wake up handler skb_queue_tail(ingress_skb, skb) schedule_work(work); The work queue is wired up to sk_psock_backlog(). This will then walk the ingress_skb skb list that holds our sk_buffs that could not be handled, but should be OK to run at some later point. However, its possible that the workqueue doing this work still hits an error when sending the skb. When this happens the skbuff is requeued on a temporary 'state' struct kept with the workqueue. This is necessary because its possible to partially send an skbuff before hitting an error and we need to know how and where to restart when the workqueue runs next. Now for the trouble, we don't rekick the workqueue. This can cause a stall where the skbuff we just cached on the state variable might never be sent. This happens when its the last packet in a flow and no further packets come along that would cause the system to kick the workqueue from that side. To fix we could do simple schedule_work(), but while under memory pressure it makes sense to back off some instead of continue to retry repeatedly. So instead to fix convert schedule_work to schedule_delayed_work and add backoff logic to reschedule from backlog queue on errors. Its not obvious though what a good backoff is so use '1'. To test we observed some flakes whil running NGINX compliance test with sockmap we attributed these failed test to this bug and subsequent issue. >From on list discussion. This commit bec217197b41("skmsg: Schedule psock work if the cached skb exists on the psock") was intended to address similar race, but had a couple cases it missed. Most obvious it only accounted for receiving traffic on the local socket so if redirecting into another socket we could still get an sk_buff stuck here. Next it missed the case where copied=0 in the recv() handler and then we wouldn't kick the scheduler. Also its sub-optimal to require userspace to kick the internal mechanisms of sockmap to wake it up and copy data to user. It results in an extra syscall and requires the app to actual handle the EAGAIN correctly. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: William Findlay <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-05-23	ALSA: usb-audio: Define USB MIDI 2.0 specs	Takashi Iwai	1	-0/+94
	Define new structs and constants from USB MIDI 2.0 specification, to be used in the upcoming MIDI 2.0 support in USB-audio driver. A new class-specific endpoint descriptor and group terminal block descriptors are defined. Acked-by: Greg Kroah-Hartman <[email protected]> Acked-by: Jaroslav Kysela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-05-22	net/mlx5: DR, Check force-loopback RC QP capability independently from RoCE	Yevgeny Kliteynik	1	-1/+3
	SW Steering uses RC QP for writing STEs to ICM. This writingis done in LB (loopback), and FL (force-loopback) QP is preferred for performance. FL is available when RoCE is enabled or disabled based on RoCE caps. This patch adds reading of FL capability from HCA caps in addition to the existing reading from RoCE caps, thus fixing the case where we didn't have loopback enabled when RoCE was disabled. Fixes: 7304d603a57a ("net/mlx5: DR, Add support for force-loopback QP") Signed-off-by: Itamar Gozlan <[email protected]> Signed-off-by: Yevgeny Kliteynik <[email protected]> Signed-off-by: Saeed Mahameed <[email protected]>
2023-05-22	Merge patch series "Add Command Duration Limits support"	Martin K. Petersen	3	-14/+45
	Niklas Cassel <[email protected]> says: This series adds support for Command Duration Limits. The series is based on linux tag: v6.4-rc1 The series can also be found in git: https://github.com/floatious/linux/commits/cdl-v7 ================= CDL in ATA / SCSI ================= Command Duration Limits is defined in: T13 ATA Command Set - 5 (ACS-5) and T10 SCSI Primary Commands - 6 (SPC-6) respectively (a simpler version of CDL is defined in T10 SPC-5). CDL defines Duration Limits Descriptors (DLD). 7 DLDs for read commands and 7 DLDs for write commands. Simply put, a DLD contains a limit and a policy. A command can specify that a certain limit should be applied by setting the DLD index field (3 bits, so 0-7) in the command itself. The DLD index points to one of the 7 DLDs. DLD index 0 means no descriptor, so no limit. DLD index 1-7 means DLD 1-7. A DLD can have a few different policies, but the two major ones are: -Policy 0xF (abort), command will be completed with command aborted error (ATA) or status CHECK CONDITION (SCSI), with sense data indicating that the command timed out. -Policy 0xD (complete-unavailable), command will be completed without error (ATA) or status GOOD (SCSI), with sense data indicating that the command timed out. Note that the command will not have transferred any data to/from the device when the command timed out, even though the command returned success. Regardless of the CDL policy, in case of a CDL timeout, the I/O will result in a -ETIME error to user-space. The DLDs are defined in the CDL log page(s) and are readable and writable. Reading and writing the CDL DLDs are outside the scope of the kernel. If a user wants to read or write the descriptors, they can do so using a user-space application that sends passthrough commands, such as cdl-tools: https://github.com/westerndigitalcorporation/cdl-tools ================================ The introduction of ioprio hints ================================ What the kernel does provide, is a method to let I/O use one of the CDL DLDs defined in the device. Note that the kernel will simply forward the DLD index to the device, so the kernel currently does not know, nor does it need to know, how the DLDs are defined inside the device. The way that the CDL DLD index is supplied to the kernel is by introducing a new 10 bit "ioprio hint" field within the existing 16 bit ioprio definition. Currently, only 6 out of the 16 ioprio bits are in use, the remaining 10 bits are unused, and are currently explicitly disallowed to be set by the kernel. For now, we only add ioprio hints representing CDL DLD index 1-7. Additional ioprio hints for other QoS features could be defined in the future. A theoretical future work could be to make an I/O scheduler aware of these hints. E.g. for CDL, an I/O scheduler could make use of the duration limit in each descriptor, and take that information into account while scheduling commands. Right now, the ioprio hints will be ignored by the I/O schedulers. ============================== How to use CDL from user-space ============================== Since CDL is mutually exclusive with NCQ priority (see ncq_prio_enable and sas_ncq_prio_enable in Documentation/ABI/testing/sysfs-block-device), CDL has to be explicitly enabled using: echo 1 > /sys/block/$bdev/device/cdl_enable Since the ioprio hints are supplied through the existing I/O priority API, it should be simple for an application to make use of the ioprio hints. It simply has to reuse one of the new macros defined in include/uapi/linux/ioprio.h: IOPRIO_PRIO_HINT() or IOPRIO_PRIO_VALUE_HINT(), and supply one of the new hints defined in include/uapi/linux/ioprio.h: IOPRIO_HINT_DEV_DURATION_LIMIT_[1-7], which indicates that the I/O should use the corresponding CDL DLD index 1-7. By reusing the I/O priority API, the user can both define a DLD to use per AIO (io_uring sqe->ioprio or libaio iocb->aio_reqprio) or per-thread (ioprio_set()). ======= Testing ======= With the following fio patches: https://github.com/floatious/fio/commits/cdl fio adds support for ioprio hints, such that CDL can be tested using e.g.: fio --ioengine=io_uring --cmdprio_percentage=10 --cmdprio_hint=DLD_index A simple way to test is to use a DLD with a very short duration limit, and send large reads. Regardless of the CDL policy, in case of a CDL timeout, the I/O will result in a -ETIME error to user-space. We also provide a CDL test suite located in the cdl-tools repo, see: https://github.com/westerndigitalcorporation/cdl-tools#testing-a-system-command-duration-limits-support We have tested this patch series using: -real hardware -the following QEMU implementation: https://github.com/floatious/qemu/tree/cdl (NOTE: the QEMU implementation requires you to define the CDL policy at compile time, so you currently need to recompile QEMU when switching between policies.) =================== Further information =================== For further information about CDL, see Damien's slides: Presented at SDC 2021: https://www.snia.org/sites/default/files/SDC/2021/pdfs/SNIA-SDC21-LeMoal-Be-On-Time-command-duration-limits-Feature-Support-in%20Linux.pdf Presented at Lund Linux Con 2022: https://drive.google.com/file/d/1I6ChFc0h4JY9qZdO1bY5oCAdYCSZVqWw/view?usp=sharing ================ Changes since V6 ================ -Rebased series on v6.4-rc1. -Picked up Reviewed-by tags from Hannes (Thank you Hannes!) -Picked up Reviewed-by tag from Christoph (Thank you Christoph!) -Changed KernelVersion from 6.4 to 6.5 for new sysfs attributes. For older change logs, see previous patch series versions: https://lore.kernel.org/linux-scsi/[email protected]/ https://lore.kernel.org/linux-scsi/[email protected]/ https://lore.kernel.org/linux-scsi/[email protected]/ https://lore.kernel.org/linux-scsi/[email protected]/ https://lore.kernel.org/linux-scsi/[email protected]/ https://lore.kernel.org/linux-scsi/[email protected]/ Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin K. Petersen <[email protected]>