blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2023-05-24	ALSA: emu10k1: add synchronized start of multi-channel playback	Oswald Buddenhagen	1	-1/+9
	We use independent voices for the channels, so we need to make an effort to ensure that they are actually in sync. The hardware doesn't provide atomicity, so we may need to retry a few times, due to NMIs, PCI contention, and the wrong phase of the moon. Solution inspired by kX-project. Signed-off-by: Oswald Buddenhagen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-05-24	spi: Merge up v6.4-rc3	Mark Brown	24	-52/+90
	Merge up v6.4-rc3 in order to get fixes to improve the stability of my CI.
2023-05-24	regulator: Merge up v6.4-rc3	Mark Brown	24	-52/+90
	Merge up v6.4-rc3 in order to get fixes to improve the stability of my CI.
2023-05-24	genirq: Use hlist for managing resend handlers	Shanker Donthineni	1	-0/+3
	The current implementation utilizes a bitmap for managing interrupt resend handlers, which is allocated based on the SPARSE_IRQ/NR_IRQS macros. However, this method may not efficiently utilize memory during runtime, particularly when IRQ_BITMAP_BITS is large. Address this issue by using an hlist to manage interrupt resend handlers instead of relying on a static bitmap memory allocation. Additionally, a new function, clear_irq_resend(), is introduced and called from irq_shutdown to ensure a graceful teardown of the interrupt. Signed-off-by: Shanker Donthineni <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2023-05-24	devlink: pass devlink_port pointer to ops->port_del() instead of index	Jiri Pirko	1	-2/+2
	Historically there was a reason why port_dev() along with for example port_split() did get port_index instead of the devlink_port pointer. With the locking changes that were done which ensured devlink instance mutex is hold for every command, the port ops could get devlink_port pointer directly. Change the forgotten port_dev() op to be as others and pass devlink_port pointer instead of port_index. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-24	devlink: remove no longer true locking comment from port_new/del()	Jiri Pirko	1	-4/+0
	All commands are called holding instance lock. Remove the outdated comment that says otherwise. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-24	devlink: remove duplicate port notification	Jiri Pirko	1	-3/+1
	The notification about created port is send from devl_port_register() function called from ops->port_new(). No need to send it again here, so remove the call and the helper function. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Simon Horman <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-24	Merge tag 'mlx5-fixes-2023-05-22' of ↵	David S. Miller	1	-1/+3
	git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-fixes-2023-05-22 This series provides bug fixes for the mlx5 driver. Please pull and let me know if there is any problem. ==================== Signed-off-by: David S. Miller <[email protected]>
2023-05-24	net: phy: avoid kernel warning dump when stopping an errored PHY	Russell King (Oracle)	1	-2/+5
	When taking a network interface down (or removing a SFP module) after the PHY has encountered an error, phy_stop() complains incorrectly that it was called from HALTED state. The reason this is incorrect is that the network driver will have called phy_start() when the interface was brought up, and the fact that the PHY has a problem bears no relationship to the administrative state of the interface. Taking the interface administratively down (which calls phy_stop()) is always the right thing to do after a successful phy_start() call, whether or not the PHY has encountered an error. Signed-off-by: Russell King (Oracle) <[email protected]> Acked-by: Florian Fainelli <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-24	raw: Stop using RTO_ONLINK.	Guillaume Nault	1	-5/+0
	Use ip_sendmsg_scope() to properly initialise the scope in flowi4_init_output(), instead of overriding tos with the RTO_ONLINK flag. The objective is to eventually remove RTO_ONLINK, which will allow converting .flowi4_tos to dscp_t. The MSG_DONTROUTE and SOCK_LOCALROUTE cases were already handled by raw_sendmsg() (SOCK_LOCALROUTE was handled by the RT_CONN_FLAGS*() macros called by get_rtconn_flags()). However, opt.is_strictroute wasn't taken into account. Therefore, a side effect of this patch is to now honour opt.is_strictroute, and thus align raw_sendmsg() with ping_v4_sendmsg() and udp_sendmsg(). Since raw_sendmsg() was the only user of get_rtconn_flags(), we can now remove this function. Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-24	ping: Stop using RTO_ONLINK.	Guillaume Nault	1	-0/+13
	Define a new helper to figure out the correct route scope to use on TX, depending on socket configuration, ancillary data and send flags. Use this new helper to properly initialise the scope in flowi4_init_output(), instead of overriding tos with the RTO_ONLINK flag. The objective is to eventually remove RTO_ONLINK, which will allow converting .flowi4_tos to dscp_t. Signed-off-by: Guillaume Nault <[email protected]> Reviewed-by: David Ahern <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2023-05-24	Merge branch 'topic/midi20' into for-next	Takashi Iwai	1	-1/+1
	Pull a fixup for build error on big-endian archs. Signed-off-by: Takashi Iwai <[email protected]>
2023-05-24	ALSA: ump: Correct snd_ump_midi1_msg_program definition	Stephen Rothwell	1	-1/+1
	The #endif is placed obviously at a wrong position, which caused a build error on the big endian machine. Fixes: 0b5288f5fe63 ("ALSA: ump: Add legacy raw MIDI support") Signed-off-by: Stephen Rothwell <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-05-24	dmaengine: dw-edma: Add support for native HDMA	Cai Huoqing	1	-1/+2
	Add support for HDMA NATIVE, as long the IP design has set the compatible register map parameter-HDMA_NATIVE, which allows compatibility for native HDMA register configuration. The HDMA Hyper-DMA IP is an enhancement of the eDMA embedded-DMA IP. And the native HDMA registers are different from eDMA, so this patch add support for HDMA NATIVE mode. HDMA write and read channels operate independently to maximize the performance of the HDMA read and write data transfer over the link When you configure the HDMA with multiple read channels, then it uses a round robin (RR) arbitration scheme to select the next read channel to be serviced.The same applies when you have multiple write channels. The native HDMA driver also supports a maximum of 16 independent channels (8 write + 8 read), which can run simultaneously. Both SAR (Source Address Register) and DAR (Destination Address Register) are aligned to byte. Signed-off-by: Cai Huoqing <[email protected]> Reviewed-by: Serge Semin <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]> Tested-by: Serge Semin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>
2023-05-24	dmaengine: dw-edma: Rename dw_edma_core_ops structure to dw_edma_plat_ops	Cai Huoqing	1	-2/+2
	The dw_edma_core_ops structure contains a set of the operations: device IRQ numbers getter, CPU/PCI address translation. Based on the functions semantics the structure name "dw_edma_plat_ops" looks more descriptive since indeed the operations are platform-specific. The "dw_edma_core_ops" name shall be used for a structure with the IP-core specific set of callbacks in order to abstract out DW eDMA and DW HDMA setups. Such structure will be added in one of the next commit in the framework of the set of changes adding the DW HDMA device support. Anyway the renaming was necessary to distinguish two types of the implementation callbacks: 1. DW eDMA/hDMA IP-core specific operations: device-specific CSR setups in one or another aspect of the DMA-engine initialization. 2. DW eDMA/hDMA platform specific operations: the DMA device environment configs like IRQs, address translation, etc. Signed-off-by: Cai Huoqing <[email protected]> Reviewed-by: Serge Semin <[email protected]> Reviewed-by: Manivannan Sadhasivam <[email protected]> Tested-by: Serge Semin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Vinod Koul <[email protected]>
2023-05-23	sysctl: Refactor base paths registrations	Joel Granados	1	-23/+0
	This is part of the general push to deprecate register_sysctl_paths and register_sysctl_table. The old way of doing this through register_sysctl_base and DECLARE_SYSCTL_BASE macro is replaced with a call to register_sysctl_init. The 5 base paths affected are: "kernel", "vm", "debug", "dev" and "fs". We remove the register_sysctl_base function and the DECLARE_SYSCTL_BASE macro since they are no longer needed. In order to quickly acertain that the paths did not actually change I executed `find /proc/sys/ \| sha1sum` and made sure that the sha was the same before and after the commit. We end up saving 563 bytes with this change: ./scripts/bloat-o-meter vmlinux.0.base vmlinux.1.refactor-base-paths add/remove: 0/5 grow/shrink: 2/0 up/down: 77/-640 (-563) Function old new delta sysctl_init_bases 55 111 +56 init_fs_sysctls 12 33 +21 vm_base_table 128 - -128 kernel_base_table 128 - -128 fs_base_table 128 - -128 dev_base_table 128 - -128 debug_base_table 128 - -128 Total: Before=21258215, After=21257652, chg -0.00% [mcgrof: modified to use register_sysctl_init() over register_sysctl() and add bloat-o-meter stats] Signed-off-by: Joel Granados <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]> Tested-by: Stephen Rothwell <[email protected]> Acked-by: Christian Brauner <[email protected]>
2023-05-23	sysctl: stop exporting register_sysctl_table	Joel Granados	1	-7/+1
	We make register_sysctl_table static because the only function calling it is in fs/proc/proc_sysctl.c (__register_sysctl_base). We remove it from the sysctl.h header and modify the documentation in both the header and proc_sysctl.c files to mention "register_sysctl" instead of "register_sysctl_table". This plus the commits that remove register_sysctl_table from parport save 217 bytes: ./scripts/bloat-o-meter .bsysctl/vmlinux.old .bsysctl/vmlinux.new add/remove: 0/1 grow/shrink: 5/1 up/down: 458/-675 (-217) Function old new delta __register_sysctl_base 8 286 +278 parport_proc_register 268 379 +111 parport_device_proc_register 195 247 +52 kzalloc.constprop 598 608 +10 parport_default_proc_register 62 69 +7 register_sysctl_table 291 - -291 parport_sysctl_template 1288 904 -384 Total: Before=8603076, After=8602859, chg -0.00% Signed-off-by: Joel Granados <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-05-23	parport: Move magic number "15" to a define	Joel Granados	1	-0/+2
	Put the size of a parport name behind a define so we can use it in other files. This is a preparation patch to be able to use this size in parport/procfs.c. Signed-off-by: Joel Granados <[email protected]> Reviewed-by: Luis Chamberlain <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2023-05-23	ip: Remove ip_append_page()	David Howells	1	-2/+0
	ip_append_page() is no longer used with the removal of udp_sendpage(), so remove it. Signed-off-by: David Howells <[email protected]> cc: Willem de Bruijn <[email protected]> cc: David Ahern <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-23	tcp: Fold do_tcp_sendpages() into tcp_sendpage_locked()	David Howells	1	-2/+0
	Fold do_tcp_sendpages() into its last remaining caller, tcp_sendpage_locked(). Signed-off-by: David Howells <[email protected]> cc: David Ahern <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-23	tls: Inline do_tcp_sendpages()	David Howells	1	-1/+1
	do_tcp_sendpages() is now just a small wrapper around tcp_sendmsg_locked(), so inline it, allowing do_tcp_sendpages() to be removed. This is part of replacing ->sendpage() with a call to sendmsg() with MSG_SPLICE_PAGES set. Signed-off-by: David Howells <[email protected]> cc: Boris Pismenny <[email protected]> cc: John Fastabend <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-23	net: Add a function to splice pages into an skbuff for MSG_SPLICE_PAGES	David Howells	1	-0/+3
	Add a function to handle MSG_SPLICE_PAGES being passed internally to sendmsg(). Pages are spliced into the given socket buffer if possible and copied in if not (e.g. they're slab pages or have a zero refcount). Signed-off-by: David Howells <[email protected]> cc: David Ahern <[email protected]> cc: Al Viro <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-23	net: Pass max frags into skb_append_pagefrags()	David Howells	1	-1/+1
	Pass the maximum number of fragments into skb_append_pagefrags() rather than using MAX_SKB_FRAGS so that it can be used from code that wants to specify sysctl_max_skb_frags. Signed-off-by: David Howells <[email protected]> cc: David Ahern <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-23	net: Declare MSG_SPLICE_PAGES internal sendmsg() flag	David Howells	1	-0/+3
	Declare MSG_SPLICE_PAGES, an internal sendmsg() flag, that hints to a network protocol that it should splice pages from the source iterator rather than copying the data if it can. This flag is added to a list that is cleared by sendmsg syscalls on entry. This is intended as a replacement for the ->sendpage() op, allowing a way to splice in several multipage folios in one go. Signed-off-by: David Howells <[email protected]> Reviewed-by: Willem de Bruijn <[email protected]> cc: Jens Axboe <[email protected]> cc: Matthew Wilcox <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-23	tracing: Rename stacktrace field to common_stacktrace	Steven Rostedt (Google)	1	-0/+1
	The histogram and synthetic events can use a pseudo event called "stacktrace" that will create a stacktrace at the time of the event and use it just like it was a normal field. We have other pseudo events such as "common_cpu" and "common_timestamp". To stay consistent with that, convert "stacktrace" to "common_stacktrace". As this was used in older kernels, to keep backward compatibility, this will act just like "common_cpu" did with "cpu". That is, "cpu" will be the same as "common_cpu" unless the event has a "cpu" field. In which case, the event's field is used. The same is true with "stacktrace". Also update the documentation to reflect this change. Link: https://lore.kernel.org/linux-trace-kernel/[email protected] Cc: Masami Hiramatsu <[email protected]> Cc: Tom Zanussi <[email protected]> Cc: Mark Rutland <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-05-23	page_pool: fix inconsistency for page_pool_ring_[un]lock()	Yunsheng Lin	1	-18/+0
	page_pool_ring_[un]lock() use in_softirq() to decide which spin lock variant to use, and when they are called in the context with in_softirq() being false, spin_lock_bh() is called in page_pool_ring_lock() while spin_unlock() is called in page_pool_ring_unlock(), because spin_lock_bh() has disabled the softirq in page_pool_ring_lock(), which causes inconsistency for spin lock pair calling. This patch fixes it by returning in_softirq state from page_pool_producer_lock(), and use it to decide which spin lock variant to use in page_pool_producer_unlock(). As pool->ring has both producer and consumer lock, so rename it to page_pool_producer_[un]lock() to reflect the actual usage. Also move them to page_pool.c as they are only used there, and remove the 'inline' as the compiler may have better idea to do inlining or not. Fixes: 7886244736a4 ("net: page_pool: Add bulk support for ptr_ring") Signed-off-by: Yunsheng Lin <[email protected]> Acked-by: Jesper Dangaard Brouer <[email protected]> Acked-by: Ilias Apalodimas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2023-05-23	tracing/user_events: Document user_event_mm one-shot list usage	Beau Belgrave	1	-0/+1
	During 6.4 development it became clear that the one-shot list used by the user_event_mm's next field was confusing to others. It is not clear how this list is protected or what the next field usage is for unless you are familiar with the code. Add comments into the user_event_mm struct indicating lock requirement and usage. Also document how and why this approach was used via comments in both user_event_enabler_update() and user_event_mm_get_all() and the rules to properly use it. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wicngggxVpbnrYHjRTwGE0WYscPRM+L2HO2BF8ia1EXgQ@mail.gmail.com/ Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-05-23	tracing/user_events: Rename link fields for clarity	Beau Belgrave	1	-1/+1
	Currently most list_head fields of various structs within user_events are simply named link. This causes folks to keep additional context in their head when working with the code, which can be confusing. Instead of using link, describe what the actual link is, for example: list_del_rcu(&mm->link); Changes into: list_del_rcu(&mm->mms_link); The reader now is given a hint the link is to the mms global list instead of having to remember or spot check within the code. Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wicngggxVpbnrYHjRTwGE0WYscPRM+L2HO2BF8ia1EXgQ@mail.gmail.com/ Suggested-by: Linus Torvalds <[email protected]> Signed-off-by: Beau Belgrave <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2023-05-23	regmap: Merge up v6.4-rc3	Mark Brown	24	-52/+90
	Merge up v6.4-rc3 to get fixes which make my CI more stable.
2023-05-23	vfio/pci: Clear VFIO_IRQ_INFO_NORESIZE for MSI-X	Reinette Chatre	1	-0/+3
	Dynamic MSI-X is supported. Clear VFIO_IRQ_INFO_NORESIZE to provide guidance to user space. Signed-off-by: Reinette Chatre <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/fd1ef2bf6ae972da8e2805bc95d5155af5a8fb0a.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <[email protected]>
2023-05-23	vfio/pci: Probe and store ability to support dynamic MSI-X	Reinette Chatre	1	-0/+1
	Not all MSI-X devices support dynamic MSI-X allocation. Whether a device supports dynamic MSI-X should be queried using pci_msix_can_alloc_dyn(). Instead of scattering code with pci_msix_can_alloc_dyn(), probe this ability once and store it as a property of the virtual device. Suggested-by: Alex Williamson <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/f1ae022c060ecb7e527f4f53c8ccafe80768da47.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <[email protected]>
2023-05-23	vfio/pci: Use bitfield for struct vfio_pci_core_device flags	Reinette Chatre	1	-11/+11
	struct vfio_pci_core_device contains eleven boolean flags. Boolean flags clearly indicate their usage but space usage starts to be a concern when there are many. An upcoming change adds another boolean flag to struct vfio_pci_core_device, thereby increasing the concern that the boolean flags are consuming unnecessary space. Transition the boolean flags to use bitfields. On a system that uses one byte per boolean this reduces the space consumed by existing flags from 11 bytes to 2 bytes with room for a few more flags without increasing the structure's size. Suggested-by: Jason Gunthorpe <[email protected]> Signed-off-by: Reinette Chatre <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/cf34bf0499c889554a8105eeb18cc0ab673005be.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <[email protected]>
2023-05-23	vfio/pci: Remove interrupt context counter	Reinette Chatre	1	-1/+0
	struct vfio_pci_core_device::num_ctx counts how many interrupt contexts have been allocated. When all interrupt contexts are allocated simultaneously num_ctx provides the upper bound of all vectors that can be used as indices into the interrupt context array. With the upcoming support for dynamic MSI-X the number of interrupt contexts does not necessarily span the range of allocated interrupts. Consequently, num_ctx is no longer a trusted upper bound for valid indices. Stop using num_ctx to determine if a provided vector is valid. Use the existence of allocated interrupt. This changes behavior on the error path when user space provides an invalid vector range. Behavior changes from early exit without any modifications to possible modifications to valid vectors within the invalid range. This is acceptable considering that an invalid range is not a valid scenario, see link to discussion. The checks that ensure that user space provides a range of vectors that is valid for the device are untouched. Signed-off-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Reviewed-by: Kevin Tian <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/e27d350f02a65b8cbacd409b4321f5ce35b3186d.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <[email protected]>
2023-05-23	vfio/pci: Use xarray for interrupt context storage	Reinette Chatre	1	-1/+1
	Interrupt context is statically allocated at the time interrupts are allocated. Following allocation, the context is managed by directly accessing the elements of the array using the vector as index. The storage is released when interrupts are disabled. It is possible to dynamically allocate a single MSI-X interrupt after MSI-X is enabled. A dynamic storage for interrupt context is needed to support this. Replace the interrupt context array with an xarray (similar to what the core uses as store for MSI descriptors) that can support the dynamic expansion while maintaining the custom that uses the vector as index. With a dynamic storage it is no longer required to pre-allocate interrupt contexts at the time the interrupts are allocated. MSI and MSI-X interrupt contexts are only used when interrupts are enabled. Their allocation can thus be delayed until interrupt enabling. Only enabled interrupts will have associated interrupt contexts. Whether an interrupt has been allocated (a Linux irq number exists for it) becomes the criteria for whether an interrupt can be enabled. Signed-off-by: Reinette Chatre <[email protected]> Link: https://lore.kernel.org/lkml/[email protected]/ Reviewed-by: Kevin Tian <[email protected]> Acked-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/40e235f38d427aff79ae35eda0ced42502aa0937.1683740667.git.reinette.chatre@intel.com Signed-off-by: Alex Williamson <[email protected]>
2023-05-23	bpf: Support O_PATH FDs in BPF_OBJ_PIN and BPF_OBJ_GET commands	Andrii Nakryiko	2	-2/+12
	Current UAPI of BPF_OBJ_PIN and BPF_OBJ_GET commands of bpf() syscall forces users to specify pinning location as a string-based absolute or relative (to current working directory) path. This has various implications related to security (e.g., symlink-based attacks), forces BPF FS to be exposed in the file system, which can cause races with other applications. One of the feedbacks we got from folks working with containers heavily was that inability to use purely FD-based location specification was an unfortunate limitation and hindrance for BPF_OBJ_PIN and BPF_OBJ_GET commands. This patch closes this oversight, adding path_fd field to BPF_OBJ_PIN and BPF_OBJ_GET UAPI, following conventions established by *at() syscalls for dirfd + pathname combinations. This now allows interesting possibilities like working with detached BPF FS mount (e.g., to perform multiple pinnings without running a risk of someone interfering with them), and generally making pinning/getting more secure and not prone to any races and/or security attacks. This is demonstrated by a selftest added in subsequent patch that takes advantage of new mount APIs (fsopen, fsconfig, fsmount) to demonstrate creating detached BPF FS mount, pinning, and then getting BPF map out of it, all while never exposing this private instance of BPF FS to outside worlds. Signed-off-by: Andrii Nakryiko <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Christian Brauner <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-05-23	iio: trigger: Add simple trigger_validation helper	Matti Vaittinen	1	-0/+1
	Some triggers can only be attached to the IIO device that corresponds to the same physical device. Implement generic helper which can be used as a validate_trigger callback for such devices. Suggested-by: Jonathan Cameron <[email protected]> Signed-off-by: Matti Vaittinen <[email protected]> Link: https://lore.kernel.org/r/51cd3e3e74a6addf8d333f4a109fb9c5a11086ee.1683541225.git.mazziesaccount@gmail.com Signed-off-by: Jonathan Cameron <[email protected]>
2023-05-23	rcuwait: Support timeouts	Davidlohr Bueso	1	-3/+20
	The rcuwait utility provides an efficient and safe single wait/wake mechanism. It is used in situations where queued wait is the wrong semantics, and often too bulky. For example, cases where the wait is already done under a lock. In the past, rcuwait has been extended to support beyond only uninterruptible sleep, and similarly, there are users that can benefit for the addition of timeouts. As such, tntroduce rcuwait_wait_event_timeout(), with semantics equivalent to calls for queued wait counterparts. Acked-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Davidlohr Bueso <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dan Williams <[email protected]>
2023-05-23	regulator: expose regulator_find_closest_bigger	Sebastian Reichel	1	-0/+2
	Expose and document the table lookup logic used by regulator_set_ramp_delay_regmap, so that it can be reused for devices that cannot be configured via regulator_set_ramp_delay_regmap. Tested-by: Diederik de Haas <[email protected]> # Rock64, Quartz64 Model A + B Tested-by: Vincent Legoll <[email protected]> # Pine64 QuartzPro64 Signed-off-by: Sebastian Reichel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-05-23	ASoC: da7219: Add Jack insertion detection polarity	David Rau	1	-0/+6
	Add support of selecting insertion detection polarity - Default polarity (Low) - Inverted polarity (High) Correct the keywords of parsing `dlg,jack-det-rate` bases on the new DT binding. Signed-off-by: David Rau <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2023-05-23	block/rq_qos: protect rq_qos apis with a new lock	Yu Kuai	1	-0/+1
	commit 50e34d78815e ("block: disable the elevator int del_gendisk") move rq_qos_exit() from disk_release() to del_gendisk(), this will introduce some problems: 1) If rq_qos_add() is triggered by enabling iocost/iolatency through cgroupfs, then it can concurrent with del_gendisk(), it's not safe to write 'q->rq_qos' concurrently. 2) Activate cgroup policy that is relied on rq_qos will call rq_qos_add() and blkcg_activate_policy(), and if rq_qos_exit() is called in the middle, null-ptr-dereference will be triggered in blkcg_activate_policy(). 3) blkg_conf_open_bdev() can call blkdev_get_no_open() first to find the disk, then if rq_qos_exit() from del_gendisk() is done before rq_qos_add(), then memory will be leaked. This patch add a new disk level mutex 'rq_qos_mutex': 1) The lock will protect rq_qos_exit() directly. 2) For wbt that doesn't relied on blk-cgroup, rq_qos_add() can only be called from disk initialization for now because wbt can't be destructed until rq_qos_exit(), so it's safe not to protect wbt for now. Hoever, in case that rq_qos dynamically destruction is supported in the furture, this patch also protect rq_qos_add() from wbt_init() directly, this is enough because blk-sysfs already synchronize writers with disk removal. 3) For iocost and iolatency, in order to synchronize disk removal and cgroup configuration, the lock is held after blkdev_get_no_open() from blkg_conf_open_bdev(), and is released in blkg_conf_exit(). In order to fix the above memory leak, disk_live() is checked after holding the new lock. Fixes: 50e34d78815e ("block: disable the elevator int del_gendisk") Signed-off-by: Yu Kuai <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-23	block: remove redundant req_op in blk_rq_is_passthrough	Li Nan	1	-1/+1
	op &= REQ_OP_MASK in blk_op_is_passthrough() is exactly what req_op() do. Therefore, it is redundant to call req_op() for blk_op_is_passthrough(). Signed-off-by: Li Nan <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2023-05-23	bpf, sockmap: Incorrectly handling copied_seq	John Fastabend	1	-0/+10
	The read_skb() logic is incrementing the tcp->copied_seq which is used for among other things calculating how many outstanding bytes can be read by the application. This results in application errors, if the application does an ioctl(FIONREAD) we return zero because this is calculated from the copied_seq value. To fix this we move tcp->copied_seq accounting into the recv handler so that we update these when the recvmsg() hook is called and data is in fact copied into user buffers. This gives an accurate FIONREAD value as expected and improves ACK handling. Before we were calling the tcp_rcv_space_adjust() which would update 'number of bytes copied to user in last RTT' which is wrong for programs returning SK_PASS. The bytes are only copied to the user when recvmsg is handled. Doing the fix for recvmsg is straightforward, but fixing redirect and SK_DROP pkts is a bit tricker. Build a tcp_psock_eat() helper and then call this from skmsg handlers. This fixes another issue where a broken socket with a BPF program doing a resubmit could hang the receiver. This happened because although read_skb() consumed the skb through sock_drop() it did not update the copied_seq. Now if a single reccv socket is redirecting to many sockets (for example for lb) the receiver sk will be hung even though we might expect it to continue. The hang comes from not updating the copied_seq numbers and memory pressure resulting from that. We have a slight layer problem of calling tcp_eat_skb even if its not a TCP socket. To fix we could refactor and create per type receiver handlers. I decided this is more work than we want in the fix and we already have some small tweaks depending on caller that use the helper skb_bpf_strparser(). So we extend that a bit and always set the strparser bit when it is in use and then we can gate the seq_copied updates on this. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-05-23	bpf, sockmap: Improved check for empty queue	John Fastabend	1	-1/+0
	We noticed some rare sk_buffs were stepping past the queue when system was under memory pressure. The general theory is to skip enqueueing sk_buffs when its not necessary which is the normal case with a system that is properly provisioned for the task, no memory pressure and enough cpu assigned. But, if we can't allocate memory due to an ENOMEM error when enqueueing the sk_buff into the sockmap receive queue we push it onto a delayed workqueue to retry later. When a new sk_buff is received we then check if that queue is empty. However, there is a problem with simply checking the queue length. When a sk_buff is being processed from the ingress queue but not yet on the sockmap msg receive queue its possible to also recv a sk_buff through normal path. It will check the ingress queue which is zero and then skip ahead of the pkt being processed. Previously we used sock lock from both contexts which made the problem harder to hit, but not impossible. To fix instead of popping the skb from the queue entirely we peek the skb from the queue and do the copy there. This ensures checks to the queue length are non-zero while skb is being processed. Then finally when the entire skb has been copied to user space queue or another socket we pop it off the queue. This way the queue length check allows bypassing the queue only after the list has been completely processed. To reproduce issue we run NGINX compliance test with sockmap running and observe some flakes in our testing that we attributed to this issue. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Suggested-by: Jakub Sitnicki <[email protected]> Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: William Findlay <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-05-23	bpf, sockmap: Convert schedule_work into delayed_work	John Fastabend	1	-1/+1
	Sk_buffs are fed into sockmap verdict programs either from a strparser (when the user might want to decide how framing of skb is done by attaching another parser program) or directly through tcp_read_sock. The tcp_read_sock is the preferred method for performance when the BPF logic is a stream parser. The flow for Cilium's common use case with a stream parser is, tcp_read_sock() sk_psock_verdict_recv ret = bpf_prog_run_pin_on_cpu() sk_psock_verdict_apply(sock, skb, ret) // if system is under memory pressure or app is slow we may // need to queue skb. Do this queuing through ingress_skb and // then kick timer to wake up handler skb_queue_tail(ingress_skb, skb) schedule_work(work); The work queue is wired up to sk_psock_backlog(). This will then walk the ingress_skb skb list that holds our sk_buffs that could not be handled, but should be OK to run at some later point. However, its possible that the workqueue doing this work still hits an error when sending the skb. When this happens the skbuff is requeued on a temporary 'state' struct kept with the workqueue. This is necessary because its possible to partially send an skbuff before hitting an error and we need to know how and where to restart when the workqueue runs next. Now for the trouble, we don't rekick the workqueue. This can cause a stall where the skbuff we just cached on the state variable might never be sent. This happens when its the last packet in a flow and no further packets come along that would cause the system to kick the workqueue from that side. To fix we could do simple schedule_work(), but while under memory pressure it makes sense to back off some instead of continue to retry repeatedly. So instead to fix convert schedule_work to schedule_delayed_work and add backoff logic to reschedule from backlog queue on errors. Its not obvious though what a good backoff is so use '1'. To test we observed some flakes whil running NGINX compliance test with sockmap we attributed these failed test to this bug and subsequent issue. >From on list discussion. This commit bec217197b41("skmsg: Schedule psock work if the cached skb exists on the psock") was intended to address similar race, but had a couple cases it missed. Most obvious it only accounted for receiving traffic on the local socket so if redirecting into another socket we could still get an sk_buff stuck here. Next it missed the case where copied=0 in the recv() handler and then we wouldn't kick the scheduler. Also its sub-optimal to require userspace to kick the internal mechanisms of sockmap to wake it up and copy data to user. It results in an extra syscall and requires the app to actual handle the EAGAIN correctly. Fixes: 04919bed948dc ("tcp: Introduce tcp_read_skb()") Signed-off-by: John Fastabend <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: William Findlay <[email protected]> Reviewed-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2023-05-23	ipv{4,6}/raw: fix output xfrm lookup wrt protocol	Nicolas Dichtel	2	-0/+3
	With a raw socket bound to IPPROTO_RAW (ie with hdrincl enabled), the protocol field of the flow structure, build by raw_sendmsg() / rawv6_sendmsg()), is set to IPPROTO_RAW. This breaks the ipsec policy lookup when some policies are defined with a protocol in the selector. For ipv6, the sin6_port field from 'struct sockaddr_in6' could be used to specify the protocol. Just accept all values for IPPROTO_RAW socket. For ipv4, the sin_port field of 'struct sockaddr_in' could not be used without breaking backward compatibility (the value of this field was never checked). Let's add a new kind of control message, so that the userland could specify which protocol is used. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") CC: [email protected] Signed-off-by: Nicolas Dichtel <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
2023-05-23	Merge branch 'topic/midi20' into for-next	Takashi Iwai	17	-34/+995
	This is a (largish) patch set for adding the support of MIDI 2.0 functionality, mainly targeted for USB devices. MIDI 2.0 is a complete overhaul of the 40-years old MIDI 1.0. Unlike MIDI 1.0 byte stream, MIDI 2.0 uses packets in 32bit words for Universal MIDI Packet (UMP) protocol. It supports both MIDI 1.0 commands for compatibility and the extended MIDI 2.0 commands for higher resolutions and more functions. For supporting the UMP, the patch set extends the existing ALSA rawmidi and sequencer interfaces, and adds the USB MIDI 2.0 support to the standard USB-audio driver. The rawmidi for UMP has a different device name (/dev/snd/umpCD) and it reads/writes UMP packet data in 32bit CPU-native endianness. For the old MIDI 1.0 applications, the legacy rawmidi interface is provided, too. As default, USB-audio driver will take the alternate setting for MIDI 2.0 interface, and the compatibility with MIDI 1.0 is provided via the rawmidi common layer. However, user may let the driver falling back to the old MIDI 1.0 interface by a module option, too. A UMP-capable rawmidi device can create the corresponding ALSA sequencer client(s) to support the UMP Endpoint and UMP Group connections. As a nature of ALSA sequencer, arbitrary connections between clients/ports are allowed, and the ALSA sequencer core performs the automatic conversions for the connections between a new UMP sequencer client and a legacy MIDI 1.0 sequencer client. It allows the existing application to use MIDI 2.0 devices without changes. The MIDI-CI, which is another major extension in MIDI 2.0, isn't covered by this patch set. It would be implemented rather in user-space. Roughly speaking, the first half of this patch set is for extending the rawmidi and USB-audio, and the second half is for extending the ALSA sequencer interface. The patch set is based on 6.4-rc2 kernel, but all patches can be cleanly applicable on 6.2 and 6.3 kernels, too (while 6.1 and older kernels would need minor adjustment for uapi header changes). The updates for alsa-lib and alsa-utils will follow shortly later. The author thanks members of MIDI Association OS/API Working Group, especially Andrew Mee, for great helps for the initial design and debugging / testing the drivers. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-05-23	ALSA: seq: Add UMP group filter	Takashi Iwai	1	-1/+2
	Add a new filter bitmap for UMP groups for reducing the unnecessary read/write when the client is connected to UMP EP seq port. The new group_filter field contains the bitmap for the groups, i.e. when the bit is set, the corresponding group is filtered out and the messages to that group won't be delivered. The filter bitmap consists of each bit of 1-based UMP Group number. The bit 0 is reserved for the future use. Reviewed-by: Jaroslav Kysela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-05-23	ALSA: seq: Add ioctls for client UMP info query and setup	Takashi Iwai	1	-0/+14
	Add new ioctls for sequencer clients to query and set the UMP endpoint and block information. As a sequencer client corresponds to a UMP Endpoint, one UMP Endpoint information can be assigned at most to a single sequencer client while multiple UMP block infos can be assigned by passing the type with the offset of block id (i.e. type = block_id + 1). For the kernel client, only SNDRV_SEQ_IOCTL_GET_CLIENT_UMP_INFO is allowed. Reviewed-by: Jaroslav Kysela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-05-23	ALSA: seq: Bind UMP device	Takashi Iwai	3	-1/+16
	This patch introduces a new ALSA sequencer client for the kernel UMP object, snd-seq-ump-client. It's a UMP version of snd-seq-midi driver, while this driver creates a sequencer client per UMP endpoint which contains (fixed) 16 ports. The UMP rawmidi device is opened in APPEND mode for output, so that multiple sequencer clients can share the same UMP endpoint, as well as the legacy UMP rawmidi devices that are opened in APPEND mode, too. For input, on the other hand, the incoming data is processed on the fly in the dedicated hook, hence it doesn't open a rawmidi device. The UMP packet group is updated upon delivery depending on the target sequencer port (which corresponds to the actual UMP group). Each sequencer port sets a new port type bit, SNDRV_SEQ_PORT_TYPE_MIDI_UMP, in addition to the other standard types for MIDI. Reviewed-by: Jaroslav Kysela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>
2023-05-23	ALSA: seq: Allow suppressing UMP conversions	Takashi Iwai	1	-0/+1
	A sequencer client like seq_dummy rather doesn't want to convert UMP events but receives / sends as is. Add a new event filter flag to suppress the automatic UMP conversion and applies accordingly. Reviewed-by: Jaroslav Kysela <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Takashi Iwai <[email protected]>