blaster4385/linux-IllusionX - Linux kernel with personal config changes for arch linux

Age	Commit message (Collapse)	Author	Files	Lines
2022-11-17	random: use rejection sampling for uniform bounded random integers	Jason A. Donenfeld	2	-16/+42
	Until the very recent commits, many bounded random integers were calculated using `get_random_u32() % max_plus_one`, which not only incurs the price of a division -- indicating performance mostly was not a real issue -- but also does not result in a uniformly distributed output if max_plus_one is not a power of two. Recent commits moved to using `prandom_u32_max(max_plus_one)`, which replaces the division with a faster multiplication, but still does not solve the issue with non-uniform output. For some users, maybe this isn't a problem, and for others, maybe it is, but for the majority of users, probably the question has never been posed and analyzed, and nobody thought much about it, probably assuming random is random is random. In other words, the unthinking expectation of most users is likely that the resultant numbers are uniform. So we implement here an efficient way of generating uniform bounded random integers. Through use of compile-time evaluation, and avoiding divisions as much as possible, this commit introduces no measurable overhead. At least for hot-path uses tested, any potential difference was lost in the noise. On both clang and gcc, code generation is pretty small. The new function, get_random_u32_below(), lives in random.h, rather than prandom.h, and has a "get_random_xxx" function name, because it is suitable for all uses, including cryptography. In order to be efficient, we implement a kernel-specific variant of Daniel Lemire's algorithm from "Fast Random Integer Generation in an Interval", linked below. The kernel's variant takes advantage of constant folding to avoid divisions entirely in the vast majority of cases, works on both 32-bit and 64-bit architectures, and requests a minimal amount of bytes from the RNG. Link: https://arxiv.org/pdf/1805.10941.pdf Cc: [email protected] # to ease future backports that use this api Signed-off-by: Jason A. Donenfeld <[email protected]>
2022-11-17	bpf: Pass map file to .map_update_batch directly	Hou Tao	1	-2/+3
	Currently bpf_map_do_batch() first invokes fdget(batch.map_fd) to get the target map file, then it invokes generic_map_update_batch() to do batch update. generic_map_update_batch() will get the target map file by using fdget(batch.map_fd) again and pass it to bpf_map_update_value(). The problem is map file returned by the second fdget() may be NULL or a totally different file compared by map file in bpf_map_do_batch(). The reason is that the first fdget() only guarantees the liveness of struct file instead of file descriptor and the file description may be released by concurrent close() through pick_file(). It doesn't incur any problem as for now, because maps with batch update support don't use map file in .map_fd_get_ptr() ops. But it is better to fix the potential access of an invalid map file. Using __bpf_map_get() again in generic_map_update_batch() can not fix the problem, because batch.map_fd may be closed and reopened, and the returned map file may be different with map file got in bpf_map_do_batch(), so just passing the map file directly to .map_update_batch() in bpf_map_do_batch(). Signed-off-by: Hou Tao <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Acked-by: Yonghong Song <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-11-17	fbdev: Make fb_modesetting_disabled() static inline	Thomas Zimmermann	1	-1/+1
	Make fb_modesetting_disabled() a static-inline function when it is defined in the header file. Avoid the linker error shown below. ld: drivers/video/fbdev/core/fbmon.o: in function `fb_modesetting_disabled': fbmon.c:(.text+0x1e4): multiple definition of `fb_modesetting_disabled'; drivers/video/fbdev/core/fbmem.o:fbmem.c:(.text+0x1bac): first defined here A bug report is at [1]. Reported-by: Stephen Rothwell <[email protected]> Reported-by: kernel test robot <[email protected]> Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Javier Martinez Canillas <[email protected]> Fixes: 0ba2fa8cbd29 ("fbdev: Add support for the nomodeset kernel parameter") Cc: Javier Martinez Canillas <[email protected]> Cc: Daniel Vetter <[email protected]> Cc: Helge Deller <[email protected]> Cc: [email protected] Cc: [email protected] Link: https://lore.kernel.org/dri-devel/[email protected]/T/#u # 1 Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-11-17	KVM: Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL	David Matlack	1	-0/+1
	Obey kvm.halt_poll_ns in VMs not using KVM_CAP_HALT_POLL on every halt, rather than just sampling the module parameter when the VM is first created. This restore the original behavior of kvm.halt_poll_ns for VMs that have not opted into KVM_CAP_HALT_POLL. Notably, this change restores the ability for admins to disable or change the maximum halt-polling time system wide for VMs not using KVM_CAP_HALT_POLL. Reported-by: Christian Borntraeger <[email protected]> Fixes: acd05785e48c ("kvm: add capability for halt polling") Signed-off-by: David Matlack <[email protected]> Message-Id: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2022-11-17	genirq/msi: Remove msi_domain_ops:: Msi_check()	Thomas Gleixner	1	-4/+0
	No more users. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	PCI/MSI: Move pci_alloc_irq_vectors() to api.c	Ahmed S. Darwish	1	-4/+11
	To disentangle the maze in msi.c, all exported device-driver MSI APIs are now to be grouped in one file, api.c. Make pci_alloc_irq_vectors() a real function instead of wrapper and add proper kernel doc to it. Signed-off-by: Ahmed S. Darwish <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	genirq: Get rid of GENERIC_MSI_IRQ_DOMAIN	Thomas Gleixner	3	-11/+7
	Adjust to reality and remove another layer of pointless Kconfig indirection. CONFIG_GENERIC_MSI_IRQ is good enough to serve all purposes. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	PCI/MSI: Get rid of PCI_MSI_IRQ_DOMAIN	Thomas Gleixner	1	-18/+11
	What a zoo: PCI_MSI select GENERIC_MSI_IRQ PCI_MSI_IRQ_DOMAIN def_bool y depends on PCI_MSI select GENERIC_MSI_IRQ_DOMAIN Ergo PCI_MSI enables PCI_MSI_IRQ_DOMAIN which in turn selects GENERIC_MSI_IRQ_DOMAIN. So all the dependencies on PCI_MSI_IRQ_DOMAIN are just an indirection to PCI_MSI. Match the reality and just admit that PCI_MSI requires GENERIC_MSI_IRQ_DOMAIN. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Acked-by: Bjorn Helgaas <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	genirq/msi: Add bus token to struct msi_domain_info	Ahmed S. Darwish	1	-8/+11
	Add a bus token member to struct msi_domain_info and let msi_create_irq_domain() set the bus token. That allows to remove the bus token updates at the call sites. Suggested-by: Thomas Gleixner <[email protected]> Signed-off-by: Ahmed S. Darwish <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	genirq/irqdomain: Move bus token enum into a seperate header	Thomas Gleixner	2	-21/+27
	Split the bus token defines out into a seperate header file to avoid inclusion of irqdomain.h in msi.h. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	genirq/msi: Make __msi_domain_free_irqs() static	Thomas Gleixner	1	-4/+0
	Now that the last user is gone, confine it to the core code. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	genirq/msi: Provide msi_domain_ops:: Post_free()	Thomas Gleixner	1	-0/+4
	To prepare for removing the exposure of __msi_domain_free_irqs() provide a post_free() callback in the MSI domain ops which can be used to solve the problem of the only user of __msi_domain_free_irqs() in arch/powerpc. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	genirq/msi: Make __msi_domain_alloc_irqs() static	Thomas Gleixner	1	-5/+2
	Nothing outside of the core code requires this. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	genirq/msi: Remove filter from msi_free_descs_free_range()	Thomas Gleixner	1	-3/+2
	When a range of descriptors is freed then all of them are not associated to a linux interrupt. Remove the filter and add a warning to the free function. Signed-off-by: Thomas Gleixner <[email protected]> Reviewed-by: Ashok Raj <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	timerqueue: Use rb_entry_safe() in timerqueue_getnext()	Barnabás Pőcze	1	-1/+1
	When `timerqueue_getnext()` is called on an empty timer queue, it will use `rb_entry()` on a NULL pointer, which is invalid. Fix that by using `rb_entry_safe()` which handles NULL pointers. This has not caused any issues so far because the offset of the `rb_node` member in `timerqueue_node` is 0, so `rb_entry()` is essentially a no-op. Fixes: 511885d7061e ("lib/timerqueue: Rely on rbtree semantics for next timer") Signed-off-by: Barnabás Pőcze <[email protected]> Signed-off-by: Thomas Gleixner <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-17	pinconf-generic: clarify pull up and pull down config values	Niyas Sait	1	-2/+4
	PIN_CONFIG_BIAS_PULL_DOWN and PIN_CONFIG_BIAS_PULL_UP values can be custom or an SI unit such as ohms Signed-off-by: Niyas Sait <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Linus Walleij <[email protected]>
2022-11-16	tracing: Fix warning on variable 'struct trace_array'	Aashish Sharma	1	-2/+2
	Move the declaration of 'struct trace_array' out of #ifdef CONFIG_TRACING block, to fix the following warning when CONFIG_TRACING is not set: >> include/linux/trace.h:63:45: warning: 'struct trace_array' declared inside parameter list will not be visible outside of this definition or declaration Link: https://lkml.kernel.org/r/[email protected] Fixes: 1a77dd1c2bb5 ("scsi: tracing: Fix compile error in trace_array calls when TRACING is disabled") Cc: "Martin K. Petersen" <[email protected]> Cc: Arun Easi <[email protected]> Acked-by: Masami Hiramatsu (Google) <[email protected]> Reviewed-by: Guenter Roeck <[email protected]> Signed-off-by: Aashish Sharma <[email protected]> Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-11-16	blk-cgroup: Flush stats at blkgs destruction path	Waiman Long	1	-0/+1
	As noted by Michal, the blkg_iostat_set's in the lockless list hold reference to blkg's to protect against their removal. Those blkg's hold reference to blkcg. When a cgroup is being destroyed, cgroup_rstat_flush() is only called at css_release_work_fn() which is called when the blkcg reference count reaches 0. This circular dependency will prevent blkcg from being freed until some other events cause cgroup_rstat_flush() to be called to flush out the pending blkcg stats. To prevent this delayed blkcg removal, add a new cgroup_rstat_css_flush() function to flush stats for a given css and cpu and call it at the blkgs destruction path, blkcg_destroy_blkgs(), whenever there are still some pending stats to be flushed. This will ensure that blkcg reference count can reach 0 ASAP. Signed-off-by: Waiman Long <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-16	fixp-arith: do not require users to include bug.h	Matti Vaittinen	1	-0/+1
	The fixp_sin32_rad() contains a call to BUG_ON(). If users of fixp-arith.h have not included the bug.h prior including the fixp-arith.h the compiler will not find the BUG_ON. Thus every user of fixp-arith.h must also include bug.h (or roll own variant of BUG_ON()). Include bug.h from fixp-arith.h so every user of fixp-arith.h does not need to include the bug.h prior inclusion of fixp-arith.h Fixes: 559addc25b00 ("[media] fixp-arith: replace sin/cos table by a better precision one") Signed-off-by: Matti Vaittinen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Dmitry Torokhov <[email protected]>
2022-11-16	block: make blk_set_default_limits() private	Keith Busch	1	-1/+0
	There are no external users of this function. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-16	block: make dma_alignment a stacking queue_limit	Keith Busch	1	-7/+8
	Device mappers had always been getting the default 511 dma mask, but the underlying device might have a larger alignment requirement. Since this value is used to determine alloweable direct-io alignment, this needs to be a stackable limit. Signed-off-by: Keith Busch <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-16	block: remove delayed holder registration	Christoph Hellwig	1	-5/+0
	Now that dm has been fixed to track of holder registrations before add_disk, the somewhat buggy block layer code can be safely removed. Signed-off-by: Christoph Hellwig <[email protected]> Signed-off-by: Yu Kuai <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-16	tracing/ring-buffer: Have polling block on watermark	Steven Rostedt (Google)	1	-1/+1
	Currently the way polling works on the ring buffer is broken. It will return immediately if there's any data in the ring buffer whereas a read will block until the watermark (defined by the tracefs buffer_percent file) is hit. That is, a select() or poll() will return as if there's data available, but then the following read will block. This is broken for the way select()s and poll()s are supposed to work. Have the polling on the ring buffer also block the same way reads and splice does on the ring buffer. Link: https://lkml.kernel.org/r/[email protected] Cc: Linux Trace Kernel <[email protected]> Cc: Masami Hiramatsu <[email protected]> Cc: Mathieu Desnoyers <[email protected]> Cc: Primiano Tucci <[email protected]> Cc: [email protected] Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full") Signed-off-by: Steven Rostedt (Google) <[email protected]>
2022-11-16	HID: bpf: return non NULL data pointer when CONFIG_HID_BPF is not set	Benjamin Tissoires	1	-1/+1
	dispatch_hid_bpf_device_event() is supposed to return either an error, or a valid pointer to memory containing the data. Returning NULL simply makes a segfault when CONFIG_HID_BPF is not set for any processed event. Fixes: 658ee5a64fcfbbf ("HID: bpf: allocate data memory for device_event BPF program") Signed-off-by: Benjamin Tissoires <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2022-11-16	wait: Return number of exclusive waiters awaken	Gabriel Krisman Bertazi	1	-1/+1
	Sbitmap code will need to know how many waiters were actually woken for its batched wakeups implementation. Return the number of woken exclusive waiters from __wake_up() to facilitate that. Suggested-by: Jan Kara <[email protected]> Signed-off-by: Gabriel Krisman Bertazi <[email protected]> Reviewed-by: Jan Kara <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-16	mempool: introduce mempool_is_saturated	Pavel Begunkov	1	-0/+5
	Introduce a helper mempool_is_saturated(), which tells if the mempool is under-filled or not. We need it to figure out whether it should be freed right into the mempool or could be cached with top level caches. Signed-off-by: Pavel Begunkov <[email protected]> Link: https://lore.kernel.org/r/636aed30be8c35d78f45e244998bc6209283cccc.1667384020.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <[email protected]>
2022-11-16	net: add atomic_long_t to net_device_stats fields	Eric Dumazet	1	-23/+35
	Long standing KCSAN issues are caused by data-race around some dev->stats changes. Most performance critical paths already use per-cpu variables, or per-queue ones. It is reasonable (and more correct) to use atomic operations for the slow paths. This patch adds an union for each field of net_device_stats, so that we can convert paths that are not yet protected by a spinlock or a mutex. netdev_stats_to_stats64() no longer has an #if BITS_PER_LONG==64 Note that the memcpy() we were using on 64bit arches had no provision to avoid load-tearing, while atomic_long_read() is providing the needed protection at no cost. Signed-off-by: Eric Dumazet <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-11-16	fbdev: Add support for the nomodeset kernel parameter	Thomas Zimmermann	1	-0/+9
	Support the kernel's nomodeset parameter for all PCI-based fbdev drivers that use aperture helpers to remove other, hardware-agnostic graphics drivers. The parameter is a simple way of using the firmware-provided scanout buffer if the hardware's native driver is broken. The same effect could be achieved with per-driver options, but the importance of the graphics output for many users makes a single, unified approach worthwhile. With nomodeset specified, the fbdev driver module will not load. This unifies behavior with similar DRM drivers. In DRM helpers, modules first check the nomodeset parameter before registering the PCI driver. As fbdev has no such module helpers, we have to modify each driver individually. The name 'nomodeset' is slightly misleading, but has been chosen for historical reasons. Several drivers implemented it before it became a general option for DRM. So keeping the existing name was preferred over introducing a new one. v2: * print a warning if a driver does not init (Helge) * wrap video_firmware_drivers_only() in helper Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Javier Martinez Canillas <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-11-16	drm/fb-helper: Schedule deferred-I/O worker after writing to framebuffer	Thomas Zimmermann	1	-0/+1
	Schedule the deferred-I/O worker instead of the damage worker after writing to the fbdev framebuffer. The deferred-I/O worker then performs the dirty-fb update. The fbdev emulation will initialize deferred I/O for all drivers that require damage updates. It is therefore a valid assumption that the deferred-I/O worker is present. It would be possible to perform the damage handling directly from within the write operation. But doing this could increase the overhead of the write or interfere with a concurrently scheduled deferred-I/O worker. Instead, scheduling the deferred-I/O worker with its regular delay of 50 ms removes load off the write operation and allows the deferred-I/O worker to handle multiple write operations that arrived during the delay time window. v3: * remove unused variable (lkp) v2: * keep drm_fb_helper_damage() (Daniel) * use fb_deferred_io_schedule_flush() (Daniel) * clarify comments (Daniel) Signed-off-by: Thomas Zimmermann <[email protected]> Reviewed-by: Daniel Vetter <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-11-16	udp: Introduce optional per-netns hash table.	Kuniyuki Iwashima	1	-0/+2
	The maximum hash table size is 64K due to the nature of the protocol. [0] It's smaller than TCP, and fewer sockets can cause a performance drop. On an EC2 c5.24xlarge instance (192 GiB memory), after running iperf3 in different netns, creating 32Mi sockets without data transfer in the root netns causes regression for the iperf3's connection. uhash_entries sockets length Gbps 64K 1 1 5.69 1Mi 16 5.27 2Mi 32 4.90 4Mi 64 4.09 8Mi 128 2.96 16Mi 256 2.06 32Mi 512 1.12 The per-netns hash table breaks the lengthy lists into shorter ones. It is useful on a multi-tenant system with thousands of netns. With smaller hash tables, we can look up sockets faster, isolate noisy neighbours, and reduce lock contention. The max size of the per-netns table is 64K as well. This is because the possible hash range by udp_hashfn() always fits in 64K within the same netns and we cannot make full use of the whole buckets larger than 64K. /* 0 < num < 64K -> X < hash < X + 64K */ (num + net_hash_mix(net)) & mask; Also, the min size is 128. We use a bitmap to search for an available port in udp_lib_get_port(). To keep the bitmap on the stack and not fire the CONFIG_FRAME_WARN error at build time, we round up the table size to 128. The sysctl usage is the same with TCP: $ dmesg \| cut -d ' ' -f 6- \| grep "UDP hash" UDP hash table entries: 65536 (order: 9, 2097152 bytes, vmalloc) # sysctl net.ipv4.udp_hash_entries net.ipv4.udp_hash_entries = 65536 # can be changed by uhash_entries # sysctl net.ipv4.udp_child_hash_entries net.ipv4.udp_child_hash_entries = 0 # disabled by default # ip netns add test1 # ip netns exec test1 sysctl net.ipv4.udp_hash_entries net.ipv4.udp_hash_entries = -65536 # share the global table # sysctl -w net.ipv4.udp_child_hash_entries=100 net.ipv4.udp_child_hash_entries = 100 # ip netns add test2 # ip netns exec test2 sysctl net.ipv4.udp_hash_entries net.ipv4.udp_hash_entries = 128 # own a per-netns table with 2^n buckets We could optimise the hash table lookup/iteration further by removing the netns comparison for the per-netns one in the future. Also, we could optimise the sparse udp_hslot layout by putting it in udp_table. [0]: https://lore.kernel.org/netdev/[email protected]/ Signed-off-by: Kuniyuki Iwashima <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-11-16	wifi: wl1251: drop support for platform data	Dmitry Torokhov	1	-44/+0
	Remove support for configuring the device via platform data because there are no users of wl1251_platform_data left in the mainline kernel. Signed-off-by: Dmitry Torokhov <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-15	regset: make user_regset_copyin_ignore() void	Sergey Shtylyov	1	-8/+7
	user_regset_copyin_ignore() apparently cannot fail and so always returns 0. Let's make this function return void instead of int... Link: https://lkml.kernel.org/r/[email protected] Signed-off-by: Sergey Shtylyov <[email protected]> Cc: Brian Cain <[email protected]> Cc: Catalin Marinas <[email protected]> Cc: Christophe Leroy <[email protected]> Cc: David S. Miller <[email protected]> Cc: Dinh Nguyen <[email protected]> Cc: Helge Deller <[email protected]> Cc: James Bottomley <[email protected]> Cc: Jonas Bonn <[email protected]> Cc: Michael Ellerman <[email protected]> Cc: Nicholas Piggin <[email protected]> Cc: Oleg Nesterov <[email protected]> Cc: Rich Felker <[email protected]> Cc: Russell King <[email protected]> Cc: Stafford Horne <[email protected]> Cc: Stefan Kristiansson <[email protected]> Cc: Thomas Bogendoerfer <[email protected]> Cc: Will Deacon <[email protected]> Cc: Yoshinori Sato <[email protected]> Signed-off-by: Andrew Morton <[email protected]>
2022-11-15	bpf: Expand map key argument of bpf_redirect_map to u64	Toke Høiland-Jørgensen	2	-7/+7
	For queueing packets in XDP we want to add a new redirect map type with support for 64-bit indexes. To prepare fore this, expand the width of the 'key' argument to the bpf_redirect_map() helper. Since BPF registers are always 64-bit, this should be safe to do after the fact. Acked-by: Song Liu <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-15	dev: Move received_rps counter next to RPS members in softnet data	Toke Høiland-Jørgensen	1	-1/+1
	Move the received_rps counter value next to the other RPS-related members in softnet_data. This closes two four-byte holes in the structure, making room for another pointer in the first two cache lines without bumping the xmit struct to its own line. Acked-by: Song Liu <[email protected]> Reviewed-by: Stanislav Fomichev <[email protected]> Signed-off-by: Toke Høiland-Jørgensen <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-15	HID: bpf: allow to change the report descriptor	Benjamin Tissoires	1	-0/+8
	Add a new tracepoint hid_bpf_rdesc_fixup() so we can trigger a report descriptor fixup in the bpf world. Whenever the program gets attached/detached, the device is reconnected meaning that userspace will see it disappearing and reappearing with the new report descriptor. Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Benjamin Tissoires <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2022-11-15	HID: bpf: introduce hid_hw_request()	Benjamin Tissoires	1	-1/+12
	This function can not be called under IRQ, thus it is only available while in SEC("syscall"). For consistency, this function requires a HID-BPF context to work with, and so we also provide a helper to create one based on the HID unique ID. Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Benjamin Tissoires <[email protected]> -- changes in v12: - variable dereferenced before check 'ctx' \|Reported-by: kernel test robot <[email protected]> \|Reported-by: Dan Carpenter <[email protected]> no changes in v11 no changes in v10 changes in v9: - fixed kfunc declaration aaccording to latest upstream changes no changes in v8 changes in v7: - hid_bpf_allocate_context: remove unused variable - ensures buf is not NULL changes in v6: - rename parameter size into buf__sz to teach the verifier about the actual buffer size used by the call - remove the allocated data in the user created context, it's not used new-ish in v5 Signed-off-by: Jiri Kosina <[email protected]>
2022-11-15	HID: bpf: allocate data memory for device_event BPF programs	Benjamin Tissoires	1	-5/+32
	We need to also be able to change the size of the report. Reducing it is easy, because we already have the incoming buffer that is big enough, but extending it is harder. Pre-allocate a buffer that is big enough to handle all reports of the device, and use that as the primary buffer for BPF programs. To be able to change the size of the buffer, we change the device_event API and request it to return the size of the buffer. Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Benjamin Tissoires <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2022-11-15	HID: initial BPF implementation	Benjamin Tissoires	2	-0/+122
	Declare an entry point that can use fmod_ret BPF programs, and also an API to access and change the incoming data. A simpler implementation would consist in just calling hid_bpf_device_event() for any incoming event and let users deal with the fact that they will be called for any event of any device. The goal of HID-BPF is to partially replace drivers, so this situation can be problematic because we might have programs which will step on each other toes. For that, we add a new API hid_bpf_attach_prog() that can be called from a syscall and we manually deal with a jump table in hid-bpf. Whenever we add a program to the jump table (in other words, when we attach a program to a HID device), we keep the number of time we added this program in the jump table so we can release it whenever there are no other users. HID devices have an RCU protected list of available programs in the jump table, and those programs are called one after the other thanks to bpf_tail_call(). To achieve the detection of users losing their fds on the programs we attached, we add 2 tracing facilities on bpf_prog_release() (for when a fd is closed) and bpf_free_inode() (for when a pinned program gets unpinned). Reviewed-by: Greg Kroah-Hartman <[email protected]> Signed-off-by: Benjamin Tissoires <[email protected]> Signed-off-by: Jiri Kosina <[email protected]>
2022-11-15	gpiolib: add support for software nodes	Dmitry Torokhov	1	-0/+11
	Now that static device properties understand notion of child nodes and references, let's teach gpiolib to handle them: - GPIOs are represented as a references to software nodes representing gpiochip - references must have 2 arguments - GPIO number within the chip and GPIO flags (GPIO_ACTIVE_LOW/GPIO_ACTIVE_HIGH, etc) - a new PROPERTY_ENTRY_GPIO() macro is supplied to ensure the above - name of the software node representing gpiochip must match label of the gpiochip, as we use it to locate gpiochip structure at runtime The following illustrates use of software nodes to describe a "System" button that is currently specified via use of gpio_keys_platform_data in arch/mips/alchemy/board-mtx1.c. It follows bindings specified in Documentation/devicetree/bindings/input/gpio-keys.yaml. static const struct software_node mxt1_gpiochip2_node = { .name = "alchemy-gpio2", }; static const struct property_entry mtx1_gpio_button_props[] = { PROPERTY_ENTRY_U32("linux,code", BTN_0), PROPERTY_ENTRY_STRING("label", "System button"), PROPERTY_ENTRY_GPIO("gpios", &mxt1_gpiochip2_node, 7, GPIO_ACTIVE_LOW), { } }; Similarly, arch/arm/mach-tegra/board-paz00.c can be converted to: static const struct software_node tegra_gpiochip_node = { .name = "tegra-gpio", }; static struct property_entry wifi_rfkill_prop[] __initdata = { PROPERTY_ENTRY_STRING("name", "wifi_rfkill"), PROPERTY_ENTRY_STRING("type", "wlan"), PROPERTY_ENTRY_GPIO("reset-gpios", &tegra_gpiochip_node, 25, GPIO_ACTIVE_HIGH); PROPERTY_ENTRY_GPIO("shutdown-gpios", &tegra_gpiochip_node, 85, GPIO_ACTIVE_HIGH); { }, }; static struct platform_device wifi_rfkill_device = { .name = "rfkill_gpio", .id = -1, }; ... software_node_register(&tegra_gpiochip_node); device_create_managed_software_node(&wifi_rfkill_device.dev, wifi_rfkill_prop, NULL); Acked-by: Linus Walleij <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Signed-off-by: Dmitry Torokhov <[email protected]> Signed-off-by: Bartosz Golaszewski <[email protected]>
2022-11-15	nvme: implement the DEAC bit for the Write Zeroes command	Christoph Hellwig	1	-0/+1
	While the specification allows devices to either deallocate data or to actually write zeroes on any Write Zeroes command, many SSDs only do the sensible thing and deallocate data when the DEAC bit is specific. Set it when it is supported and the caller doesn't explicitly opt out of deallocation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Martin K. Petersen <[email protected]>
2022-11-15	nvme: fine-granular CAP_SYS_ADMIN for nvme io commands	Kanchan Joshi	1	-0/+1
	Currently both io and admin commands are kept under a coarse-granular CAP_SYS_ADMIN check, disregarding file mode completely. $ ls -l /dev/ng* crw-rw-rw- 1 root root 242, 0 Sep 9 19:20 /dev/ng0n1 crw------- 1 root root 242, 1 Sep 9 19:20 /dev/ng0n2 In the example above, ng0n1 appears as if it may allow unprivileged read/write operation but it does not and behaves same as ng0n2. This patch implements a shift from CAP_SYS_ADMIN to more fine-granular control for io-commands. If CAP_SYS_ADMIN is present, nothing else is checked as before. Otherwise, following rules are in place - any admin-cmd is not allowed - vendor-specific and fabric commmand are not allowed - io-commands that can write are allowed if matching FMODE_WRITE permission is present - io-commands that read are allowed Add a helper nvme_cmd_allowed that implements above policy. Change all the callers of CAP_SYS_ADMIN to go through nvme_cmd_allowed for any decision making. Since file open mode is counted for any approval/denial, change at various places to keep file-mode information handy. Signed-off-by: Kanchan Joshi <[email protected]> Reviewed-by: Jens Axboe <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Signed-off-by: Christoph Hellwig <[email protected]>
2022-11-15	kallsyms: Add self-test facility	Zhen Lei	1	-0/+1
	Added test cases for basic functions and performance of functions kallsyms_lookup_name(), kallsyms_on_each_symbol() and kallsyms_on_each_match_symbol(). It also calculates the compression rate of the kallsyms compression algorithm for the current symbol set. The basic functions test begins by testing a set of symbols whose address values are known. Then, traverse all symbol addresses and find the corresponding symbol name based on the address. It's impossible to determine whether these addresses are correct, but we can use the above three functions along with the addresses to test each other. Due to the traversal operation of kallsyms_on_each_symbol() is too slow, only 60 symbols can be tested in one second, so let it test on average once every 128 symbols. The other two functions validate all symbols. If the basic functions test is passed, print only performance test results. If the test fails, print error information, but do not perform subsequent performance tests. Start self-test automatically after system startup if CONFIG_KALLSYMS_SELFTEST=y. Example of output content: (prefix 'kallsyms_selftest:' is omitted start --------------------------------------------------------- \| nr_symbols \| compressed size \| original size \| ratio(%) \| \|---------------------------------------------------------\| \| 107543 \| 1357912 \| 2407433 \| 56.40 \| --------------------------------------------------------- kallsyms_lookup_name() looked up 107543 symbols The time spent on each symbol is (ns): min=630, max=35295, avg=7353 kallsyms_on_each_symbol() traverse all: 11782628 ns kallsyms_on_each_match_symbol() traverse all: 9261 ns finish Signed-off-by: Zhen Lei <[email protected]> Signed-off-by: Luis Chamberlain <[email protected]>
2022-11-14	bpf: Refactor btf_struct_access	Kumar Kartikeya Dwivedi	2	-13/+12
	Instead of having to pass multiple arguments that describe the register, pass the bpf_reg_state into the btf_struct_access callback. Currently, all call sites simply reuse the btf and btf_id of the reg they want to check the access of. The only exception to this pattern is the callsite in check_ptr_to_map_access, hence for that case create a dummy reg to simulate PTR_TO_BTF_ID access. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-14	bpf: Rename MEM_ALLOC to MEM_RINGBUF	Kumar Kartikeya Dwivedi	1	-7/+4
	Currently, verifier uses MEM_ALLOC type tag to specially tag memory returned from bpf_ringbuf_reserve helper. However, this is currently only used for this purpose and there is an implicit assumption that it only refers to ringbuf memory (e.g. the check for ARG_PTR_TO_ALLOC_MEM in check_func_arg_reg_off). Hence, rename MEM_ALLOC to MEM_RINGBUF to indicate this special relationship and instead open the use of MEM_ALLOC for more generic allocations made for user types. Also, since ARG_PTR_TO_ALLOC_MEM_OR_NULL is unused, simply drop it. Finally, update selftests using 'alloc_' verifier string to 'ringbuf_'. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-14	bpf: Rename RET_PTR_TO_ALLOC_MEM	Kumar Kartikeya Dwivedi	1	-3/+3
	Currently, the verifier has two return types, RET_PTR_TO_ALLOC_MEM, and RET_PTR_TO_ALLOC_MEM_OR_NULL, however the former is confusingly named to imply that it carries MEM_ALLOC, while only the latter does. This causes confusion during code review leading to conclusions like that the return value of RET_PTR_TO_DYNPTR_MEM_OR_NULL (which is RET_PTR_TO_ALLOC_MEM \| PTR_MAYBE_NULL) may be consumable by bpf_ringbuf_{submit,commit}. Rename it to make it clear MEM_ALLOC needs to be tacked on top of RET_PTR_TO_MEM. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-14	bpf: Support bpf_list_head in map values	Kumar Kartikeya Dwivedi	1	-0/+17
	Add the support on the map side to parse, recognize, verify, and build metadata table for a new special field of the type struct bpf_list_head. To parameterize the bpf_list_head for a certain value type and the list_node member it will accept in that value type, we use BTF declaration tags. The definition of bpf_list_head in a map value will be done as follows: struct foo { struct bpf_list_node node; int data; }; struct map_value { struct bpf_list_head head __contains(foo, node); }; Then, the bpf_list_head only allows adding to the list 'head' using the bpf_list_node 'node' for the type struct foo. The 'contains' annotation is a BTF declaration tag composed of four parts, "contains:name:node" where the name is then used to look up the type in the map BTF, with its kind hardcoded to BTF_KIND_STRUCT during the lookup. The node defines name of the member in this type that has the type struct bpf_list_node, which is actually used for linking into the linked list. For now, 'kind' part is hardcoded as struct. This allows building intrusive linked lists in BPF, using container_of to obtain pointer to entry, while being completely type safe from the perspective of the verifier. The verifier knows exactly the type of the nodes, and knows that list helpers return that type at some fixed offset where the bpf_list_node member used for this list exists. The verifier also uses this information to disallow adding types that are not accepted by a certain list. For now, no elements can be added to such lists. Support for that is coming in future patches, hence draining and freeing items is done with a TODO that will be resolved in a future patch. Note that the bpf_list_head_free function moves the list out to a local variable under the lock and releases it, doing the actual draining of the list items outside the lock. While this helps with not holding the lock for too long pessimizing other concurrent list operations, it is also necessary for deadlock prevention: unless every function called in the critical section would be notrace, a fentry/fexit program could attach and call bpf_map_update_elem again on the map, leading to the same lock being acquired if the key matches and lead to a deadlock. While this requires some special effort on part of the BPF programmer to trigger and is highly unlikely to occur in practice, it is always better if we can avoid such a condition. While notrace would prevent this, doing the draining outside the lock has advantages of its own, hence it is used to also fix the deadlock related problem. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-14	bpf: Fix copy_map_value, zero_map_value	Kumar Kartikeya Dwivedi	1	-2/+2
	The current offset needs to also skip over the already copied region in addition to the size of the next field. This case manifests where there are gaps between adjacent special fields. It was observed that for a map value with size 48, having fields at: off: 0, 16, 32 size: 4, 16, 16 The current code does: memcpy(dst + 0, src + 0, 0) memcpy(dst + 4, src + 4, 12) memcpy(dst + 20, src + 20, 12) memcpy(dst + 36, src + 36, 12) With the fix, it is done correctly as: memcpy(dst + 0, src + 0, 0) memcpy(dst + 4, src + 4, 12) memcpy(dst + 32, src + 32, 0) memcpy(dst + 48, src + 48, 0) Fixes: 4d7d7f69f4b1 ("bpf: Adapt copy_map_value for multiple offset case") Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-14	bpf: Remove BPF_MAP_OFF_ARR_MAX	Kumar Kartikeya Dwivedi	1	-5/+4
	In f71b2f64177a ("bpf: Refactor map->off_arr handling"), map->off_arr was refactored to be btf_field_offs. The number of field offsets is equal to maximum possible fields limited by BTF_FIELDS_MAX. Hence, reuse BTF_FIELDS_MAX as spin_lock and timer no longer are to be handled specially for offset sorting, fix the comment, and remove incorrect WARN_ON as its rec->cnt can never exceed this value. The reason to keep separate constant was the it was always more 2 more than total kptrs. This is no longer the case. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-14	Merge tag 'vfio-v6.1-rc6' of https://github.com/awilliam/linux-vfio	Linus Torvalds	1	-0/+1
	Pull VFIO fixes from Alex Williamson: - Fixes for potential container registration leak for drivers not implementing a close callback, duplicate container de-registrations, and a regression in support for bus reset on last device close from a device set (Anthony DeRossi) * tag 'vfio-v6.1-rc6' of https://github.com/awilliam/linux-vfio: vfio/pci: Check the device set open count on reset vfio: Export the device set open count vfio: Fix container device registration life cycle
2022-11-14	i2c: core: Introduce i2c_client_get_device_id helper function	Angel Iglesias	1	-0/+1
	Introduces new helper function to aid in .probe_new() refactors. In order to use existing i2c_get_device_id() on the probe callback, the device match table needs to be accessible in that function, which would require bigger refactors in some drivers using the deprecated .probe callback. This issue was discussed in more detail in the IIO mailing list. Link: https://lore.kernel.org/all/[email protected]/ Suggested-by: Nuno Sá <[email protected]> Suggested-by: Andy Shevchenko <[email protected]> Suggested-by: Jonathan Cameron <[email protected]> Signed-off-by: Angel Iglesias <[email protected]> Reviewed-by: Andy Shevchenko <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Signed-off-by: Wolfram Sang <[email protected]>