blaster4385/linux-IllusionX - Linux kernel with personal config changes for Arch Linux

Age	Commit message (Collapse)	Author	Files	Lines
2024-08-22	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	1	-1/+1
	Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt.h c948c0973df5 ("bnxt_en: Don't clear ntuple filters and rss contexts during ethtool ops") f2878cdeb754 ("bnxt_en: Add support to call FW to update a VNIC") Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-08-22	virt: vbox: struct vmmdev_hgcm_pagelist: Replace 1-element array with flexible array	Kees Cook	1	-1/+4
	Replace the deprecated[1] use of a 1-element array in struct vmmdev_hgcm_pagelist with a modern flexible array. As this is UAPI, we cannot trivially change the size of the struct, so use a union to retain the old first element's size, but switch "pages" to a flexible array. No binary differences are present after this conversion. Link: https://github.com/KSPP/linux/issues/79 [1] Reviewed-by: Gustavo A. R. Silva <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Kees Cook <[email protected]>
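A minimal sketch of the size-preserving trick, in GNU C. The struct names and the header fields other than `pages` are hypothetical (not copied from the vbox UAPI header), and the empty-struct wrapper is written out by hand to mirror what the kernel's `__DECLARE_FLEX_ARRAY()` helper expands to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Old layout: a 1-element array standing in for a variable-length tail. */
struct pagelist_old {
	uint32_t flags;              /* hypothetical header fields */
	uint16_t offset_first_page;
	uint16_t page_count;
	uint64_t pages[1];           /* deprecated 1-element "flexible" array */
};

/* New layout: the union keeps one uint64_t of storage so sizeof() is
 * unchanged, while "pages" becomes a true flexible array. The empty struct
 * is the GNU C idiom that lets a flexible array sit inside a union. */
struct pagelist_new {
	uint32_t flags;
	uint16_t offset_first_page;
	uint16_t page_count;
	union {
		uint64_t unused;     /* retains the old first element's size */
		struct {
			struct { } empty_pages_;
			uint64_t pages[];
		};
	};
};

/* Bytes needed for a list holding n pages, computed from the flexible array. */
static size_t pagelist_bytes(size_t n)
{
	return offsetof(struct pagelist_new, pages) + n * sizeof(uint64_t);
}
```

"No binary differences" for userspace boils down to `sizeof` and the offset of `pages` being identical in both layouts.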
2024-08-22	USB: gadget: f_hid: Add GET_REPORT via userspace IOCTL	Chris Wulff	2	-1/+41
	While GET_REPORT is a mandatory request per the HID specification, the current implementation responds to the USB Host with an empty reply of the requested length. However, some USB Hosts will request the contents of feature reports via the GET_REPORT request. In addition, some proprietary HID 'protocols' expect different data, for the same report ID, to become available in the feature report after the host sends a preceding SET_REPORT to the USB Device that defines what data is to be presented when that feature report is subsequently retrieved via GET_REPORT (with a very fast < 5ms turnaround between the SET_REPORT and the GET_REPORT). There are two other patch sets already submitted for adding GET_REPORT support. The first [1] allows for pre-priming a list of reports via IOCTLs, which then allows the USB Host to perform the request with no further userspace interaction possible during the GET_REPORT request. The second [2] allows a single report to be set up by userspace via IOCTL, which will be fetched and returned by the kernel for subsequent GET_REPORT requests by the USB Host, also with no further userspace interaction possible. This patch, while loosely based on both patch sets, differs by giving userspace the option to respond to each GET_REPORT request: a poll notifies userspace that a new GET_REPORT request has arrived. To support this, two extra IOCTLs are supplied. The first retrieves the report ID of the GET_REPORT request (in the case of having non-zero report IDs in the HID descriptor). The second stores report responses in a list for responding to requests (an entry is added if it does not exist, or updated if it already exists). A flag (userspace_req) controls whether subsequent requests for that report notify userspace or not.
Basic operation when a GET_REPORT request arrives from the USB Host: - If the report ID exists in the list and is set for immediate return (i.e. userspace_req == false), the response is sent immediately and userspace is not notified. - If the report ID does not exist, or exists but is set to notify userspace (i.e. userspace_req == true), userspace is notified via poll: - If userspace responds, it adds or updates the response in the list, and the host is answered with those contents. - If userspace does not respond within the fixed timeout (2500ms) but the report has been set previously, the 'old' report contents are sent. - If userspace does not respond within the fixed timeout (2500ms) and the report does not exist in the list, an empty report is sent. Note that userspace can 'prime' the report list at any other time. While this patch allows for flexibility in how the system responds to requests, and therefore in the HID 'protocols' that could be supported, a drawback is the time it takes to service the requests and therefore the maximum achievable throughput. The USB HID Specification v1.11 itself states that GET_REPORT is not intended for periodic data polling, so this limitation is not severe. Testing on an iMX8M Nano Ultra Lite under heavy multi-core CPU load showed that userspace can typically respond to the GET_REPORT request within 1200ms, which is well within the 5000ms most operating systems seem to allow, and within the 2500ms set by this patch. [1] https://lore.kernel.org/all/[email protected]/ [2] https://lore.kernel.org/all/[email protected]/ Signed-off-by: David Sands <[email protected]> Signed-off-by: Chris Wulff <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
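The decision flow for an incoming GET_REPORT can be modelled as a small pure function. This is a hypothetical sketch: `struct report_entry`, `get_report_source()` and the reply enum are illustrative names, not the f_hid driver's code, and the poll/IOCTL plumbing is reduced to a single "milliseconds until userspace replied" input.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GET_REPORT_TIMEOUT_MS 2500   /* fixed timeout quoted in the commit */

/* Hypothetical stand-in for one entry of the driver's report list. */
struct report_entry {
	uint8_t report_id;
	bool userspace_req;   /* true: notify userspace on each GET_REPORT */
};

enum reply_source {
	REPLY_CACHED,     /* stored report sent at once, userspace not woken */
	REPLY_USERSPACE,  /* userspace answered in time (entry added/updated) */
	REPLY_STALE,      /* timeout, previously stored contents sent */
	REPLY_EMPTY,      /* timeout, nothing stored: empty report sent */
};

/*
 * entry:      list entry for the requested report ID, or NULL if absent.
 * replied_ms: how long userspace took to answer after the poll wakeup,
 *             or a negative value if it never answered.
 */
static enum reply_source get_report_source(const struct report_entry *entry,
					   int replied_ms)
{
	if (entry && !entry->userspace_req)
		return REPLY_CACHED;

	if (replied_ms >= 0 && replied_ms <= GET_REPORT_TIMEOUT_MS)
		return REPLY_USERSPACE;

	return entry ? REPLY_STALE : REPLY_EMPTY;
}
```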
2024-08-22	net: ipv6: ioam6: new feature tunsrc	Justin Iurman	1	-0/+6
	This patch provides a new feature (i.e., "tunsrc") for the tunnel (i.e., "encap") mode of ioam6. Just like seg6 already does, except it is attached to a route. The "tunsrc" is optional: when not provided (by default), the automatic resolution is applied. Using "tunsrc" when possible has a benefit: performance. See the comparison: - before (= "encap" mode): https://ibb.co/bNCzvf7 - after (= "encap" mode with "tunsrc"): https://ibb.co/PT8L6yq Signed-off-by: Justin Iurman <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-08-20	audit,ipe: add IPE auditing support	Deven Bowers	1	-0/+3
	Users of IPE require a way to identify when and why an operation fails, allowing them to both respond to violations of policy and be notified of potentially malicious actions on their systems with respect to IPE itself. This patch introduces three new audit events. AUDIT_IPE_ACCESS(1420) indicates the result of an IPE policy evaluation of a resource. AUDIT_IPE_CONFIG_CHANGE(1421) indicates that the current active IPE policy has been changed to another loaded policy. AUDIT_IPE_POLICY_LOAD(1422) indicates that a new IPE policy has been loaded into the kernel. This patch also adds support for success auditing, allowing users to identify why an allow decision was made for a resource. However, it is recommended to use this option with caution, as it is quite noisy. Here are some examples of the new audit record types: AUDIT_IPE_ACCESS(1420): audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1 pid=297 comm="sh" path="/root/vol/bin/hello" dev="tmpfs" ino=3897 rule="op=EXECUTE boot_verified=TRUE action=ALLOW" audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1 pid=299 comm="sh" path="/mnt/ipe/bin/hello" dev="dm-0" ino=2 rule="DEFAULT action=DENY" audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1 pid=300 path="/tmp/tmpdp2h1lub/deny/bin/hello" dev="tmpfs" ino=131 rule="DEFAULT action=DENY" The above three records were generated when the active IPE policy only allows binaries from the initramfs to run. The three identical `hello` binaries were placed at different locations; only the first hello, from the rootfs (initramfs), was allowed. Field ipe_op is followed by the IPE operation name associated with the log. Field ipe_hook is followed by the name of the LSM hook that triggered the IPE event. Field enforcing is followed by the enforcement state of IPE (introduced in the next commit). Field pid is followed by the pid of the process that triggered the IPE event.
Field comm is followed by the command line program name of the process that triggered the IPE event. Field path is followed by the file's path name. Field dev is followed by the device name as found in /dev where the file is from. Note that for device mappers it will use the name `dm-X` instead of the name in /dev/mapper. For a file in a temp file system, which is not from a device, it will use `tmpfs` for the field. The implementation of this part follows the existing use case LSM_AUDIT_DATA_INODE in security/lsm_audit.c. Field ino is followed by the file's inode number. Field rule is followed by the IPE rule that made the access decision. The whole rule must be audited because the decision is based on the combination of all property conditions in the rule. Along with the syscall audit event, users can tell why a block happened. For example: audit: AUDIT1420 ipe_op=EXECUTE ipe_hook=BPRM_CHECK enforcing=1 pid=2138 comm="bash" path="/mnt/ipe/bin/hello" dev="dm-0" ino=2 rule="DEFAULT action=DENY" audit[1956]: SYSCALL arch=c000003e syscall=59 success=no exit=-13 a0=556790138df0 a1=556790135390 a2=5567901338b0 a3=ab2a41a67f4f1f4e items=1 ppid=147 pid=1956 auid=4294967295 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0 fsgid=0 tty=pts0 ses=4294967295 comm="bash" exe="/usr/bin/bash" key=(null) The above two records show that bash used execve to run "hello" and was blocked by IPE. Note that the IPE records always precede the SYSCALL record. AUDIT_IPE_CONFIG_CHANGE(1421): audit: AUDIT1421 old_active_pol_name="Allow_All" old_active_pol_version=0.0.0 old_policy_digest=sha256:E3B0C44298FC1C149AFBF4C8996FB92427AE41E4649 new_active_pol_name="boot_verified" new_active_pol_version=0.0.0 new_policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F auid=4294967295 ses=4294967295 lsm=ipe res=1 The above record shows the active IPE policy switching from `Allow_All` to `boot_verified`, along with the version and hash digest of the two policies.
Note that IPE can only have one policy active at a time; all access decisions are evaluated against the current active policy. The normal procedure to deploy a policy is to load it into the kernel first, then switch the active policy to it. AUDIT_IPE_POLICY_LOAD(1422): audit: AUDIT1422 policy_name="boot_verified" policy_version=0.0.0 policy_digest=sha256:820EEA5B40CA42B51F68962354BA083122A20BB846F2676 auid=4294967295 ses=4294967295 lsm=ipe res=1 The above record shows that a new policy has been loaded into the kernel, with the policy name, policy version and policy hash. Signed-off-by: Deven Bowers <[email protected]> Signed-off-by: Fan Wu <[email protected]> [PM: subject line tweak] Signed-off-by: Paul Moore <[email protected]>
2024-08-20	ipv4: Centralize TOS matching	Ido Schimmel	1	-0/+2
	The TOS field in the IPv4 flow information structure ('flowi4_tos') is matched by the kernel against the TOS selector in IPv4 rules and routes. The field is initialized differently by different call sites. Some treat it as DSCP (RFC 2474) and initialize all six DSCP bits, some treat it as RFC 1349 TOS and initialize it using RT_TOS() and some treat it as RFC 791 TOS and initialize it using IPTOS_RT_MASK. What is common to all these call sites is that they all initialize the lower three DSCP bits, which fits the TOS definition in the initial IPv4 specification (RFC 791). Therefore, the kernel only allows configuring IPv4 FIB rules that match on the lower three DSCP bits, which are always guaranteed to be initialized by all call sites: # ip -4 rule add tos 0x1c table 100 # ip -4 rule add tos 0x3c table 100 Error: Invalid tos. While this works, it is unlikely to be very useful. RFC 791, which initially defined the TOS and IP precedence fields, was updated over twenty-five years ago by RFC 2474, where these fields were replaced by a single six-bit DSCP field. Extending FIB rules to match on DSCP can be done by adding a new DSCP selector while maintaining the existing semantics of the TOS selector for applications that rely on that. A prerequisite for allowing FIB rules to match on DSCP is to adjust all the call sites to initialize the high-order DSCP bits and remove their masking along the path to the core where the field is matched on. However, making this change alone will result in a behavior change. For example, a forwarded IPv4 packet with a DS field of 0xfc will no longer match a FIB rule that was configured with 'tos 0x1c'. This behavior change can be avoided by masking the upper three DSCP bits in 'flowi4_tos' before comparing it against the TOS selectors in FIB rules and routes.
Implement the above by adding a new function that checks whether a given DSCP value matches the one specified in the IPv4 flow information structure and invoke it from the three places that currently match on 'flowi4_tos'. Use RT_TOS() for the masking of 'flowi4_tos' instead of IPTOS_RT_MASK since the latter is not uAPI and we should be able to remove it at some point. Include <linux/ip.h> in <linux/in_route.h> since the former defines IPTOS_TOS_MASK which is used in the definition of RT_TOS() in <linux/in_route.h>. No regressions in FIB tests: # ./fib_tests.sh [...] Tests passed: 218 Tests failed: 0 And FIB rule tests: # ./fib_rule_tests.sh [...] Tests passed: 116 Tests failed: 0 Signed-off-by: Ido Schimmel <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
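A userspace model of the two checks, assuming the standard values `IPTOS_TOS_MASK` = 0x1E (so `RT_TOS()` keeps bits 0x1E) and `INET_DSCP_MASK` = 0xFC. The function names are hypothetical; the rule-side check is modelled on the "Invalid tos" rejection shown above, and the flow-side check shows why a DS field of 0xfc still matches 'tos 0x1c' once masked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in <linux/ip.h>, <linux/in_route.h> and net/inet_dscp.h. */
#define IPTOS_TOS_MASK  0x1E
#define RT_TOS(tos)     ((tos) & IPTOS_TOS_MASK)
#define INET_DSCP_MASK  0xFC

/* Rule side (hypothetical name): only bits inside IPTOS_TOS_MASK are
 * accepted, so 0x1c passes and 0x3c is rejected as "Invalid tos". */
static bool rule_tos_valid(uint8_t tos)
{
	return (tos & ~IPTOS_TOS_MASK) == 0;
}

/* Flow side (hypothetical name): drop the upper three DSCP bits (and the
 * ECN bits) of flowi4_tos before comparing, so call sites may fill in the
 * full DS field without changing which rules match. */
static bool dscp_masked_match(uint8_t rule_dscp, uint8_t flowi4_tos)
{
	return rule_dscp == (uint8_t)(RT_TOS(flowi4_tos) & INET_DSCP_MASK);
}
```

With this masking, `dscp_masked_match(0x1c, 0xfc)` and `dscp_masked_match(0x1c, 0x1c)` both hold, which is the "no behavior change" property the commit is after.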
2024-08-20	net/smc: introduce statistics for ringbufs usage of net namespace	Wen Gu	1	-0/+2
	The buffer size histograms in smc_stats, namely rx/tx_rmbsize, record the sizes of ringbufs for all connections that have ever appeared in the net namespace. They are cumulative, so the actual ringbuf usage cannot be derived from them. This patch therefore introduces statistics for the current ringbuf usage of existing SMC connections in the net namespace into smc_stats; the counter is incremented when a new connection uses a ringbuf and decremented when the ringbuf is no longer used. Signed-off-by: Wen Gu <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-08-20	net/smc: introduce statistics for allocated ringbufs of link group	Wen Gu	1	-0/+4
	Currently we have statistics on the sndbuf/RMB sizes of all connections that have ever been on the link group, namely smc_stats_memsize. However, these statistics are cumulative, and since the ringbufs of a link group may be reused, the actually allocated buffers cannot be derived from them. This patch therefore introduces a statistic on the ringbufs actually allocated by the link group; it is incremented when a new ringbuf is added to buf_list and decremented when one is deleted from buf_list. Signed-off-by: Wen Gu <[email protected]> Signed-off-by: Paolo Abeni <[email protected]>
2024-08-19	lsm: add IPE lsm	Deven Bowers	1	-0/+1
	Integrity Policy Enforcement (IPE) is an LSM that provides an approach to Mandatory Access Control that is complementary to existing LSMs. Existing LSMs have centered around the concept that access to a resource should be controlled by the current user's credentials. IPE's approach is that access to a resource should be controlled by the system's trust in the resource itself. The basis of this approach is a global policy that specifies which resources can be trusted. Signed-off-by: Deven Bowers <[email protected]> Signed-off-by: Fan Wu <[email protected]> [PM: subject line tweak] Signed-off-by: Paul Moore <[email protected]>
2024-08-19	fcntl: add F_CREATED_QUERY	Christian Brauner	1	-0/+3
	Systemd has a helper called openat_report_new() that returns whether a file was created anew or already existed, for cases where O_CREAT has to be used without O_EXCL (cf. [1]). That apparently isn't something that's specific to systemd but it's where I noticed it. The current logic is that it first attempts to open the file without O_CREAT | O_EXCL and if it gets ENOENT the helper tries again with both flags. If that succeeds all is well. If it now reports EEXIST it retries. That works fairly well but some corner cases make this more involved. If this operates on a dangling symlink, the first openat() without O_CREAT | O_EXCL will return ENOENT but the second openat() with O_CREAT | O_EXCL will fail with EEXIST. The reason is that openat() without O_CREAT | O_EXCL follows the symlink while O_CREAT | O_EXCL doesn't, for security reasons. So it's not something we can really change unless we add an explicit opt-in via O_FOLLOW, which seems really ugly. The caller could try and use fanotify() to register to listen for creation events in the directory before calling openat(). The caller could then compare the returned tid to its own tid to ensure that even in threaded environments it actually created the file. That might work but is a lot of work for something that should be fairly simple and I'm uncertain about its reliability. The caller could use a bpf lsm hook to hook into security_file_open() to figure out whether they created the file. That also seems a bit wild. So let's add F_CREATED_QUERY which allows the caller to check whether they actually did create the file. That has caveats of course but I don't think they are problematic: * In multi-threaded environments a thread can only be sure that it did create the file if it calls openat() with O_CREAT. In other words, it's obviously not enough to just go through its fdtable and check these fds because another thread could've created the file.
	* If there are any codepaths where an openat() with O_CREAT would yield the same struct file as that of another thread, it would obviously cause wrong results. I'm not aware of any such codepaths from openat() itself. Imho, that would be a bug. * Related to the previous point, calling the new fcntl() on files created and opened via special-purpose system calls or ioctl()s would cause wrong results only if the affected subsystem a) raises FMODE_CREATED and b) may return the same struct file for two different calls. I'm not seeing anything outside of regular VFS code that raises FMODE_CREATED. There is code for b) in e.g., the drm layer where the same struct file is resurfaced but again FMODE_CREATED isn't used and it would be very misleading if it did. Link: https://github.com/systemd/systemd/blob/11d5e2b5fbf9f6bfa5763fd45b56829ad4f0777f/src/basic/fs-util.c#L1078 [1] Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Jeff Layton <[email protected]> Reviewed-by: Jan Kara <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
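A minimal userspace sketch of the new query. It assumes `F_CREATED_QUERY` is `F_LINUX_SPECIFIC_BASE + 4` (1028) as in the UAPI header and falls back to that literal because libc headers may not define it yet; on kernels without the feature `fcntl()` fails with EINVAL. The helper names are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#ifndef F_CREATED_QUERY
/* Assumed uapi value: F_LINUX_SPECIFIC_BASE (1024) + 4. */
#define F_CREATED_QUERY (1024 + 4)
#endif

/* 1 if the open that produced fd created the file, 0 if the file already
 * existed, -1 with errno set (EINVAL on kernels without the feature). */
static int fd_was_created(int fd)
{
	return fcntl(fd, F_CREATED_QUERY, 0);
}

/* Hypothetical helper: O_CREAT without O_EXCL, then ask the kernel whether
 * this particular openat() created the file. */
static int open_and_query_created(int dirfd, const char *path, int *created)
{
	int fd = openat(dirfd, path, O_RDWR | O_CREAT | O_CLOEXEC, 0600);

	if (fd < 0)
		return -1;
	*created = fd_was_created(fd);
	return fd;
}
```

Because the answer is tied to the struct file returned by this particular openat(), the caller no longer needs the ENOENT/EEXIST retry loop described above.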
2024-08-19	Merge 6.11-rc4 into usb-next	Greg Kroah-Hartman	3	-2/+4
	We need the usb / thunderbolt fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-08-19	Merge 6.11-rc4 into char-misc-next	Greg Kroah-Hartman	3	-2/+4
	We need the char/misc fixes in here as well. Signed-off-by: Greg Kroah-Hartman <[email protected]>
2024-08-16	Merge tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linux	Linus Torvalds	1	-1/+1
	Pull io_uring fixes from Jens Axboe: - Fix a comment in the uapi header using the wrong member name (Caleb) - Fix KCSAN warning for a debug check in sqpoll (me) - Two more NAPI tweaks (Olivier) * tag 'io_uring-6.11-20240824' of git://git.kernel.dk/linux: io_uring: fix user_data field name in comment io_uring/sqpoll: annotate debug task == current with data_race() io_uring/napi: remove duplicate io_napi_entry timeout assignation io_uring/napi: check napi_enabled in io_napi_add() before proceeding
2024-08-16	io_uring: fix user_data field name in comment	Caleb Sander Mateos	1	-1/+1
	io_uring_cqe's user_data field refers to `sqe->data`, but io_uring_sqe does not have a data field. Fix the comment to say `sqe->user_data`. Signed-off-by: Caleb Sander Mateos <[email protected]> Link: https://github.com/axboe/liburing/pull/1206 Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-08-16	ethtool: Add new result codes for TDR diagnostics	Oleksij Rempel	1	-0/+4
	Add new result codes to support TDR diagnostics in preparation for Open Alliance 1000BaseT1 TDR support: - ETHTOOL_A_CABLE_RESULT_CODE_NOISE: TDR not possible due to high noise level. - ETHTOOL_A_CABLE_RESULT_CODE_RESOLUTION_NOT_POSSIBLE: TDR resolution not possible / out of distance. Reviewed-by: Andrew Lunn <[email protected]> Signed-off-by: Oleksij Rempel <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-08-15	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net	Jakub Kicinski	2	-1/+3
	Cross-merge networking fixes after downstream PR. Conflicts: Documentation/devicetree/bindings/net/fsl,qoriq-mc-dpmac.yaml c25504a0ba36 ("dt-bindings: net: fsl,qoriq-mc-dpmac: add missed property phys") be034ee6c33d ("dt-bindings: net: fsl,qoriq-mc-dpmac: using unevaluatedProperties") https://lore.kernel.org/[email protected] drivers/net/dsa/vitesse-vsc73xx-core.c 5b9eebc2c7a5 ("net: dsa: vsc73xx: pass value in phy_write operation") fa63c6434b6f ("net: dsa: vsc73xx: check busy flag in MDIO operations") 2524d6c28bdc ("net: dsa: vsc73xx: use defined values in phy operations") https://lore.kernel.org/[email protected] Resolve by using FIELD_PREP(), Stephen's resolution is simpler. Adjacent changes: net/vmw_vsock/af_vsock.c 69139d2919dd ("vsock: fix recursive ->recvmsg calls") 744500d81f81 ("vsock: add support for SIOCOUTQ ioctl") Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-08-14	UAPI: net/sched: Use __struct_group() in flex struct tc_u32_sel	Gustavo A. R. Silva	1	-10/+13
	Use the `__struct_group()` helper to create a new tagged `struct tc_u32_sel_hdr`. This structure groups together all the members of the flexible `struct tc_u32_sel` except the flexible array. As a result, the array is effectively separated from the rest of the members without modifying the memory layout of the flexible structure. This new tagged struct will be used to fix problematic declarations of middle-flex-arrays in composite structs[1]. [1] https://git.kernel.org/linus/d88cabfd9abc Signed-off-by: Gustavo A. R. Silva <[email protected]> Link: https://patch.msgid.link/e59fe833564ddc5b2cc83056a4c504be887d6193.1723586870.git.gustavoars@kernel.org Signed-off-by: Jakub Kicinski <[email protected]>
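A hand-expanded sketch of what `__struct_group(tc_u32_sel_hdr, hdr, ...)` roughly produces for this struct. The member list follows `struct tc_u32_sel` in the UAPI header as best recalled, with kernel types (`__be16`, `__be32`) swapped for `<stdint.h>` types; `tc_u32_sel_orig` and `u32_sel_with_one_key` are hypothetical names used only for comparison.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tc_u32_key {
	uint32_t mask;     /* __be32 in the UAPI header */
	uint32_t val;      /* __be32 */
	int      off;
	int      offmask;
};

/* The flexible struct as it was before the change. */
struct tc_u32_sel_orig {
	unsigned char     flags;
	unsigned char     offshift;
	unsigned char     nkeys;
	uint16_t          offmask;   /* __be16 */
	uint16_t          off;
	short             offoff;
	short             hoff;
	uint32_t          hmask;     /* __be32 */
	struct tc_u32_key keys[];
};

/* The fixed members appear twice in a union: once anonymously (existing
 * code keeps using sel.nkeys) and once as the tagged struct tc_u32_sel_hdr
 * (reachable as sel.hdr). The flexible array stays outside the group. */
struct tc_u32_sel {
	union {
		struct {
			unsigned char flags;
			unsigned char offshift;
			unsigned char nkeys;
			uint16_t      offmask;
			uint16_t      off;
			short         offoff;
			short         hoff;
			uint32_t      hmask;
		};
		struct tc_u32_sel_hdr {
			unsigned char flags;
			unsigned char offshift;
			unsigned char nkeys;
			uint16_t      offmask;
			uint16_t      off;
			short         offoff;
			short         hoff;
			uint32_t      hmask;
		} hdr;
	};
	struct tc_u32_key keys[];
};

/* Hypothetical composite struct: embedding the header instead of the
 * flexible struct avoids a flexible array in the middle of a struct. */
struct u32_sel_with_one_key {
	struct tc_u32_sel_hdr hdr;
	struct tc_u32_key     key;
};
```

The layout is unchanged (same `sizeof`, same offset of `keys`), and a composite struct can now place its keys right after `struct tc_u32_sel_hdr`.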
2024-08-14	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	1	-0/+1
	Pull kvm fixes from Paolo Bonzini: "s390: - Fix failure to start guests with kvm.use_gisa=0 - Panic if (un)share fails to maintain security. ARM: - Use kvfree() for the kvmalloc'd nested MMUs array - Set of fixes to address warnings in W=1 builds - Make KVM depend on assembler support for ARMv8.4 - Fix for vgic-debug interface for VMs without LPIs - Actually check ID_AA64MMFR3_EL1.S1PIE in get-reg-list selftest - Minor code / comment cleanups for configuring PAuth traps - Take kvm->arch.config_lock to prevent destruction / initialization race for a vCPU's CPUIF which may lead to a UAF x86: - Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX) - Fix smatch issues - Small cleanups - Make x2APIC ID 100% readonly - Fix typo in uapi constant Generic: - Use synchronize_srcu_expedited() on irqfd shutdown" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits) KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG KVM: x86: Disallow read-only memslots for SEV-ES and SEV-SNP (and TDX) KVM: eventfd: Use synchronize_srcu_expedited() on shutdown KVM: selftests: Add a testcase to verify x2APIC is fully readonly KVM: x86: Make x2APIC ID 100% readonly KVM: x86: Use this_cpu_ptr() instead of per_cpu_ptr(smp_processor_id()) KVM: x86: hyper-v: Remove unused inline function kvm_hv_free_pa_page() KVM: SVM: Fix an error code in sev_gmem_post_populate() KVM: SVM: Fix uninitialized variable bug KVM: arm64: vgic: Hold config_lock while tearing down a CPU interface KVM: selftests: arm64: Correct feature test for S1PIE in get-reg-list KVM: arm64: Tidying up PAuth code in KVM KVM: arm64: vgic-debug: Exit the iterator properly w/o LPI KVM: arm64: Enforce dependency on an ARMv8.4-aware toolchain s390/uv: Panic for set and remove shared access UVC errors KVM: s390: fix validity interception issue when gisa is switched off docs: KVM: Fix register ID of SPSR_FIQ KVM: arm64: vgic: fix unexpected unlock sparse warnings KVM: arm64: fix kdoc warnings in W=1 builds KVM: arm64: fix override-init warnings in W=1 builds ...
2024-08-14	KVM: SEV: uapi: fix typo in SEV_RET_INVALID_CONFIG	Amit Shah	1	-0/+1
	"INVALID" is misspelt in "SEV_RET_INAVLID_CONFIG". Since this is part of the UAPI, keep the current definition and add a new one with the fix. Fix-suggested-by: Marc Zyngier <[email protected]> Signed-off-by: Amit Shah <[email protected]> Message-ID: <[email protected]> Signed-off-by: Paolo Bonzini <[email protected]>
2024-08-14	media: rkisp1: Add support for the companding block	Paul Elder	1	-1/+88
	Add support to the rkisp1 driver for the companding block that exists on the i.MX8MP version of the ISP. This requires usage of the new extensible parameters format, and showcases how the format allows for extensions without breaking backward compatibility. Signed-off-by: Paul Elder <[email protected]> Reviewed-by: Jacopo Mondi <[email protected]> Reviewed-by: Paul Elder <[email protected]> Signed-off-by: Jacopo Mondi <[email protected]> Tested-by: Kieran Bingham <[email protected]> Acked-by: Sakari Ailus <[email protected]> Signed-off-by: Laurent Pinchart <[email protected]>
2024-08-13	usb: gadget: f_fs: add capability for dfu functional descriptor	David Sands	2	-10/+95
	Add the ability for the USB FunctionFS (FFS) gadget driver to create Device Firmware Upgrade (DFU) functional descriptors. [1] This patch allows implementation of DFU in userspace using the FFS gadget. The DFU protocol uses the control pipe (ep0) for all messaging, so only the addition of the DFU functional descriptor is needed in the kernel driver. The DFU functional descriptor is written to the ep0 file along with any other descriptors during FFS setup. DFU requires an interface descriptor followed by the DFU functional descriptor. This patch includes documentation of the added descriptor for DFU and conversion of some existing documentation to kernel-doc format so that it can be included in the generated docs. An implementation of DFU 1.1 that implements just the runtime descriptor using the FunctionFS gadget (with rebooting into u-boot for DFU mode) has been tested on an i.MX8 Nano. An implementation of DFU 1.1 that implements both runtime and DFU mode using the FunctionFS gadget has been tested on Xilinx Zynq UltraScale+. Note that for the best performance of firmware update file transfers, the userspace program should respond as quickly as possible to the setup packets. [1] https://www.usb.org/sites/default/files/DFU_1.1.pdf Signed-off-by: David Sands <[email protected]> Co-developed-by: Chris Wulff <[email protected]> Signed-off-by: Chris Wulff <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
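A sketch of the 9-byte DFU functional descriptor as laid out in the DFU 1.1 specification ([1]): bLength, bDescriptorType (0x21), bmAttributes, then little-endian wDetachTimeOut, wTransferSize and bcdDFUVersion. `dfu_func_desc_encode()` and the macro names are hypothetical; this does not reproduce the struct added to the FunctionFS headers, it only shows the bytes a userspace program would place after the interface descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout per the DFU 1.1 specification. */
#define DFU_FUNC_DESC_LEN     9
#define DFU_FUNC_DESC_TYPE    0x21      /* DFU FUNCTIONAL descriptor type */
#define DFU_ATTR_CAN_DNLOAD   (1u << 0) /* bitCanDnload */
#define DFU_ATTR_CAN_UPLOAD   (1u << 1) /* bitCanUpload */
#define DFU_ATTR_MANIFEST_TOL (1u << 2) /* bitManifestationTolerant */
#define DFU_ATTR_WILL_DETACH  (1u << 3) /* bitWillDetach */

/* Serialise the descriptor into 9 bytes; multi-byte fields are
 * little-endian, as in all USB descriptors. Returns the byte count. */
static size_t dfu_func_desc_encode(uint8_t out[DFU_FUNC_DESC_LEN],
				   uint8_t attributes, uint16_t detach_timeout_ms,
				   uint16_t transfer_size, uint16_t bcd_dfu_version)
{
	out[0] = DFU_FUNC_DESC_LEN;
	out[1] = DFU_FUNC_DESC_TYPE;
	out[2] = attributes;
	out[3] = detach_timeout_ms & 0xff;
	out[4] = detach_timeout_ms >> 8;
	out[5] = transfer_size & 0xff;
	out[6] = transfer_size >> 8;
	out[7] = bcd_dfu_version & 0xff;
	out[8] = bcd_dfu_version >> 8;
	return DFU_FUNC_DESC_LEN;
}
```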
2024-08-12	net: nexthop: Increase weight to u16	Petr Machata	1	-1/+6
	In CLOS networks, as link failures occur at various points in the network, ECMP weights of the involved nodes are adjusted to compensate. With high fan-out of the involved nodes, and an overall high number of nodes, a (non-)ECMP weight ratio that we would like to configure does not fit into 8 bits. Instead of, say, 255:254, we might like to configure something like 1000:999. For these deployments, the 8-bit weight may not be enough. To that end, this patch increases the next hop weight from u8 to u16. Increasing the width of an integral type can be tricky, because while the code still compiles, the types may not check out anymore, and numerical errors come up. To prevent this, the conversion was done in two steps. First the type was changed from u8 to a single-member structure, which invalidated all uses of the field. This allowed going through them one by one and auditing each for type correctness. Then the structure was replaced with a vanilla u16 again. This should ensure that no place was missed. The UAPI for configuring nexthop group members is that an attribute NHA_GROUP carries an array of struct nexthop_grp entries: struct nexthop_grp { __u32 id; /* nexthop id - must exist */ __u8 weight; /* weight of this nexthop */ __u8 resvd1; __u16 resvd2; }; The field resvd1 is currently validated and required to be zero. We can lift this requirement and carry high-order bits of the weight in the reserved field: struct nexthop_grp { __u32 id; /* nexthop id - must exist */ __u8 weight; /* weight of this nexthop */ __u8 weight_high; __u16 resvd2; }; Keeping the fields split this way was chosen in case an existing userspace makes assumptions about the width of the weight field, and to sidestep any endianness issues. The weight field is currently encoded as the weight value minus one, because a weight of 0 is invalid. This same trick is impossible for the new weight_high field, because zero must mean actual zero.
With this in place: - Old userspace is guaranteed to carry weight_high of 0, therefore configuring 8-bit weights as appropriate. When dumping nexthops with 16-bit weight, it would only show the lower 8 bits. But configuring such nexthops implies the existence of userspace aware of the extension in the first place. - New userspace talking to an old kernel will work as long as it only attempts to configure 8-bit weights, where the high-order bits are zero. An old kernel will bounce attempts at configuring >8-bit weights. Renaming reserved fields as they are allocated for some purpose is commonly done in Linux. Whoever touches a reserved field is doing so at their own risk. nexthop_grp::resvd1 in particular is currently used by at least strace; however, strace carries its own copy of the UAPI headers, and the conversion should be trivial. A helper is provided for decoding the weight out of the two fields. Forcing a conversion seems preferable to bending over backwards and introducing anonymous unions or whatever. Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Reviewed-by: David Ahern <[email protected]> Reviewed-by: Przemek Kitszel <[email protected]> Link: https://patch.msgid.link/483e2fcf4beb0d9135d62e7d27b46fa2685479d4.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
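The split "weight minus one" encoding can be sketched as follows. The layout matches the second struct in the commit message, `uint*_t` stand in for the `__u*` types, and the helper names `nexthop_grp_weight()` and `nexthop_grp_set_weight()` are illustrative rather than taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

struct nexthop_grp {
	uint32_t id;          /* nexthop id - must exist */
	uint8_t  weight;      /* low 8 bits of (weight - 1) */
	uint8_t  weight_high; /* high 8 bits of (weight - 1), formerly resvd1 */
	uint16_t resvd2;
};

/* Decode: the 16-bit value carried in the two fields is weight - 1. */
static uint16_t nexthop_grp_weight(const struct nexthop_grp *entry)
{
	return (uint16_t)(((entry->weight_high << 8) | entry->weight) + 1);
}

/* Encode a weight in 1..65535 (0 is invalid) into the split fields. */
static void nexthop_grp_set_weight(struct nexthop_grp *entry, uint16_t weight)
{
	uint16_t w = (uint16_t)(weight - 1);

	entry->weight = w & 0xff;
	entry->weight_high = w >> 8;
}
```

Any weight up to 256 leaves `weight_high` at 0, which is why old userspace and old kernels keep working for 8-bit weights.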
2024-08-12	net: nexthop: Add flag to assert that NHGRP reserved fields are zero	Petr Machata	1	-0/+3
	There are many unpatched kernel versions out there that do not initialize the reserved fields of struct nexthop_grp. The issue with that is that if those fields were to be used for some end (i.e. stop being reserved), old kernels would still keep sending random data through the field, and a new userspace could not rely on the value. In this patch, use the existing NHA_OP_FLAGS, which is currently inbound only, to carry flags back to the userspace. Add a flag to indicate that the reserved fields in struct nexthop_grp are zeroed before dumping. This is reliant on the actual fix from commit 6d745cd0e972 ("net: nexthop: Initialize all fields in dumped nexthops"). Signed-off-by: Petr Machata <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Link: https://patch.msgid.link/21037748d4f9d8ff486151f4c09083bcf12d5df8.1723036486.git.petrm@nvidia.com Signed-off-by: Jakub Kicinski <[email protected]>
2024-08-12	nsfs: fix ioctl declaration	Christian Brauner	1	-1/+2
	The kernel is writing an object of type __u64, so the ioctl has to be defined to _IOR(NSIO, 0x5, __u64) instead of _IO(NSIO, 0x5). Reported-by: Dmitry V. Levin <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
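The difference between the two declarations is visible in the request number itself. A sketch using the standard `<linux/ioctl.h>` macros, assuming `NSIO` is 0xb7 as in `<linux/nsfs.h>`; the `NS_IOCTL_5_*` names are hypothetical because the commit message does not name the ioctl.

```c
#include <assert.h>
#include <linux/ioctl.h>
#include <linux/types.h>

#define NSIO 0xb7   /* nsfs ioctl "type" byte, as in <linux/nsfs.h> */

/* Hypothetical names for the ioctl at number 0x5, before and after the fix. */
#define NS_IOCTL_5_OLD _IO(NSIO, 0x5)          /* no direction, size 0 */
#define NS_IOCTL_5_NEW _IOR(NSIO, 0x5, __u64)  /* kernel writes 8 bytes */

/* True if a request number declares that the kernel writes a __u64 back. */
static int ioctl_returns_u64(unsigned long req)
{
	return _IOC_DIR(req) == _IOC_READ && _IOC_SIZE(req) == sizeof(__u64);
}
```

Tools that decode ioctl numbers (strace, for instance) read these direction and size bits, so `_IO()` misdescribes an ioctl that writes a `__u64` back to userspace.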
2024-08-12	ethtool: rss: support skipping contexts during dump	Jakub Kicinski	1	-0/+1
	Applications may want to deal with dynamic RSS contexts only, so dumping context 0 is counter-productive for them. Support starting the dump from a given context ID. An alternative would be to implement a dump flag to skip just context 0, not sure which is better... Reviewed-by: Edward Cree <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2024-08-12	media: uapi: videodev2: Add V4L2_META_FMT_RK_ISP1_EXT_PARAMS	Jacopo Mondi	1	-0/+1
	The rkisp1 driver stores ISP configuration parameters in the fixed rkisp1_params_cfg structure. As the members of the structure are part of the userspace API, the structure layout is immutable and cannot be extended further. Introducing new parameters or modifying the existing ones would change the buffer layout and cause breakages in existing applications. To allow for future extensions to the ISP parameters, introduce a new extensible parameters format, with a new format 4CC. Document usage of the new format in the rkisp1 admin guide. Signed-off-by: Jacopo Mondi <[email protected]> Reviewed-by: Daniel Scally <[email protected]> Reviewed-by: Paul Elder <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Tested-by: Kieran Bingham <[email protected]> Acked-by: Sakari Ailus <[email protected]> Signed-off-by: Laurent Pinchart <[email protected]>
2024-08-12	media: uapi: rkisp1-config: Add extensible params format	Jacopo Mondi	1	-0/+491
	Add to the rkisp1-config.h header data types and documentation of the extensible parameters format. Signed-off-by: Jacopo Mondi <[email protected]> Reviewed-by: Laurent Pinchart <[email protected]> Reviewed-by: Paul Elder <[email protected]> Tested-by: Kieran Bingham <[email protected]> Acked-by: Sakari Ailus <[email protected]> Signed-off-by: Laurent Pinchart <[email protected]>
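The log does not spell out the block layout, so the following is only a hypothetical sketch of one common way to make a parameters buffer extensible: a sequence of self-describing blocks, each with its own type and size, so a parser can skip block types it does not know and new types can be appended without moving existing ones. All names, field widths and type numbers here are assumptions, not the rkisp1-config.h definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-block header; not the rkisp1 UAPI layout. */
struct ex_block_header {
	uint16_t type;  /* which ISP block this configures */
	uint16_t flags; /* e.g. enable/disable; unused in this sketch */
	uint32_t size;  /* total block size in bytes, header included */
};

static_assert(sizeof(struct ex_block_header) == 8, "unexpected header size");

/* Optional callback for each recognised block. */
typedef void (*ex_block_fn)(const struct ex_block_header *hdr,
			    const uint8_t *payload, void *ctx);

/*
 * Walk the packed blocks in buf[0..len). Blocks with
 * type > max_known_type are skipped using their size field. Returns the
 * number of recognised blocks, or -1 on a truncated or malformed block.
 */
int ex_walk_blocks(const uint8_t *buf, size_t len, uint16_t max_known_type,
		   ex_block_fn fn, void *ctx)
{
	size_t off = 0;
	int visited = 0;

	while (off < len) {
		struct ex_block_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		memcpy(&hdr, buf + off, sizeof(hdr)); /* no alignment assumption */
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		if (hdr.type <= max_known_type) {
			if (fn)
				fn(&hdr, buf + off + sizeof(hdr), ctx);
			visited++;
		}
		off += hdr.size;
	}
	return visited;
}
```

Because every block states its own size, an older parser can step over a newer block type rather than misreading the rest of the buffer, which is the property a fixed struct cannot offer.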
2024-08-09	Merge patch series "nsfs: iterate through mount namespaces"	Christian Brauner	1	-0/+16
	Christian Brauner <[email protected]> says:

Recently, we added the ability to list mounts in other mount namespaces and the ability to retrieve namespace file descriptors without having to go through procfs by deriving them from pidfds.

This extends nsfs in two ways:

(1) Add the ability to retrieve information about a mount namespace via NS_MNT_GET_INFO. This will return the mount namespace id and the number of mounts currently in the mount namespace. The number of mounts can be used to size the buffer that needs to be used for listmount() and is in general useful without having to actually iterate through all the mounts. The structure is extensible.

(2) Add the ability to iterate through all mount namespaces over which the caller holds privilege returning the file descriptor for the next or previous mount namespace. To retrieve a mount namespace the caller must be privileged with respect to its owning user namespace. This means that PID 1 on the host can list all mounts in all mount namespaces or that a container can list all mounts of its nested containers. Optionally pass a structure for NS_MNT_GET_INFO with NS_MNT_GET_{PREV,NEXT} to retrieve information about the mount namespace in one go.

(1) and (2) can be implemented for other namespace types easily.

Together with recent api additions this means one can iterate through all mounts in all mount namespaces without ever touching procfs. Here's a sample program list_all_mounts_everywhere.c:

// SPDX-License-Identifier: GPL-2.0-or-later

#define _GNU_SOURCE
#include <asm/unistd.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <getopt.h>
#include <limits.h>
#include <linux/stat.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/param.h>
#include <sys/pidfd.h>
#include <sys/stat.h>
#include <sys/statfs.h>
#include <unistd.h>

#define die_errno(format, ...)                                             \
	do {                                                               \
		fprintf(stderr, "%m | %s: %d: %s: " format "\n", __FILE__, \
			__LINE__, __func__, ##__VA_ARGS__);                \
		exit(EXIT_FAILURE);                                        \
	} while (0)

/* Get the id for a mount namespace */
#define NS_GET_MNTNS_ID _IO(0xb7, 0x5)
/* Get next mount namespace. */

struct mnt_ns_info {
	__u32 size;
	__u32 nr_mounts;
	__u64 mnt_ns_id;
};

#define MNT_NS_INFO_SIZE_VER0 16 /* size of first published struct */

/* Get information about namespace. */
#define NS_MNT_GET_INFO _IOR(0xb7, 10, struct mnt_ns_info)
/* Get next namespace. */
#define NS_MNT_GET_NEXT _IOR(0xb7, 11, struct mnt_ns_info)
/* Get previous namespace. */
#define NS_MNT_GET_PREV _IOR(0xb7, 12, struct mnt_ns_info)

#define PIDFD_GET_MNT_NAMESPACE _IO(0xFF, 3)

#define STATX_MNT_ID_UNIQUE 0x00004000U /* Want/got extended stx_mount_id */

#define __NR_listmount 458
#define __NR_statmount 457

/*
 * @mask bits for statmount(2)
 */
#define STATMOUNT_SB_BASIC		0x00000001U /* Want/got sb_... */
#define STATMOUNT_MNT_BASIC		0x00000002U /* Want/got mnt_... */
#define STATMOUNT_PROPAGATE_FROM	0x00000004U /* Want/got propagate_from */
#define STATMOUNT_MNT_ROOT		0x00000008U /* Want/got mnt_root */
#define STATMOUNT_MNT_POINT		0x00000010U /* Want/got mnt_point */
#define STATMOUNT_FS_TYPE		0x00000020U /* Want/got fs_type */
#define STATMOUNT_MNT_NS_ID		0x00000040U /* Want/got mnt_ns_id */
#define STATMOUNT_MNT_OPTS		0x00000080U /* Want/got mnt_opts */

struct statmount {
	__u32 size;		/* Total size, including strings */
	__u32 mnt_opts;
	__u64 mask;		/* What results were written */
	__u32 sb_dev_major;	/* Device ID */
	__u32 sb_dev_minor;
	__u64 sb_magic;		/* ..._SUPER_MAGIC */
	__u32 sb_flags;		/* SB_{RDONLY,SYNCHRONOUS,DIRSYNC,LAZYTIME} */
	__u32 fs_type;		/* [str] Filesystem type */
	__u64 mnt_id;		/* Unique ID of mount */
	__u64 mnt_parent_id;	/* Unique ID of parent (for root == mnt_id) */
	__u32 mnt_id_old;	/* Reused IDs used in proc/.../mountinfo */
	__u32 mnt_parent_id_old;
	__u64 mnt_attr;		/* MOUNT_ATTR_... */
	__u64 mnt_propagation;	/* MS_{SHARED,SLAVE,PRIVATE,UNBINDABLE} */
	__u64 mnt_peer_group;	/* ID of shared peer group */
	__u64 mnt_master;	/* Mount receives propagation from this ID */
	__u64 propagate_from;	/* Propagation from in current namespace */
	__u32 mnt_root;		/* [str] Root of mount relative to root of fs */
	__u32 mnt_point;	/* [str] Mountpoint relative to current root */
	__u64 mnt_ns_id;
	__u64 __spare2[49];
	char str[];		/* Variable size part containing strings */
};

struct mnt_id_req {
	__u32 size;
	__u32 spare;
	__u64 mnt_id;
	__u64 param;
	__u64 mnt_ns_id;
};

#define MNT_ID_REQ_SIZE_VER1 32 /* sizeof second published struct */

#define LSMT_ROOT 0xffffffffffffffff /* root mount */

static int __statmount(__u64 mnt_id, __u64 mnt_ns_id, __u64 mask,
		       struct statmount *stmnt, size_t bufsize,
		       unsigned int flags)
{
	struct mnt_id_req req = {
		.size		= MNT_ID_REQ_SIZE_VER1,
		.mnt_id		= mnt_id,
		.param		= mask,
		.mnt_ns_id	= mnt_ns_id,
	};

	return syscall(__NR_statmount, &req, stmnt, bufsize, flags);
}

static struct statmount *sys_statmount(__u64 mnt_id, __u64 mnt_ns_id,
				       __u64 mask, unsigned int flags)
{
	size_t bufsize = 1 << 15;
	struct statmount *stmnt = NULL, *tmp = NULL;
	int ret;

	for (;;) {
		tmp = realloc(stmnt, bufsize);
		if (!tmp)
			goto out;

		stmnt = tmp;
		ret = __statmount(mnt_id, mnt_ns_id, mask, stmnt, bufsize, flags);
		if (!ret)
			return stmnt;

		if (errno != EOVERFLOW)
			goto out;

		bufsize <<= 1;
		if (bufsize >= UINT_MAX / 2)
			goto out;
	}

out:
	free(stmnt);
	printf("statmount failed\n");
	return NULL;
}

static ssize_t sys_listmount(__u64 mnt_id, __u64 last_mnt_id, __u64 mnt_ns_id,
			     __u64 list[], size_t num, unsigned int flags)
{
	struct mnt_id_req req = {
		.size		= MNT_ID_REQ_SIZE_VER1,
		.mnt_id		= mnt_id,
		.param		= last_mnt_id,
		.mnt_ns_id	= mnt_ns_id,
	};

	return syscall(__NR_listmount, &req, list, num, flags);
}

int main(int argc, char *argv[])
{
#define LISTMNT_BUFFER 10
	__u64 list[LISTMNT_BUFFER], last_mnt_id = 0;
	int ret, pidfd, fd_mntns;
	struct mnt_ns_info info = {};

	pidfd = pidfd_open(getpid(), 0);
	if (pidfd < 0)
		die_errno("pidfd_open failed");

	fd_mntns = ioctl(pidfd, PIDFD_GET_MNT_NAMESPACE, 0);
	if (fd_mntns < 0)
		die_errno("ioctl(PIDFD_GET_MNT_NAMESPACE) failed");

	ret = ioctl(fd_mntns, NS_MNT_GET_INFO, &info);
	if (ret < 0)
		die_errno("ioctl(NS_MNT_GET_INFO) failed");

	printf("Listing %u mounts for mount namespace %d:%llu\n",
	       info.nr_mounts, fd_mntns, info.mnt_ns_id);
	for (;;) {
		ssize_t nr_mounts;
next:
		nr_mounts = sys_listmount(LSMT_ROOT, last_mnt_id,
					  info.mnt_ns_id, list, LISTMNT_BUFFER, 0);
		if (nr_mounts <= 0) {
			printf("Finished listing mounts for mount namespace %d:%llu\n\n",
			       fd_mntns, info.mnt_ns_id);
			/* The info argument is optional: probe without it first. */
			ret = ioctl(fd_mntns, NS_MNT_GET_NEXT, 0);
			if (ret < 0) {
				if (errno == ENOENT) {
					printf("Finished listing all mount namespaces\n");
					exit(0);
				}
				die_errno("ioctl(NS_MNT_GET_NEXT) failed");
			}
			close(ret);
			ret = ioctl(fd_mntns, NS_MNT_GET_NEXT, &info);
			if (ret < 0) {
				if (errno == ENOENT) {
					printf("Finished listing all mount namespaces\n");
					exit(0);
				}
				die_errno("ioctl(NS_MNT_GET_NEXT) failed");
			}
			close(fd_mntns);
			fd_mntns = ret;
			last_mnt_id = 0;
			printf("Listing %u mounts for mount namespace %d:%llu\n",
			       info.nr_mounts, fd_mntns, info.mnt_ns_id);
			goto next;
		}

		for (size_t cur = 0; cur < (size_t)nr_mounts; cur++) {
			struct statmount *stmnt;

			last_mnt_id = list[cur];

			stmnt = sys_statmount(last_mnt_id, info.mnt_ns_id,
					      STATMOUNT_SB_BASIC |
					      STATMOUNT_MNT_BASIC |
					      STATMOUNT_MNT_ROOT |
					      STATMOUNT_MNT_POINT |
					      STATMOUNT_MNT_NS_ID |
					      STATMOUNT_MNT_OPTS |
					      STATMOUNT_FS_TYPE, 0);
			if (!stmnt) {
				printf("Failed to statmount(%llu) in mount namespace(%llu)\n",
				       last_mnt_id, info.mnt_ns_id);
				continue;
			}

			printf("mnt_id(%u/%llu) | mnt_parent_id(%u/%llu): %s @ %s ==> %s with options: %s\n",
			       stmnt->mnt_id_old, stmnt->mnt_id,
			       stmnt->mnt_parent_id_old, stmnt->mnt_parent_id,
			       stmnt->str + stmnt->fs_type,
			       stmnt->str + stmnt->mnt_root,
			       stmnt->str + stmnt->mnt_point,
			       stmnt->str + stmnt->mnt_opts);
			free(stmnt);
		}
	}

	exit(0);
}

patches from https://lore.kernel.org/r/[email protected]:
  nsfs: iterate through mount namespaces
  file: add fput() cleanup helper
  fs: add put_mnt_ns() cleanup helper
  fs: allow mount namespace fd

Signed-off-by: Christian Brauner <[email protected]>
2024-08-09	nsfs: iterate through mount namespaces	Christian Brauner	1	-0/+16
	It is already possible to list mounts in other mount namespaces and to retrieve namespace file descriptors without having to go through procfs by deriving them from pidfds. Augment these abilities by adding the ability to retrieve information about a mount namespace via NS_MNT_GET_INFO. This will return the mount namespace id and the number of mounts currently in the mount namespace. The number of mounts can be used to size the buffer that needs to be used for listmount() and is in general useful without having to actually iterate through all the mounts. The structure is extensible. And add the ability to iterate through all mount namespaces over which the caller holds privilege returning the file descriptor for the next or previous mount namespace. To retrieve a mount namespace the caller must be privileged with respect to its owning user namespace. This means that PID 1 on the host can list all mounts in all mount namespaces or that a container can list all mounts of its nested containers. Optionally pass a structure for NS_MNT_GET_INFO with NS_MNT_GET_{PREV,NEXT} to retrieve information about the mount namespace in one go. Both ioctls can be implemented for other namespace types easily. Together with recent api additions this means one can iterate through all mounts in all mount namespaces without ever touching procfs. Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Josef Bacik <[email protected]> Reviewed-by: Jeff Layton <[email protected]> Signed-off-by: Christian Brauner <[email protected]>
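"The structure is extensible" refers to the size-versioned struct convention used across kernel UAPI (compare `MNT_NS_INFO_SIZE_VER0` in the sample program above): the struct carries its own size, and callers built against older or newer definitions keep working. The sketch below models that convention in plain C, in the spirit of the kernel's `copy_struct_from_user()`. It illustrates the general pattern only and is not the nsfs code; `ex_copy_struct()` is a hypothetical name.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a caller-supplied struct of usize bytes into a ksize-byte
 * destination:
 *  - older caller (usize < ksize): copy what was given, zero the new tail;
 *  - newer caller (usize > ksize): accept only if the unknown tail is zero;
 *  - same size: plain copy.
 * Returns 0 on success, or -E2BIG if the caller set fields we do not know.
 */
int ex_copy_struct(void *dst, size_t ksize, const void *src, size_t usize)
{
	const unsigned char *s = src;
	size_t n = usize < ksize ? usize : ksize;

	if (usize > ksize) {
		for (size_t i = ksize; i < usize; i++)
			if (s[i] != 0)
				return -E2BIG;
	}
	memcpy(dst, src, n);
	if (usize < ksize)
		memset((unsigned char *)dst + usize, 0, ksize - usize);
	return 0;
}
```

Zero-filling the tail is what makes "field not supplied" and "field set to zero" equivalent, which in turn lets new fields default to zero-valued behaviour.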
2024-08-08	bpf/bpf_get,set_sockopt: add option to set TCP-BPF sock ops flags	Alan Maguire	1	-1/+2
	Currently the only opportunity to set sock ops flags dictating which callbacks fire for a socket is from within a TCP-BPF sockops program. This is problematic if the connection is already set up as there is no further chance to specify callbacks for that socket. Add TCP_BPF_SOCK_OPS_CB_FLAGS to bpf_setsockopt() and bpf_getsockopt() to allow users to specify callbacks later, either via an iterator over sockets or via a socket-specific program triggered by a setsockopt() on the socket. Previous discussion on this here [1]. [1] https://lore.kernel.org/bpf/[email protected]/ Signed-off-by: Alan Maguire <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2024-08-08	Merge tag 'drm-misc-next-2024-08-01' of ↵	Daniel Vetter	1	-0/+1
	https://gitlab.freedesktop.org/drm/misc/kernel into drm-next drm-misc-next for v6.12: UAPI Changes: virtio: - Define DRM capset Cross-subsystem Changes: dma-buf: - heaps: Clean up documentation printk: - Pass description to kmsg_dump() Core Changes: CI: - Update IGT tests - Point upstream repo to GitLab instance modesetting: - Introduce Power Saving Policy property for connectors - Add might_fault() to drm_modeset_lock priming - Add dynamic per-crtc vblank configuration support panic: - Avoid build-time interference with framebuffer console docs: - Document Colorspace property scheduler: - Remove full_recover from drm_sched_start TTM: - Make LRU walk restartable after dropping locks - Allow direct reclaim to allocate local memory Driver Changes: amdgpu: - Support Power Saving Policy connector property ast: - astdp: Support AST2600 with VGA; Clean up HPD bridge: - Silence error message on -EPROBE_DEFER - analogix: Clean up - bridge-connector: Fix double free - lt6505: Disable interrupt when powered off - tc358767: Make default DP port preemphasis configurable gma500: - Update i2c terminology ivpu: - Add MODULE_FIRMWARE() lcdif: - Fix pixel clock loongson: - Use GEM refcount over TTM's mgag200: - Improve BMC handling - Support VBLANK interrupts nouveau: - Refactor and clean up internals - Use GEM refcount over TTM's panel: - Shutdown fixes plus documentation - Refactor several drivers for better code sharing - boe-th101mb31ig002: Support for starry-er88577 MIPI-DSI panel plus DT; Fix porch parameter - edp: Support AOU B116XTN02.3, AUO B116XAN06.1, AOU B116XAT04.1, BOE NV140WUM-N41, BOE NV133WUM-N63, BOE NV116WHM-A4D, CMN N116BCA-EA2, CMN N116BCP-EA2, CSW MNB601LS1-4 - himax-hx8394: Support Microchip AC40T08A MIPI Display panel plus DT - ilitek-ili9806e: Support Densitron DMT028VGHMCMI-1D TFT plus DT - jd9365da: Support Melfas lmfbx101117480 MIPI-DSI panel plus DT; Refactor for code sharing sti: - Fix module owner stm: - Avoid UAF with managed plane and CRTC helpers
- Fix module owner - Fix error handling in probe - Depend on COMMON_CLK - ltdc: Fix transparency after disabling plane; Remove unused interrupt tegra: - Call drm_atomic_helper_shutdown() v3d: - Clean up perfmon vkms: - Clean up Signed-off-by: Daniel Vetter <[email protected]> From: Thomas Zimmermann <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-08-08	media: uapi/linux/cec.h: cec_msg_set_reply_to: zero flags	Hans Verkuil	1	-1/+5
	The cec_msg_set_reply_to() helper function never zeroed the struct cec_msg flags field; this can cause unexpected behavior if flags was uninitialized to begin with. Signed-off-by: Hans Verkuil <[email protected]> Fixes: 0dbacebede1e ("[media] cec: move the CEC framework out of staging and to media") Cc: <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
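A sketch of what a reply-addressing helper with the fix looks like. It assumes only what the commit states plus the standard CEC header byte (initiator in the high nibble, destination in the low nibble). `struct ex_cec_msg` is a hypothetical, trimmed-down stand-in whose layout differs from the real `struct cec_msg`, and `ex_cec_msg_set_reply_to()` is likewise an illustrative name.

```c
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for struct cec_msg. */
struct ex_cec_msg {
	uint32_t len;
	uint32_t timeout;
	uint32_t flags;
	uint8_t  msg[16];
	uint8_t  reply;
};

/* Header byte: initiator in bits 7-4, destination in bits 3-0. */
static inline uint8_t ex_initiator(const struct ex_cec_msg *m)
{
	return m->msg[0] >> 4;
}

static inline uint8_t ex_destination(const struct ex_cec_msg *m)
{
	return m->msg[0] & 0xf;
}

/*
 * Address msg as a reply to orig. Every control field the caller may have
 * left uninitialised is reset, including flags, which is the field the
 * fix adds; a stale flags value could otherwise change how the reply is
 * transmitted.
 */
void ex_cec_msg_set_reply_to(struct ex_cec_msg *msg,
			     const struct ex_cec_msg *orig)
{
	/* The destination becomes the initiator and vice versa. */
	msg->msg[0] = (uint8_t)((ex_destination(orig) << 4) | ex_initiator(orig));
	msg->reply = 0;
	msg->timeout = 0;
	msg->flags = 0;
}
```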
2024-08-05	media: cec: core: add new CEC_MSG_FL_REPLY_VENDOR_ID flag	Hans Verkuil	1	-0/+3
	If this flag is set, then the reply is expected to consist of the CEC_MSG_VENDOR_COMMAND_WITH_ID opcode followed by the Vendor ID (as used in bytes 1-4 of the message), followed by the struct cec_msg reply field. Note that this assumes that the byte after the Vendor ID is a vendor-specific opcode. This flag makes it easier to wait for replies to vendor commands, using the same CEC framework support for waiting for regular replies. Support for this flag is indicated by setting the new CEC_CAP_REPLY_VENDOR_ID capability. Signed-off-by: Hans Verkuil <[email protected]> Signed-off-by: Mauro Carvalho Chehab <[email protected]>
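The matching rule above can be sketched as a predicate over raw message bytes: byte 1 must be the Vendor Command With ID opcode (0xa0), bytes 1-4 (opcode plus the 3-byte vendor ID) must equal those of the transmitted message, and byte 5, assumed to be a vendor-specific opcode as the commit notes, must equal the expected reply. The sender check mirrors how ordinary replies are matched and is an assumption here; `ex_is_vendor_reply()` is a hypothetical name, not part of the CEC framework.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* CEC "Vendor Command With ID" opcode. */
#define EX_CEC_MSG_VENDOR_COMMAND_WITH_ID 0xa0

/*
 * Return true if rx[0..rx_len) answers tx[0..tx_len) under the vendor-ID
 * reply rule, with `reply` being the expected vendor-specific opcode.
 */
bool ex_is_vendor_reply(const uint8_t *tx, size_t tx_len,
			const uint8_t *rx, size_t rx_len, uint8_t reply)
{
	if (tx_len < 5 || rx_len < 6)
		return false;
	/* The reply must come from the device we addressed. */
	if ((rx[0] >> 4) != (tx[0] & 0xf))
		return false;
	if (rx[1] != EX_CEC_MSG_VENDOR_COMMAND_WITH_ID)
		return false;
	/* Opcode plus 3-byte vendor ID must match the sent message. */
	if (memcmp(tx + 1, rx + 1, 4) != 0)
		return false;
	return rx[5] == reply;
}
```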
2024-07-31	binder: frozen notification	Yu-Ting Tseng	1	-0/+36
	Frozen processes present a significant challenge in binder transactions. When a process is frozen, it cannot, by design, accept and/or respond to binder transactions. As a result, the sender needs to adjust its behavior, such as postponing transactions until the peer process unfreezes. However, there is currently no way to subscribe to these state change events, making it impossible to implement frozen-aware behaviors efficiently. Introduce a binder API for subscribing to frozen state change events. This allows programs to react to changes in peer process state, mitigating issues related to binder transactions sent to frozen processes. Implementation details: For a given binder_ref, the state of frozen notification can be one of the following: 1. Userspace doesn't want a notification. binder_ref->freeze is null. 2. Userspace wants a notification but none is in flight. list_empty(&binder_ref->freeze->work.entry) is true. 3. A notification is in flight and waiting to be read by userspace. binder_ref_freeze.sent is false. 4. A notification was read by userspace and the kernel is waiting for an ack. binder_ref_freeze.sent is true. When a notification is in flight, new state change events are coalesced into the existing binder_ref_freeze struct. If userspace hasn't picked up the notification yet, the driver simply rewrites the state. Otherwise, the notification is flagged as requiring a resend, which will be performed once userspace acks the original notification that's in flight. See https://r.android.com/3070045 for how userspace is going to use this feature. Signed-off-by: Yu-Ting Tseng <[email protected]> Acked-by: Carlos Llamas <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
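The four states and the coalescing rule can be modelled as a small state machine. The sketch below is a hypothetical userspace model of the bookkeeping described in the commit, not the binder driver code: `struct ex_freeze_notif` and the `ex_on_*` functions are illustrative names, and booleans stand in for the pointer and list checks.

```c
#include <stdbool.h>

/* Hypothetical model of one binder_ref's frozen-notification state. */
struct ex_freeze_notif {
	bool subscribed; /* false: state 1, userspace wants nothing */
	bool queued;     /* a notification is in flight (states 3 and 4) */
	bool sent;       /* read by userspace, awaiting ack (state 4) */
	bool resend;     /* a change arrived while awaiting the ack */
	bool is_frozen;  /* newest peer state to report */
};

/* The peer process froze or thawed. */
void ex_on_state_change(struct ex_freeze_notif *n, bool frozen)
{
	if (!n->subscribed)
		return;
	n->is_frozen = frozen;  /* always keep the newest state */
	if (!n->queued)
		n->queued = true;   /* state 2 -> 3: queue a notification */
	else if (n->sent)
		n->resend = true;   /* state 4: resend once acked */
	/* state 3: the unread notification simply carries the new state */
}

/* Userspace reads the pending notification: -1 if none, else 0/1 frozen. */
int ex_on_read(struct ex_freeze_notif *n)
{
	if (!n->queued || n->sent)
		return -1;
	n->sent = true;         /* state 3 -> 4 */
	return n->is_frozen;
}

/* Userspace acknowledges the notification it read. */
void ex_on_ack(struct ex_freeze_notif *n)
{
	n->sent = false;
	if (n->resend) {
		n->resend = false;
		n->queued = true;   /* back to state 3 with the newest state */
	} else {
		n->queued = false;  /* back to state 2 */
	}
}
```

Coalescing means userspace always sees the latest state, never a backlog of intermediate transitions, and at most one notification per reference is outstanding.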
2024-07-30	Merge tag 'v6.11-rc1' into for-6.12	Tejun Heo	42	-105/+2286
	Linux 6.11-rc1
2024-07-29	x86/elf: Add a new FPU buffer layout info to x86 core files	Vignesh Balasubramanian	1	-0/+1
	Add a new .note section containing type, size, offset and flags of every xfeature that is present. This information will be used by debuggers to understand the XSAVE layout of the machine where the core file has been dumped, and to read XSAVE registers, especially during cross-platform debugging. The XSAVE layouts of modern AMD and Intel CPUs differ, especially since Memory Protection Keys and the AVX-512 features have been incorporated into AMD CPUs. Since AMD never adopted (and hence never left room in the XSAVE layout for) the Intel MPX feature, tools like GDB had assumed a fixed XSAVE layout matching that of Intel (based on the XCR0 mask). Hence, core dumps from AMD CPUs didn't match the known size for the XCR0 mask. This resulted in GDB and other tools not being able to access the values of the AVX-512 and PKRU registers on AMD CPUs. To solve this, an interim solution has been accepted into GDB, and is already a part of GDB 14, see https://sourceware.org/pipermail/gdb-patches/2023-March/198081.html. But it depends on heuristics based on the total XSAVE register set size and the XCR0 mask to infer the layouts of the various register blocks for core dumps, and hence, is not a foolproof mechanism to determine the layout of the XSAVE area. Therefore, add a new core dump note in order to allow GDB/LLDB and other relevant tools to determine the layout of the XSAVE area of the machine where the corefile was dumped. The new core dump note (which is being proposed as a per-process .note section), NT_X86_XSAVE_LAYOUT (0x205), contains an array of structures. Each structure describes an individual extended feature, giving its type, size, offset and flags in this format: struct x86_xfeat_component { u32 type; u32 size; u32 offset; u32 flags; }; Each feature is described independently, allowing for future extensions without depending on hardware-specific details such as CPUID. [ bp: Massage commit message, zap trailing whitespace.
] Co-developed-by: Jini Susan George <[email protected]> Signed-off-by: Jini Susan George <[email protected]> Co-developed-by: Borislav Petkov (AMD) <[email protected]> Signed-off-by: Borislav Petkov (AMD) <[email protected]> Signed-off-by: Vignesh Balasubramanian <[email protected]> Link: https://lore.kernel.org/r/[email protected]
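A debugger-side sketch of how the new note can replace layout heuristics: given the descriptor bytes of an NT_X86_XSAVE_LAYOUT note, look up a feature's offset and size directly. It assumes the descriptor is a packed array of the `struct x86_xfeat_component` entries shown in the commit message and that the core file has the host's byte order; `ex_xfeat_lookup()` is a hypothetical name, and extracting the note from the ELF file is out of scope here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Entry format as given in the commit message. */
struct x86_xfeat_component {
	uint32_t type;
	uint32_t size;
	uint32_t offset;
	uint32_t flags;
};

static_assert(sizeof(struct x86_xfeat_component) == 16, "unexpected entry size");

/*
 * Find xfeature `type` in a note descriptor of desc_len bytes. On success
 * store its offset and size within the XSAVE area and return 0; return -1
 * if the feature is not present.
 */
int ex_xfeat_lookup(const void *desc, size_t desc_len, uint32_t type,
		    uint32_t *offset, uint32_t *size)
{
	size_t n = desc_len / sizeof(struct x86_xfeat_component);

	for (size_t i = 0; i < n; i++) {
		struct x86_xfeat_component c;

		/* memcpy avoids alignment assumptions about the note buffer. */
		memcpy(&c, (const char *)desc + i * sizeof(c), sizeof(c));
		if (c.type == type) {
			*offset = c.offset;
			*size = c.size;
			return 0;
		}
	}
	return -1;
}
```

Because the offsets are read from the core file rather than derived from XCR0, the same lookup works whether the dump came from an Intel or an AMD machine.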
2024-07-29	Merge drm/drm-next into drm-misc-next	Thomas Zimmermann	7	-62/+66
	Backmerging to get a late RC of v6.10 before moving into v6.11. Signed-off-by: Thomas Zimmermann <[email protected]>
2024-07-29	spi: Enable controllers to extend the SPI protocol with MOSI idle configuration	Marcelo Schmitt	1	-2/+3
	The behavior of an SPI controller data output line (SDO or MOSI or COPI (Controller Output Peripheral Input) for disambiguation) is usually not specified when the controller is not clocking out data on SCLK edges. However, there do exist SPI peripherals that require a specific MOSI line state when data is not being clocked out of the controller. Conventional SPI controllers may set the MOSI line on SCLK edges then bring it low when no data is going out, or leave the line in the state of the last transferred bit. More elaborate controllers are capable of setting the MOSI idle state according to different configurable levels and thus are more suitable for interfacing with demanding peripherals. Add SPI mode bits to allow peripherals to request an explicit MOSI idle state when needed. When supporting a particular MOSI idle configuration, the data output line state is expected to remain at the configured level when the controller is not clocking out data. When a device that needs a specific MOSI idle state is identified, its driver should request the MOSI idle configuration by setting the proper SPI mode bit. Acked-by: Nuno Sa <[email protected]> Reviewed-by: Jonathan Cameron <[email protected]> Reviewed-by: David Lechner <[email protected]> Tested-by: David Lechner <[email protected]> Signed-off-by: Marcelo Schmitt <[email protected]> Link: https://patch.msgid.link/9802160b5e5baed7f83ee43ac819cb757a19be55.1720810545.git.marcelo.schmitt@analog.com Signed-off-by: Mark Brown <[email protected]>
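A sketch of the kind of capability check a controller framework might perform when a peripheral driver requests a MOSI idle mode bit: any requested bit the controller does not advertise is rejected, and asking for idle-low and idle-high together is contradictory. The `EX_SPI_*` names and bit values are hypothetical placeholders, not the definitions from the SPI headers.

```c
#include <stdint.h>

/* Hypothetical mode-bit values, for illustration only. */
#define EX_SPI_CPHA           (1u << 0)
#define EX_SPI_CPOL           (1u << 1)
#define EX_SPI_MOSI_IDLE_LOW  (1u << 17)
#define EX_SPI_MOSI_IDLE_HIGH (1u << 18)

/*
 * Return the requested mode bits the controller cannot honour; 0 means
 * the request is fully supported and setup may proceed.
 */
uint32_t ex_unsupported_mode_bits(uint32_t requested, uint32_t ctlr_mode_bits)
{
	return requested & ~ctlr_mode_bits;
}

/* Non-zero if the request asks for both MOSI idle levels at once. */
int ex_mosi_idle_conflict(uint32_t requested)
{
	return (requested & EX_SPI_MOSI_IDLE_LOW) &&
	       (requested & EX_SPI_MOSI_IDLE_HIGH);
}
```

A peripheral that needs MOSI idle high, attached to a controller that only advertises idle low, gets `EX_SPI_MOSI_IDLE_HIGH` back and should fail setup rather than silently run with an unspecified idle level.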
2024-07-28	nbd: add support for rotational devices	Wouter Verhelst	1	-1/+2
	The NBD protocol defines the flag NBD_FLAG_ROTATIONAL to flag that the export in use should be treated as a rotational device. Add support for that flag to the kernel driver. Signed-off-by: Wouter Verhelst <[email protected]> Reviewed-by: Eric Blake <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
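A client-side sketch of turning the server's transmission flags into device properties. The bit positions are assumed to follow the NBD protocol document (HAS_FLAGS in bit 0, READ_ONLY in bit 1, ROTATIONAL in bit 4); the `EX_` names and `ex_nbd_decode_flags()` are hypothetical, and in the kernel the rotational property would instead be applied to the block queue.

```c
#include <stdbool.h>
#include <stdint.h>

/* Transmission flag bits, assumed to follow the NBD protocol document. */
#define EX_NBD_FLAG_HAS_FLAGS  (1u << 0)
#define EX_NBD_FLAG_READ_ONLY  (1u << 1)
#define EX_NBD_FLAG_ROTATIONAL (1u << 4)

struct ex_nbd_export_props {
	bool read_only;
	bool rotational; /* treat the export as a spinning disk */
};

/* Translate the server's 16-bit transmission flags into properties. */
struct ex_nbd_export_props ex_nbd_decode_flags(uint16_t flags)
{
	struct ex_nbd_export_props p = { false, false };

	/* The other bits are only meaningful when HAS_FLAGS is set. */
	if (!(flags & EX_NBD_FLAG_HAS_FLAGS))
		return p;
	p.read_only = flags & EX_NBD_FLAG_READ_ONLY;
	p.rotational = flags & EX_NBD_FLAG_ROTATIONAL;
	return p;
}
```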
2024-07-25	drm/amdkfd: allow users to target recommended SDMA engines	Jonathan Kim	1	-1/+5
	Certain GPUs have better copy performance over xGMI on specific SDMA engines depending on the source and destination GPU. Allow users to create SDMA queues on these recommended engines. A close to 2x overall performance improvement has been observed with this optimization. Signed-off-by: Jonathan Kim <[email protected]> Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: Alex Deucher <[email protected]>
2024-07-25	Merge tag 'net-6.11-rc1' of ↵	Linus Torvalds	1	-0/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. A lot of networking people were at a conference last week, busy catching COVID, so relatively short PR. Current release - regressions: - tcp: process the 3rd ACK with sk_socket for TFO and MPTCP Current release - new code bugs: - l2tp: protect session IDR and tunnel session list with one lock, make sure the state is coherent to avoid a warning - eth: bnxt_en: update xdp_rxq_info in queue restart logic - eth: airoha: fix location of the MBI_RX_AGE_SEL_MASK field Previous releases - regressions: - xsk: require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len, the field reuses previously un-validated pad Previous releases - always broken: - tap/tun: drop short frames to prevent crashes later in the stack - eth: ice: add a per-VF limit on number of FDIR filters - af_unix: disable MSG_OOB handling for sockets in sockmap/sockhash" * tag 'net-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (34 commits) tun: add missing verification for short frame tap: add missing verification for short frame mISDN: Fix a use after free in hfcmulti_tx() gve: Fix an edge case for TSO skb validity check bnxt_en: update xdp_rxq_info in queue restart logic tcp: process the 3rd ACK with sk_socket for TFO/MPTCP selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size net: mediatek: Fix potential NULL pointer dereference in dummy net_device handling MAINTAINERS: make Breno the netconsole maintainer MAINTAINERS: Update bonding entry net: nexthop: Initialize all fields in dumped nexthops net: stmmac: Correct byte order of perfect_match selftests: forwarding: skip if kernel not support setting bridge fdb learning limit tipc: Return non-zero value from tipc_udp_addr2str() on error netfilter: 
nft_set_pipapo_avx2: disable softinterrupts ice: Fix recipe read procedure ice: Add a per-VF limit on number of FDIR filters net: bonding: correctly annotate RCU in bond_should_notify_peers() ...
2024-07-25	Merge tag 'uml-for-linus-6.11-rc1' of ↵	Linus Torvalds	1	-14/+176
	git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux Pull UML updates from Richard Weinberger: - Support for preemption - i386 Rust support - Huge cleanup by Benjamin Berg - UBSAN support - Removal of dead code * tag 'uml-for-linus-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/uml/linux: (41 commits) um: vector: always reset vp->opened um: vector: remove vp->lock um: register power-off handler um: line: always fill *error_out in setup_one_line() um: remove pcap driver from documentation um: Enable preemption in UML um: refactor TLB update handling um: simplify and consolidate TLB updates um: remove force_flush_all from fork_handler um: Do not flush MM in flush_thread um: Delay flushing syscalls until the thread is restarted um: remove copy_context_skas0 um: remove LDT support um: compress memory related stub syscalls while adding them um: Rework syscall handling um: Add generic stub_syscall6 function um: Create signal stack memory assignment in stub_data um: Remove stub-data.h include from common-offsets.h um: time-travel: fix signal blocking race/hang um: time-travel: remove time_exit() ...
2024-07-25	Merge tag 'for-netdev' of ↵	Jakub Kicinski	1	-0/+4
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-07-25 We've added 14 non-merge commits during the last 8 day(s) which contain a total of 19 files changed, 177 insertions(+), 70 deletions(-). The main changes are: 1) Fix af_unix to disable MSG_OOB handling for sockets in BPF sockmap and BPF sockhash. Also add test coverage for this case, from Michal Luczaj. 2) Fix a segmentation issue when downgrading gso_size in the BPF helper bpf_skb_adjust_room(), from Fred Li. 3) Fix a compiler warning in resolve_btfids due to a missing type cast, from Liwei Song. 4) Fix stack allocation for arm64 to align the stack pointer at a 16 byte boundary in the fexit_sleep BPF selftest, from Puranjay Mohan. 5) Fix an xsk regression to require a flag when actuating tx_metadata_len, from Stanislav Fomichev. 6) Fix function prototype BTF dumping in libbpf for prototypes that have no input arguments, from Andrii Nakryiko. 7) Fix stacktrace symbol resolution in perf script for BPF programs containing subprograms, from Hou Tao.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add XDP_UMEM_TX_METADATA_LEN to XSK TX metadata test xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len bpf: Fix a segment issue when downgrading gso_size tools/resolve_btfids: Fix comparison of distinct pointer types warning in resolve_btfids bpf, events: Use prog to emit ksymbol event for main program selftests/bpf: Test sockmap redirect for AF_UNIX MSG_OOB selftests/bpf: Parametrize AF_UNIX redir functions to accept send() flags selftests/bpf: Support SOCK_STREAM in unix_inet_redir_to_connected() af_unix: Disable MSG_OOB handling for sockets in sockmap/sockhash bpftool: Fix typo in usage help libbpf: Fix no-args func prototype BTF dumping syntax MAINTAINERS: Update powerpc BPF JIT maintainers MAINTAINERS: Update email address of Naveen selftests/bpf: fexit_sleep: Fix stack allocation for arm64 ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2024-07-25	xsk: Require XDP_UMEM_TX_METADATA_LEN to actuate tx_metadata_len	Stanislav Fomichev	1	-0/+4
	Julian reports that commit 341ac980eab9 ("xsk: Support tx_metadata_len") can break existing use cases which don't zero-initialize xdp_umem_reg padding. Introduce new XDP_UMEM_TX_METADATA_LEN to make sure we interpret the padding as tx_metadata_len only when being explicitly asked. Fixes: 341ac980eab9 ("xsk: Support tx_metadata_len") Reported-by: Julian Schindel <[email protected]> Signed-off-by: Stanislav Fomichev <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Reviewed-by: Maciej Fijalkowski <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
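A sketch of the opt-in rule the fix introduces: the former padding word is interpreted as `tx_metadata_len` only when the new flag is set, so applications that never zeroed the padding are unaffected by whatever happens to be in it. `struct ex_umem_reg_tail`, `ex_effective_tx_metadata_len()` and the `EX_XDP_UMEM_TX_METADATA_LEN` value are hypothetical stand-ins; the real flag is defined in the XDP socket UAPI header.

```c
#include <stdint.h>

/* Hypothetical stand-in for the new flag. */
#define EX_XDP_UMEM_TX_METADATA_LEN (1u << 2)

/* Tail of a UMEM registration request; the last word used to be padding. */
struct ex_umem_reg_tail {
	uint32_t flags;
	uint32_t tx_metadata_len; /* may hold garbage from old callers */
};

/* Return the TX metadata length to honour for this registration. */
uint32_t ex_effective_tx_metadata_len(const struct ex_umem_reg_tail *reg)
{
	/* Without the explicit opt-in, ignore the former padding entirely. */
	if (!(reg->flags & EX_XDP_UMEM_TX_METADATA_LEN))
		return 0;
	return reg->tx_metadata_len;
}
```

Gating on a flag rather than on a non-zero value is what makes the field safe to reuse: a non-zero value alone cannot distinguish a deliberate setting from uninitialised stack memory.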
2024-07-24	drm/virtio: Add DRM capset definition	Dmitry Osipenko	1	-0/+1
	Define DRM native context capset in the VirtIO-GPU protocol header. Signed-off-by: Dmitry Osipenko <[email protected]> Reviewed-by: Rob Clark <[email protected]> Reviewed-by: Pierre-Eric Pelloux-Prayer <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2024-07-24	Merge tag 'random-6.11-rc1-for-linus' of ↵	Linus Torvalds	2	-1/+17
	git://git.kernel.org/pub/scm/linux/kernel/git/crng/random Pull random number generator updates from Jason Donenfeld: "This adds getrandom() support to the vDSO. First, it adds a new kind of mapping to mmap(2), MAP_DROPPABLE, which lets the kernel zero out pages anytime under memory pressure, which enables allocating memory that never gets swapped to disk but also doesn't count as being mlocked. Then, the vDSO implementation of getrandom() is introduced in a generic manner and hooked into random.c. Next, this is implemented on x86. (Also, though it's not ready for this pull, somebody has begun an arm64 implementation already) Finally, two vDSO selftests are added. There are also two housekeeping cleanup commits" * tag 'random-6.11-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: MAINTAINERS: add random.h headers to RNG subsection random: note that RNDGETPOOL was removed in 2.6.9-rc2 selftests/vDSO: add tests for vgetrandom x86: vdso: Wire up getrandom() vDSO implementation random: introduce generic vDSO getrandom() implementation mm: add MAP_DROPPABLE for designating always lazily freeable mappings
2024-07-21	Merge tag 'mm-stable-2024-07-21-14-50' of ↵	Linus Torvalds	1	-1/+157
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targeted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page(), vm_map_pages() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainability thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying.
- Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in via sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to another CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". 
This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplification work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. 
The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors, in order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicking. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments, so these pages can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. 
- Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...
2024-07-20	Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm	Linus Torvalds	3	-1/+56
	Pull kvm updates from Paolo Bonzini: "ARM: - Initial infrastructure for shadow stage-2 MMUs, as part of nested virtualization enablement - Support for userspace changes to the guest CTR_EL0 value, enabling (in part) migration of VMs between heterogeneous hardware - Fixes + improvements to pKVM's FF-A proxy, adding support for v1.1 of the protocol - FPSIMD/SVE support for nested, including merged trap configuration and exception routing - New command-line parameter to control the WFx trap behavior under KVM - Introduce kCFI hardening in the EL2 hypervisor - Fixes + cleanups for handling presence/absence of FEAT_TCRX - Miscellaneous fixes + documentation updates LoongArch: - Add paravirt steal time support - Add support for KVM_DIRTY_LOG_INITIALLY_SET - Add perf kvm-stat support for loongarch RISC-V: - Redirect AMO load/store access fault traps to guest - perf kvm stat support - Use guest files for IMSIC virtualization, when available s390: - Assortment of tiny fixes which are not time critical x86: - Fixes for Xen emulation - Add a global struct to consolidate tracking of host values, e.g. 
EFER - Add KVM_CAP_X86_APIC_BUS_CYCLES_NS to allow configuring the effective APIC bus frequency, because TDX - Print the name of the APICv/AVIC inhibits in the relevant tracepoint - Clean up KVM's handling of vendor specific emulation to consistently act on "compatible with Intel/AMD", versus checking for a specific vendor - Drop MTRR virtualization, and instead always honor guest PAT on CPUs that support self-snoop - Update to the newfangled Intel CPU FMS infrastructure - Don't advertise IA32_PERF_GLOBAL_OVF_CTRL as an MSR-to-be-saved, as it reads '0' and writes from userspace are ignored - Misc cleanups x86 - MMU: - Small cleanups, renames and refactoring extracted from the upcoming Intel TDX support - Don't allocate kvm_mmu_page.shadowed_translation for shadow pages that can't hold leaf SPTEs - Unconditionally drop mmu_lock when allocating TDP MMU page tables for eager page splitting, to avoid stalling vCPUs when splitting huge pages - Bug the VM instead of simply warning if KVM tries to split a SPTE that is non-present or not-huge. KVM is guaranteed to end up in a broken state because the callers fully expect a valid SPTE, so it's dangerous to let more MMU changes happen afterwards x86 - AMD: - Make per-CPU save_area allocations NUMA-aware - Force sev_es_host_save_area() to be inlined to avoid calling into an instrumentable function from noinstr code - Base support for running SEV-SNP guests. API-wise, this includes a new KVM_X86_SNP_VM type, encrypting/measuring the initial image into guest memory, and finalizing it before launching it. 
Internally, there are some gmem/mmu hooks needed to prepare gmem-allocated pages before mapping them into guest private memory ranges This includes basic support for attestation guest requests, enough to say that KVM supports the GHCB 2.0 specification There is no support yet for loading into the firmware those signing keys to be used for attestation requests, and therefore no need yet for the host to provide certificate data for those keys. To support fetching certificate data from userspace, a new KVM exit type will be needed to handle fetching the certificate from userspace. An attempt to define a new KVM_EXIT_COCO / KVM_EXIT_COCO_REQ_CERTS exit type to handle this was introduced in v1 of this patchset, but is still being discussed by the community, so for now this patchset only implements a stub version of SNP Extended Guest Requests that does not provide certificate data x86 - Intel: - Remove an unnecessary EPT TLB flush when enabling hardware - Fix a series of bugs that cause KVM to fail to detect nested pending posted interrupts as valid wake events for a vCPU executing HLT in L2 (with HLT-exiting disabled by L1) - KVM: x86: Suppress MMIO that is triggered during task switch emulation Explicitly suppress userspace emulated MMIO exits that are triggered when emulating a task switch as KVM doesn't support userspace MMIO during complex (multi-step) emulation Silently ignoring the exit request can result in the WARN_ON_ONCE(vcpu->mmio_needed) firing if KVM exits to userspace for some other reason prior to purging mmio_needed See commit 0dc902267cb3 ("KVM: x86: Suppress pending MMIO write exits if emulator detects exception") for more details on KVM's limitations with respect to emulated MMIO during complex emulator flows Generic: - Rename the AS_UNMOVABLE flag that was introduced for KVM to AS_INACCESSIBLE, because the special casing needed by these pages is not due to just unmovability (and in fact they are only unmovable because the CPU cannot access them) - New 
ioctl to populate the KVM page tables in advance, which is useful to mitigate KVM page faults during guest boot or after live migration. The code will also be used by TDX, but (probably) not through the ioctl - Enable halt poll shrinking by default, as Intel found it to be a clear win - Set up empty IRQ routing when creating a VM to avoid having to synchronize SRCU when creating a split IRQCHIP on x86 - Rework the sched_in/out() paths to replace kvm_arch_sched_in() with a flag that arch code can use for hooking both sched_in() and sched_out() - Take the vCPU @id as an "unsigned long" instead of "u32" to avoid truncating a bogus value from userspace, e.g. to help userspace detect bugs - Mark a vCPU as preempted if and only if it's scheduled out while in the KVM_RUN loop, e.g. to avoid marking it preempted and thus writing guest memory when retrieving guest state during live migration blackout Selftests: - Remove dead code in the memslot modification stress test - Treat "branch instructions retired" as supported on all AMD Family 17h+ CPUs - Print the guest pseudo-RNG seed only when it changes, to avoid spamming the log for tests that create lots of VMs - Make the PMU counters test less flaky when counting LLC cache misses by doing CLFLUSH{OPT} in every loop iteration" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits) crypto: ccp: Add the SNP_VLEK_LOAD command KVM: x86/pmu: Add kvm_pmu_call() to simplify static calls of kvm_pmu_ops KVM: x86: Introduce kvm_x86_call() to simplify static calls of kvm_x86_ops KVM: x86: Replace static_call_cond() with static_call() KVM: SEV: Provide support for SNP_EXTENDED_GUEST_REQUEST NAE event x86/sev: Move sev_guest.h into common SEV header KVM: SEV: Provide support for SNP_GUEST_REQUEST NAE event KVM: x86: Suppress MMIO that is triggered during task switch emulation KVM: x86/mmu: Clean up make_huge_page_split_spte() definition and intro KVM: x86/mmu: Bug the VM if KVM tries to split a !hugepage SPTE KVM: 
selftests: x86: Add test for KVM_PRE_FAULT_MEMORY KVM: x86: Implement kvm_arch_vcpu_pre_fault_memory() KVM: x86/mmu: Make kvm_mmu_do_page_fault() return mapped level KVM: x86/mmu: Account pf_{fixed,emulate,spurious} in callers of "do page fault" KVM: x86/mmu: Bump pf_taken stat only in the "real" page fault handler KVM: Add KVM_PRE_FAULT_MEMORY vcpu ioctl to pre-populate guest memory KVM: Document KVM_PRE_FAULT_MEMORY ioctl mm, virt: merge AS_UNMOVABLE and AS_INACCESSIBLE perf kvm: Add kvm-stat for loongarch64 LoongArch: KVM: Add PV steal time support in guest side ...
2024-07-20	Merge tag 'landlock-6.11-rc1' of ↵	Linus Torvalds	1	-29/+37
	git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux Pull landlock updates from Mickaël Salaün: "This simplifies code and improves documentation" * tag 'landlock-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mic/linux: landlock: Various documentation improvements landlock: Clarify documentation for struct landlock_ruleset_attr landlock: Use bit-fields for storing handled layer access masks
2024-07-19	Merge tag 'char-misc-6.11-rc1' of ↵	Linus Torvalds	1	-0/+22
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc Pull char / misc and other driver updates from Greg KH: "Here is the "big" set of char/misc and other driver subsystem changes for 6.11-rc1. Nothing major in here, just loads of new drivers and updates. Included in here are: - IIO api updates and new drivers added - wait_event_interruptible_timeout() api cleanups for some drivers - MODULE_DESCRIPTION() additions for loads of drivers - parport out-of-bounds fix - interconnect driver updates and additions - mhi driver updates and additions - w1 driver fixes - binder speedups and fixes - eeprom driver updates - coresight driver updates - counter driver update - new misc driver additions - other minor api updates All of these, EXCEPT for the final Kconfig build fix for 32bit systems, have been in linux-next for a while with no reported issues. The Kconfig fixup went in 29 hours ago, so might have missed the latest linux-next, but was acked by everyone involved" * tag 'char-misc-6.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (330 commits) misc: Kconfig: exclude mrvl-cn10k-dpi compilation for 32-bit systems misc: delete Makefile.rej binder: fix hang of unregistered readers misc: Kconfig: add a new dependency for MARVELL_CN10K_DPI virtio: add missing MODULE_DESCRIPTION() macro agp: uninorth: add missing MODULE_DESCRIPTION() macro spmi: add missing MODULE_DESCRIPTION() macros dev/parport: fix the array out-of-bounds risk samples: configfs: add missing MODULE_DESCRIPTION() macro misc: mrvl-cn10k-dpi: add Octeon CN10K DPI administrative driver misc: keba: Fix missing AUXILIARY_BUS dependency slimbus: Fix struct and documentation alignment in stream.c MAINTAINERS: CC dri-devel list on Qualcomm FastRPC patches misc: fastrpc: use coherent pool for untranslated Compute Banks misc: fastrpc: support complete DMA pool access to the DSP misc: fastrpc: add missing MODULE_DESCRIPTION() macro misc: fastrpc: Add missing dev_err newlines misc: 
fastrpc: Use memdup_user() nvmem: core: Implement force_ro sysfs attribute nvmem: Use sysfs_emit() for type attribute ...