aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2024-07-19x86: vdso: Wire up getrandom() vDSO implementationJason A. Donenfeld9-1/+275
Hook up the generic vDSO implementation to the x86 vDSO data page. Since the existing vDSO infrastructure is heavily based on the timekeeping functionality, which works over arrays of bases, a new macro is introduced for vvars that are not arrays. The vDSO function requires a ChaCha20 implementation that does not write to the stack, yet can still do an entire ChaCha20 permutation, so provide this using SSE2, since this is userland code that must work on all x86-64 processors. Reviewed-by: Thomas Gleixner <[email protected]> Reviewed-by: Samuel Neves <[email protected]> # for vgetrandom-chacha.S Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-07-19random: introduce generic vDSO getrandom() implementationJason A. Donenfeld7-1/+347
Provide a generic C vDSO getrandom() implementation, which operates on an opaque state returned by vgetrandom_alloc() and produces random bytes the same way as getrandom(). This has the following API signature: ssize_t vgetrandom(void *buffer, size_t len, unsigned int flags, void *opaque_state, size_t opaque_len); The return value and the first three arguments are the same as ordinary getrandom(), while the last two arguments are a pointer to the opaque allocated state and its size. Were all five arguments passed to the getrandom() syscall, nothing different would happen, and the functions would have the exact same behavior. The actual vDSO RNG algorithm implemented is the same one implemented by drivers/char/random.c, using the same fast-erasure techniques as that. Should the in-kernel implementation change, so too will the vDSO one. It requires an implementation of ChaCha20 that does not use any stack, in order to maintain forward secrecy if a multi-threaded program forks (though this does not account for a similar issue with SA_SIGINFO copying registers to the stack), so this is left as an architecture-specific fill-in. Stack-less ChaCha20 is an easy algorithm to implement on a variety of architectures, so this shouldn't be too onerous. Initially, the state is keyless, and so the first call makes a getrandom() syscall to generate that key, and then uses it for subsequent calls. By keeping track of a generation counter, it knows when its key is invalidated and it should fetch a new one using the syscall. Later, more than just a generation counter might be used. Since MADV_WIPEONFORK is set on the opaque state, the key and related state is wiped during a fork(), so secrets don't roll over into new processes, and the same state doesn't accidentally generate the same random stream. The generation counter, as well, is always >0, so that the 0 counter is a useful indication of a fork() or otherwise uninitialized state. If the kernel RNG is not yet initialized, then the vDSO always calls the syscall, because that behavior cannot be emulated in userspace, but fortunately that state is short lived and only during early boot. If it has been initialized, then there is no need to inspect the `flags` argument, because the behavior does not change post-initialization regardless of the `flags` value. Since the opaque state passed to it is mutated, vDSO getrandom() is not reentrant, when used with the same opaque state, which libc should be mindful of. The function works over an opaque per-thread state of a particular size, which must be marked VM_WIPEONFORK, VM_DONTDUMP, VM_NORESERVE, and VM_DROPPABLE for proper operation. Over time, the nuances of these allocations may change or grow or even differ based on architectural features. The opaque state passed to vDSO getrandom() must be allocated using the mmap_flags and mmap_prot parameters provided by the vgetrandom_opaque_params struct, which also contains the size of each state. That struct can be obtained with a call to vgetrandom(NULL, 0, 0, &params, ~0UL). Then, libc can call mmap(2) and slice up the returned array into a state per each thread, while ensuring that no single state straddles a page boundary. Libc is expected to allocate a chunk of these on first use, and then dole them out to threads as they're created, allocating more when needed. vDSO getrandom() provides the ability for userspace to generate random bytes quickly and safely, and is intended to be integrated into libc's thread management. As an illustrative example, the introduced code in the vdso_test_getrandom self test later in this series might be used to do the same outside of libc. In a libc the various pthread-isms are expected to be elided into libc internals. Reviewed-by: Thomas Gleixner <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-07-19mm: add MAP_DROPPABLE for designating always lazily freeable mappingsJason A. Donenfeld17-15/+146
The vDSO getrandom() implementation works with a buffer allocated with a new system call that has certain requirements: - It shouldn't be written to core dumps. * Easy: VM_DONTDUMP. - It should be zeroed on fork. * Easy: VM_WIPEONFORK. - It shouldn't be written to swap. * Uh-oh: mlock is rlimited. * Uh-oh: mlock isn't inherited by forks. - It shouldn't reserve actual memory, but it also shouldn't crash when page faulting in memory if none is available * Uh-oh: VM_NORESERVE means segfaults. It turns out that the vDSO getrandom() function has three really nice characteristics that we can exploit to solve this problem: 1) Due to being wiped during fork(), the vDSO code is already robust to having the contents of the pages it reads zeroed out midway through the function's execution. 2) In the absolute worst case of whatever contingency we're coding for, we have the option to fallback to the getrandom() syscall, and everything is fine. 3) The buffers the function uses are only ever useful for a maximum of 60 seconds -- a sort of cache, rather than a long term allocation. These characteristics mean that we can introduce VM_DROPPABLE, which has the following semantics: a) It never is written out to swap. b) Under memory pressure, mm can just drop the pages (so that they're zero when read back again). c) It is inherited by fork. d) It doesn't count against the mlock budget, since nothing is locked. e) If there's not enough memory to service a page fault, it's not fatal, and no signal is sent. This way, allocations used by vDSO getrandom() can use: VM_DROPPABLE | VM_DONTDUMP | VM_WIPEONFORK | VM_NORESERVE And there will be no problem with OOMing, crashing on overcommitment, using memory when not in use, not wiping on fork(), coredumps, or writing out to swap. In order to let vDSO getrandom() use this, expose these via mmap(2) as MAP_DROPPABLE. Note that this involves removing the MADV_FREE special case from sort_folio(), which according to Yu Zhao is unnecessary and will simply result in an extra call to shrink_folio_list() in the worst case. The chunk removed reenables the swapbacked flag, which we don't want for VM_DROPPABLE, and we can't conditionalize it here because there isn't a vma reference available. Finally, the provided self test ensures that this is working as desired. Cc: [email protected] Acked-by: David Hildenbrand <[email protected]> Signed-off-by: Jason A. Donenfeld <[email protected]>
2024-07-19Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds65-615/+3141
Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, lpfc, qla2xxx, mpi3mr) plus some misc small fixes. The only core changes are to both bsg and scsi to pass in the device instead of setting it afterwards as q->queuedata, so no functional change" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (69 commits) scsi: aha152x: Use DECLARE_COMPLETION_ONSTACK for non-constant completion scsi: qla2xxx: Convert comma to semicolon scsi: qla2xxx: Update version to 10.02.09.300-k scsi: qla2xxx: Use QP lock to search for bsg scsi: qla2xxx: Reduce fabric scan duplicate code scsi: qla2xxx: Fix optrom version displayed in FDMI scsi: qla2xxx: During vport delete send async logout explicitly scsi: qla2xxx: Complete command early within lock scsi: qla2xxx: Fix flash read failure scsi: qla2xxx: Return ENOBUFS if sg_cnt is more than one for ELS cmds scsi: qla2xxx: Fix for possible memory corruption scsi: qla2xxx: validate nvme_local_port correctly scsi: qla2xxx: Unable to act on RSCN for port online scsi: ufs: exynos: Add support for Flash Memory Protector (FMP) scsi: ufs: core: Add UFSHCD_QUIRK_KEYS_IN_PRDT scsi: ufs: core: Add fill_crypto_prdt variant op scsi: ufs: core: Add UFSHCD_QUIRK_BROKEN_CRYPTO_ENABLE scsi: ufs: core: fold ufshcd_clear_keyslot() into its caller scsi: ufs: core: Add UFSHCD_QUIRK_CUSTOM_CRYPTO_PROFILE scsi: ufs: mcq: Make .get_hba_mac() optional ...
2024-07-19Merge tag 'for-6.11/dm-changes' of ↵Linus Torvalds42-582/+947
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm Pull device mapper updates from Mikulas Patocka: - Optimize processing of flush bios in the dm-linear and dm-stripe targets - Dm-io cleansups and refactoring - Remove unused 'struct thunk' in dm-cache - Handle minor device numbers > 255 in dm-init - Dm-verity refactoring & enabling platform keyring - Fix warning in dm-raid - Improve dm-crypt performance - split bios to smaller pieces, so that They could be processed concurrently - Stop using blk_limits_io_{min,opt} - Dm-vdo cleanup and refactoring - Remove max_write_zeroes_granularity and max_secure_erase_granularity - Dm-multipath cleanup & refactoring - Add dm-crypt and dm-integrity support for non-power-of-2 sector size - Fix reshape in dm-raid - Make dm_block_validator const * tag 'for-6.11/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (33 commits) dm vdo: fix a minor formatting issue in vdo.rst dm vdo int-map: fix kerneldoc formatting dm vdo repair: add missing kerneldoc fields dm: Constify struct dm_block_validator dm-integrity: introduce the Inline mode dm: introduce the target flag mempool_needs_integrity dm raid: fix stripes adding reshape size issues dm raid: move _get_reshape_sectors() as prerequisite to fixing reshape size issues dm-crypt: support for per-sector NVMe metadata dm mpath: don't call dm_get_device in multipath_message dm: factor out helper function from dm_get_device dm-verity: fix dm_is_verity_target() when dm-verity is builtin dm: Remove max_secure_erase_granularity dm: Remove max_write_zeroes_granularity dm vdo indexer: use swap() instead of open coding it dm vdo: remove unused struct 'uds_attribute' dm: stop using blk_limits_io_{min,opt} dm-crypt: limit the size of encryption requests dm verity: add support for signature verification with platform keyring dm-raid: Fix WARN_ON_ONCE check for sync_thread in raid_resume ...
2024-07-19Merge tag 'dma-mapping-6.11-2024-07-19' of ↵Linus Torvalds8-106/+146
git://git.infradead.org/users/hch/dma-mapping Pull dma-mapping updates from Christoph Hellwig: - reduce duplicate swiotlb pool lookups (Michael Kelley) - minor small fixes (Yicong Yang, Yang Li) * tag 'dma-mapping-6.11-2024-07-19' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: fix kernel-doc description for swiotlb_del_transient swiotlb: reduce swiotlb pool lookups dma-mapping: benchmark: Don't starve others when doing the test
2024-07-19Merge tag 'iommu-updates-v6.11' of ↵Linus Torvalds56-1236/+1624
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu updates from Will Deacon: "Core: - Support for the "ats-supported" device-tree property - Removal of the 'ops' field from 'struct iommu_fwspec' - Introduction of iommu_paging_domain_alloc() and partial conversion of existing users - Introduce 'struct iommu_attach_handle' and provide corresponding IOMMU interfaces which will be used by the IOMMUFD subsystem - Remove stale documentation - Add missing MODULE_DESCRIPTION() macro - Misc cleanups Allwinner Sun50i: - Ensure bypass mode is disabled on H616 SoCs - Ensure page-tables are allocated below 4GiB for the 32-bit page-table walker - Add new device-tree compatible strings AMD Vi: - Use try_cmpxchg64() instead of cmpxchg64() when updating pte Arm SMMUv2: - Print much more useful information on context faults - Fix Qualcomm TBU probing when CONFIG_ARM_SMMU_QCOM_DEBUG=n - Add new Qualcomm device-tree bindings Arm SMMUv3: - Support for hardware update of access/dirty bits and reporting via IOMMUFD - More driver rework from Jason, this time updating the PASID/SVA support to prepare for full IOMMUFD support - Add missing MODULE_DESCRIPTION() macro - Minor fixes and cleanups NVIDIA Tegra: - Fix for benign fwspec initialisation issue exposed by rework on the core branch Intel VT-d: - Use try_cmpxchg64() instead of cmpxchg64() when updating pte - Use READ_ONCE() to read volatile descriptor status - Remove support for handling Execute-Requested requests - Avoid calling iommu_domain_alloc() - Minor fixes and refactoring Qualcomm MSM: - Updates to the device-tree bindings" * tag 'iommu-updates-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (72 commits) iommu/tegra-smmu: Pass correct fwnode to iommu_fwspec_init() iommu/vt-d: Fix identity map bounds in si_domain_init() iommu: Move IOMMU_DIRTY_NO_CLEAR define dt-bindings: iommu: Convert msm,iommu-v0 to yaml iommu/vt-d: Fix aligned pages in calculate_psi_aligned_address() iommu/vt-d: Limit max address mask to MAX_AGAW_PFN_WIDTH docs: iommu: Remove outdated Documentation/userspace-api/iommu.rst arm64: dts: fvp: Enable PCIe ATS for Base RevC FVP iommu/of: Support ats-supported device-tree property dt-bindings: PCI: generic: Add ats-supported property iommu: Remove iommu_fwspec ops OF: Simplify of_iommu_configure() ACPI: Retire acpi_iommu_fwspec_ops() iommu: Resolve fwspec ops automatically iommu/mediatek-v1: Clean up redundant fwspec checks RDMA/usnic: Use iommu_paging_domain_alloc() wifi: ath11k: Use iommu_paging_domain_alloc() wifi: ath10k: Use iommu_paging_domain_alloc() drm/msm: Use iommu_paging_domain_alloc() vhost-vdpa: Use iommu_paging_domain_alloc() ...
2024-07-19Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds106-414/+1785
Pull rdma updates from Jason Gunthorpe: "Usual collection of small improvements and fixes: - Bug fixes and minor improvments in efa, irdma, mlx4, mlx5, rxe, hf1, qib, ocrdma - bnxt_re support for MSN, which is a new retransmit logic - Initial mana support for RC qps - Use after free bug and cleanups in iwcm - Reduce resource usage in mlx5 when RDMA verbs features are not used - New verb to drain shared recieve queues, similar to normal recieve queues. This is necessary to allow ULPs a clean shutdown. Used in the iscsi rdma target - mlx5 support for more than 16 bits of doorbell indexes - Doorbell moderation support for bnxt_re - IB multi-plane support for mlx5 - New EFA adaptor PCI IDs - RDMA_NAME_ASSIGN_TYPE_USER to hint to userspace that it shouldn't rename the device - A collection of hns bugs - Fix long standing bug in bnxt_re with incorrect endian handling of immediate data" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits) IB/hfi1: Constify struct flag_table RDMA/mana_ib: Set correct device into ib bnxt_re: Fix imm_data endianness RDMA: Fix netdev tracker in ib_device_set_netdev RDMA/hns: Fix mbx timing out before CMD execution is completed RDMA/hns: Fix insufficient extend DB for VFs. RDMA/hns: Fix undifined behavior caused by invalid max_sge RDMA/hns: Fix shift-out-bounds when max_inline_data is 0 RDMA/hns: Fix missing pagesize and alignment check in FRMR RDMA/hns: Fix unmatch exception handling when init eq table fails RDMA/hns: Fix soft lockup under heavy CEQE load RDMA/hns: Check atomic wr length RDMA/ocrdma: Don't inline statistics functions RDMA/core: Introduce "name_assign_type" for an IB device RDMA/qib: Fix truncation compilation warnings in qib_verbs.c RDMA/qib: Fix truncation compilation warnings in qib_init.c RDMA/efa: Add EFA 0xefa3 PCI ID RDMA/mlx5: Support per-plane port IB counters by querying PPCNT register net/mlx5: mlx5_ifc update for accessing ppcnt register of plane ports RDMA/mlx5: Add plane index support when querying PTYS registers ...
2024-07-19Merge tag 'for-linus-iommufd' of ↵Linus Torvalds19-239/+1206
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd Pull iommufd updates from Jason Gunthorpe: - The iova_bitmap logic for efficiently reporting dirty pages back to userspace has a few more tricky corner case bugs that have been resolved and backed with new tests. The revised version has simpler logic. - Shared branch with iommu for handle support when doing domain attach. Handles allow the domain owner to include additional private data on a per-device basis. - IO Page Fault Reporting to userspace via iommufd. Page faults can be generated on fault capable HWPTs when a translation is not present. Routing them to userspace would allow a VMM to be able to virtualize them into an emulated vIOMMU. This is the next step to fully enabling vSVA support. * tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (26 commits) iommufd: Put constants for all the uAPI enums iommufd: Fix error pointer checking iommufd: Add check on user response code iommufd: Remove IOMMUFD_PAGE_RESP_FAILURE iommufd: Require drivers to supply the cache_invalidate_user ops iommufd/selftest: Add coverage for IOPF test iommufd/selftest: Add IOPF support for mock device iommufd: Associate fault object with iommufd_hw_pgtable iommufd: Fault-capable hwpt attach/detach/replace iommufd: Add iommufd fault object iommufd: Add fault and response message definitions iommu: Extend domain attach group with handle support iommu: Add attach handle to struct iopf_group iommu: Remove sva handle list iommu: Introduce domain attachment handle iommufd/iova_bitmap: Remove iterator logic iommufd/iova_bitmap: Dynamic pinning on iova_bitmap_set() iommufd/iova_bitmap: Consolidate iova_bitmap_set exit conditionals iommufd/iova_bitmap: Move initial pinning to iova_bitmap_for_each() iommufd/iova_bitmap: Cache mapped length in iova_bitmap_map struct ...
2024-07-19Merge tag 'tpmdd-next-6.11-rc1-roundtwo' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd Pull tpm fix from Jarkko Sakkinen: "An additional fix that supplements my earlier fixes for handling auth, which I unfortunately missed last time" * tag 'tpmdd-next-6.11-rc1-roundtwo' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: tpm: Use auth only after NULL check in tpm_buf_check_hmac_response()
2024-07-19cifs: Add a tracepoint to track credits involved in R/W requestsDavid Howells7-23/+173
Add a tracepoint to track the credit changes and server in_flight value involved in the lifetime of a R/W request, logging it against the request/subreq debugging ID. This requires the debugging IDs to be recorded in the cifs_credits struct. The tracepoint can be enabled with: echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_rw_credits/enable Also add a three-state flag to struct cifs_credits to note if we're interested in determining when the in_flight contribution ends and, if so, to track whether we've decremented the contribution yet. Signed-off-by: David Howells <[email protected]> Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]> cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Steve French <[email protected]>
2024-07-19cifs: Fix setting of zero_point after DIO writeDavid Howells1-5/+10
At the moment, at the end of a DIO write, cifs calls netfs_resize_file() to adjust the size of the file if it needs it. This will reduce the zero_point (the point above which we assume a read will just return zeros) if it's more than the new i_size, but won't increase it. With DIO writes, however, we definitely want to increase it as we have clobbered the local pagecache and then written some data that's not available locally. Fix cifs to make the zero_point above the end of a DIO or unbuffered write. This fixes corruption seen occasionally with the generic/708 xfs-test. In that case, the read-back of some of the written data is being short-circuited and replaced with zeroes. Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Cc: [email protected] Reported-by: Steve French <[email protected]> Signed-off-by: David Howells <[email protected]> Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]> cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Steve French <[email protected]>
2024-07-19cifs: Fix missing error code setDavid Howells1-0/+2
In cifs_strict_readv(), the default rc (-EACCES) is accidentally cleared by a successful return from netfs_start_io_direct(), such that if cifs_find_lock_conflict() fails, we don't return an error. Fix this by resetting the default error code. Fixes: 14b1cd25346b ("cifs: Fix locking in cifs_strict_readv()") Cc: [email protected] Signed-off-by: David Howells <[email protected]> Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]> cc: Jeff Layton <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Steve French <[email protected]>
2024-07-19cifs: Fix server re-repick on subrequest retryDavid Howells1-3/+0
When a subrequest is marked for needing retry, netfs will call cifs_prepare_write() which will make cifs repick the server for the op before renegotiating credits; it then calls cifs_issue_write() which invokes smb2_async_writev() - which re-repicks the server. If a different server is then selected, this causes the increment of server->in_flight to happen against one record and the decrement to happen against another, leading to misaccounting. Fix this by just removing the repick code in smb2_async_writev(). As this is only called from netfslib-driven code, cifs_prepare_write() should always have been called first, and so server should never be NULL and the preparatory step is repeated in the event that we do a retry. The problem manifests as a warning looking something like: WARNING: CPU: 4 PID: 72896 at fs/smb/client/smb2ops.c:97 smb2_add_credits+0x3f0/0x9e0 [cifs] ... RIP: 0010:smb2_add_credits+0x3f0/0x9e0 [cifs] ... smb2_writev_callback+0x334/0x560 [cifs] cifs_demultiplex_thread+0x77a/0x11b0 [cifs] kthread+0x187/0x1d0 ret_from_fork+0x34/0x60 ret_from_fork_asm+0x1a/0x30 Which may be triggered by a number of different xfstests running against an Azure server in multichannel mode. generic/249 seems the most repeatable, but generic/215, generic/249 and generic/308 may also show it. Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib") Cc: [email protected] Reported-by: Steve French <[email protected]> Reviewed-by: Paulo Alcantara (Red Hat) <[email protected]> Acked-by: Tom Talpey <[email protected]> Signed-off-by: David Howells <[email protected]> cc: Jeff Layton <[email protected]> cc: Aurelien Aptel <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] Signed-off-by: Steve French <[email protected]>
2024-07-19cifs: fix noisy message on copy_file_rangeSteve French1-1/+1
There are common cases where copy_file_range can noisily log "source and target of copy not on same server" e.g. the mv command across mounts to two different server's shares. Change this to informational rather than logging as an error. A followon patch will add dynamic trace points e.g. for cifs_file_copychunk_range Cc: [email protected] Reviewed-by: Shyam Prasad N <[email protected]> Signed-off-by: Steve French <[email protected]>
2024-07-19Merge tag 'v6.11-p1' of ↵Linus Torvalds114-5897/+5484
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto update from Herbert Xu: "API: - Test setkey in no-SIMD context - Add skcipher speed test for user-specified algorithm Algorithms: - Add x25519 support on ppc64le - Add VAES and AVX512 / AVX10 optimized AES-GCM on x86 - Remove sm2 algorithm Drivers: - Add Allwinner H616 support to sun8i-ce - Use DMA in stm32 - Add Exynos850 hwrng support to exynos" * tag 'v6.11-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (81 commits) hwrng: core - remove (un)register_miscdev() crypto: lib/mpi - delete unnecessary condition crypto: testmgr - generate power-of-2 lengths more often crypto: mxs-dcp - Ensure payload is zero when using key slot hwrng: Kconfig - Do not enable by default CN10K driver crypto: starfive - Fix nent assignment in rsa dec crypto: starfive - Align rsa input data to 32-bit crypto: qat - fix unintentional re-enabling of error interrupts crypto: qat - extend scope of lock in adf_cfg_add_key_value_param() Documentation: qat: fix auto_reset attribute details crypto: sun8i-ce - add Allwinner H616 support crypto: sun8i-ce - wrap accesses to descriptor address fields dt-bindings: crypto: sun8i-ce: Add compatible for H616 hwrng: core - Fix wrong quality calculation at hw rng registration hwrng: exynos - Enable Exynos850 support hwrng: exynos - Add SMC based TRNG operation hwrng: exynos - Implement bus clock control hwrng: exynos - Use devm_clk_get_enabled() to get the clock hwrng: exynos - Improve coding style dt-bindings: rng: Add Exynos850 support to exynos-trng ...
2024-07-19blk-cgroup: move congestion_count to struct blkcgXiu Jianfeng3-10/+10
The congestion_count was introduced into the struct cgroup by commit d09d8df3a294 ("blkcg: add generic throttling mechanism"), but since it is closely related to the blkio subsys, it is not appropriate to put it in the struct cgroup, so let's move it to struct blkcg. There should be no functional changes because blkcg is per cgroup. Signed-off-by: Xiu Jianfeng <[email protected]> Acked-by: Tejun Heo <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19sbitmap: fix io hung due to race on sbitmap_word::clearedYang Yang2-7/+34
Configuration for sbq: depth=64, wake_batch=6, shift=6, map_nr=1 1. There are 64 requests in progress: map->word = 0xFFFFFFFFFFFFFFFF 2. After all the 64 requests complete, and no more requests come: map->word = 0xFFFFFFFFFFFFFFFF, map->cleared = 0xFFFFFFFFFFFFFFFF 3. Now two tasks try to allocate requests: T1: T2: __blk_mq_get_tag . __sbitmap_queue_get . sbitmap_get . sbitmap_find_bit . sbitmap_find_bit_in_word . __sbitmap_get_word -> nr=-1 __blk_mq_get_tag sbitmap_deferred_clear __sbitmap_queue_get /* map->cleared=0xFFFFFFFFFFFFFFFF */ sbitmap_find_bit if (!READ_ONCE(map->cleared)) sbitmap_find_bit_in_word return false; __sbitmap_get_word -> nr=-1 mask = xchg(&map->cleared, 0) sbitmap_deferred_clear atomic_long_andnot() /* map->cleared=0 */ if (!(map->cleared)) return false; /* * map->cleared is cleared by T1 * T2 fail to acquire the tag */ 4. T2 is the sole tag waiter. When T1 puts the tag, T2 cannot be woken up due to the wake_batch being set at 6. If no more requests come, T1 will wait here indefinitely. This patch achieves two purposes: 1. Check on ->cleared and update on both ->cleared and ->word need to be done atomically, and using spinlock could be the simplest solution. 2. Add extra check in sbitmap_deferred_clear(), to identify whether ->word has free bits. Fixes: ea86ea2cdced ("sbitmap: ammortize cost of clearing bits") Signed-off-by: Yang Yang <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: avoid polling configuration errorshexue1-1/+4
This patch adds a poll queue check, aiming to help users use polled IO accurately. If users do polled IO but the device doesn't have poll queues, they will get suboptimal performance data and waste CPU resources. Add a poll queue check batching this. If users don't have the device properly configured, or if it simply doesn't support polled IO, it will error the IO with -EOPNOTSUPP. This is similar to what we used to do for sync polled IO, which is no longer supported. Signed-off-by: hexue <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Catch possible entries missing from rqf_name[]John Garry2-0/+2
Add a BUILD_BUG_ON() call to ensure that we are not missing entries in rqf_name[]. Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Simplify definition of RQF_NAME()John Garry1-1/+1
Now that we have a bit index for RQF_x in __RQF_x, use __RQF_x to simplify the definition of RQF_NAME() by not using ilog2((__force u32()). Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Use enum to define RQF_x bit indexesJohn Garry1-32/+54
Similar to what we do for enum req_flag_bits, divide the definition of RQF_x flags into an enum to declare the bits and an actual flag. Tweak some comments to not spill onto new lines. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Catch possible entries missing from cmd_flag_name[]John Garry2-0/+3
Add a BUILD_BUG_ON() call to ensure that we are not missing entries in cmd_flag_name[]. Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Catch possible entries missing from alloc_policy_name[]John Garry2-2/+7
Make BLK_TAG_ALLOC_x an enum and add a "max" entry. Add a BUILD_BUG_ON() call to ensure that we are not missing entries in hctx_flag_name[]. Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Catch possible entries missing from hctx_flag_name[]John Garry2-4/+9
Refresh values in BLK_MQ_F_x enum, and then re-arrange members in hctx_flag_name[] to match that enum. Renumber BLK_MQ_F_ALLOC_POLICY_START_BIT to match the value refresh. Add a BUILD_BUG_ON() call to ensure that we are not missing entries in hctx_flag_name[]. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Catch possible entries missing from hctx_state_name[]John Garry2-7/+11
Add a build-time assert that we are not missing entries from hctx_state_name[]. For this, create a separate enum for state flags and add a "max" entry for BLK_MQ_S_x flags. The numbering for those enum values is as default, so don't explicitly number. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Catch possible entries missing from blk_queue_flag_name[]John Garry1-0/+2
Assert that we are not missing flag entries in blk_queue_flag_name[]. Signed-off-by: John Garry <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Make QUEUE_FLAG_x as an enumJohn Garry1-13/+16
This will allow us better keep in sync with blk_queue_flag_name[]. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Relocate BLK_MQ_MAX_DEPTHJohn Garry1-2/+1
BLK_MQ_MAX_DEPTH is defined as an enumerated value, but has no real relation to the other members in its enum, so just use #define to provide the definition. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Relocate BLK_MQ_CPU_WORK_BATCHJohn Garry2-2/+2
BLK_MQ_CPU_WORK_BATCH is defined in include/linux/blk-mq.h, but only used in blk-mq.c, so relocate to block/blk-mq.h Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: remove QUEUE_FLAG_STOPPEDChristoph Hellwig2-3/+0
QUEUE_FLAG_STOPPED is entirely unused. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Johannes Thumshirn <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Add missing entry to hctx_flag_name[]John Garry1-2/+3
Add missing entry for NO_SCHED_BY_DEFAULT and reorder to match the enum. Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Add zone write plugging entry to rqf_name[]John Garry1-0/+1
Add missing entry. Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19block: Add missing entries from cmd_flag_name[]John Garry1-1/+6
Add missing entries for req_flag_bits. Reviewed-by: Bart Van Assche <[email protected]> Signed-off-by: John Garry <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2024-07-19bpf, events: Use prog to emit ksymbol event for main programHou Tao1-15/+13
Since commit 0108a4e9f358 ("bpf: ensure main program has an extable"), prog->aux->func[0]->kallsyms is left as uninitialized. For BPF programs with subprogs, the symbol for the main program is missing just as shown in the output of perf script below: ffffffff81284b69 qp_trie_lookup_elem+0xb9 ([kernel.kallsyms]) ffffffffc0011125 bpf_prog_a4a0eb0651e6af8b_lookup_qp_trie+0x5d (bpf...) ffffffff8127bc2b bpf_for_each_array_elem+0x7b ([kernel.kallsyms]) ffffffffc00110a1 +0x25 () ffffffff8121a89a trace_call_bpf+0xca ([kernel.kallsyms]) Fix it by always using prog instead prog->aux->func[0] to emit ksymbol event for the main program. After the fix, the output of perf script will be correct: ffffffff81284b96 qp_trie_lookup_elem+0xe6 ([kernel.kallsyms]) ffffffffc001382d bpf_prog_a4a0eb0651e6af8b_lookup_qp_trie+0x5d (bpf...) ffffffff8127bc2b bpf_for_each_array_elem+0x7b ([kernel.kallsyms]) ffffffffc0013779 bpf_prog_245c55ab25cfcf40_qp_trie_lookup+0x25 (bpf...) ffffffff8121a89a trace_call_bpf+0xca ([kernel.kallsyms]) Fixes: 0108a4e9f358 ("bpf: ensure main program has an extable") Signed-off-by: Hou Tao <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Yonghong Song <[email protected]> Reviewed-by: Krister Johansen <[email protected]> Reviewed-by: Jiri Olsa <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2024-07-19btrfs: change BTRFS_MOUNT_* flags to 64bit typeQu Wenruo5-42/+46
Currently the BTRFS_MOUNT_* flags are already beyond 32 bits, this is going to cause compilation errors for some 32 bit systems, as their unsigned long is only 32 bits long, thus flag BTRFS_MOUNT_IGNORESUPERFLAGS overflows and can lead to errors. Fix the problem by: - Migrate all existing BTRFS_MOUNT_* flags to unsigned long long - Migrate all mount option related variables to unsigned long long * btrfs_fs_info::mount_opt * btrfs_fs_context::mount_opt * mount_opt parameter of btrfs_check_options() * old_opts parameter of btrfs_remount_begin() * old_opts parameter of btrfs_remount_cleanup() * mount_opt parameter of btrfs_check_mountopts_zoned() * mount_opt and opt parameters of check_ro_option() Fixes: 32e6216512b4 ("btrfs: introduce new "rescue=ignoresuperflags" mount option") Signed-off-by: Qu Wenruo <[email protected]> Reviewed-by: David Sterba <[email protected]> Signed-off-by: David Sterba <[email protected]>
2024-07-19Merge branch 'pci/misc'Bjorn Helgaas19-12/+19
- Remove unused struct 'acpi_handle_node' (Dr. David Alan Gilbert) - Use array notation for portdrv .id_table consistently (Masahiro Yamada) - Switch to new Intel CPU model defines (Tony Luck) - Add missing MODULE_DESCRIPTION() macros (Jeff Johnson) * pci/misc: PCI: controller: Add missing MODULE_DESCRIPTION() macros PCI: Add missing MODULE_DESCRIPTION() macros PCI/PM: Switch to new Intel CPU model defines PCI: Use array for .id_table consistently ACPI: PCI: Remove unused struct 'acpi_handle_node'
2024-07-19Merge branch 'pci/switchtec'Bjorn Helgaas3-10/+10
- Make switchtec_class constant (Greg Kroah-Hartman) * pci/switchtec: PCI: switchtec: Make switchtec_class constant
2024-07-19Merge branch 'pci/controller/vmd'Bjorn Helgaas1-4/+4
- Create "domain" symlink for vmd before adding devices below the VMD bridge so it's available when mdadm assembles RAID devices from them (Jiwei Sun) * pci/controller/vmd: PCI: vmd: Create domain symlink before pci_bus_add_devices()
2024-07-19Merge branch 'pci/controller/tegra194'Bjorn Helgaas1-4/+1
- Ensure Tegra194 and Tegra234 inbound ATU entries are 64KB-aligned to match the hardware restriction (Jon Hunter) - Remove unused struct 'tegra_pcie_soc' (Dr. David Alan Gilbert) * pci/controller/tegra194: PCI: tegra: Remove unused struct 'tegra_pcie_soc' PCI: tegra194: Set EP alignment restriction for inbound ATU
2024-07-19Merge branch 'pci/controller/rockchip'Bjorn Helgaas6-44/+322
- Use dev_err_probe() in dw-rockchip probe error path so the failures aren't silent (Uwe Kleine-König) - Sleep PCIE_T_PVPERL_MS (100ms) before deasserting PERST# (Damien Le Moal) - Sleep PCIE_T_RRS_READY_MS (100ms) after conventional reset, before a config access (Damien Le Moal) - Request the PERST# GPIO with GPIOD_OUT_LOW so it matches the POR value, which avoids a spurious PERST# assertion and fixes a Qcom modem firmware crash and issues with WLAN controllers, e.g., RTL8822CE (Manivannan Sadhasivam for rockchip, Niklas Cassel for dw-rockchip) - Refactor dw-rockchip and add support for Endpoint mode for rk3568 and rk3588 (Niklas Cassel) * pci/controller/rockchip: PCI: dw-rockchip: Use pci_epc_init_notify() directly PCI: dw-rockchip: Add endpoint mode support PCI: dw-rockchip: Refactor the driver to prepare for EP mode PCI: dw-rockchip: Add rockchip_pcie_get_ltssm() helper PCI: dw-rockchip: Fix weird indentation PCI: dw-rockchip: Fix initial PERST# GPIO value PCI: dw-rockchip: Add error messages in .probe() error paths PCI: rockchip: Use GPIOD_OUT_LOW flag while requesting ep_gpio PCI: rockchip-host: Wait 100ms after reset before starting configuration PCI: rockchip-host: Fix rockchip_pcie_host_init_port() PERST# handling
2024-07-19Merge branch 'pci/controller/rcar-gen4'Bjorn Helgaas2-34/+276
- Add Synopsys DWC macros for lane skew configuration (Yoshihiro Shimoda) - Add struct rcar_gen4_pcie_drvdata to provide for future SoCs with different initialization requirements (Yoshihiro Shimoda) - Add .ltssm_control() method for SoC dependencies (Yoshihiro Shimoda) - Add r8a779g0 (R-Car V4H) support (Yoshihiro Shimoda) * pci/controller/rcar-gen4: PCI: rcar-gen4: Add support for R-Car V4H PCI: rcar-gen4: Add .ltssm_control() for other SoC support PCI: rcar-gen4: Add struct rcar_gen4_pcie_drvdata PCI: dwc: Add PCIE_PORT_{FORCE,LANE_SKEW} macros
2024-07-19Merge branch 'pci/controller/rcar'Bjorn Helgaas1-1/+5
- Demote WARN() to dev_warn_ratelimited() in rcar_pcie_wakeup() to avoid excessive warnings when the driver is confused about link state when resuming (Marek Vasut) * pci/controller/rcar: PCI: rcar: Demote WARN() to dev_warn_ratelimited() in rcar_pcie_wakeup()
2024-07-19Merge branch 'pci/controller/qcom'Bjorn Helgaas9-188/+376
- Use devm_clk_bulk_get_all() to get all the clocks from DT to avoid writing out all the clock names (Manivannan Sadhasivam) - Add DT binding and driver support for the SA8775P SoC (Mrinmay Sarkar) - Refactor dw_pcie_edma_find_chip() to enable adding support for Hyper DMA (HDMA) (Manivannan Sadhasivam) - Enable drivers to supply the eDMA channel count since some can't auto detect this (Manivannan Sadhasivam) - Add HDMA support for the SA8775P SoC (Mrinmay Sarkar) - Override the SA8775P NO_SNOOP default to avoid possible memory corruption (Mrinmay Sarkar) - Make sure resources are disabled during PERST# assertion, even if the link is already disabled (Manivannan Sadhasivam) - Vote for the CPU-PCIe ICC (interconnect) path to ensure it stays active even if other drivers don't vote for it (Krishna chaitanya chundru) - Add Operating Performance Points (OPP) to scale performance state based on aggregate link bandwidth to improve SoC power efficiency (Krishna chaitanya chundru) - Return failure instead of success if dev_pm_opp_find_freq_floor() fails (Dan Carpenter) - Avoid an error pointer dereference if dev_pm_opp_find_freq_exact() fails (Dan Carpenter) - Prevent use of uninitialized data in qcom_pcie_suspend_noirq() (Dan Carpenter) * pci/controller/qcom: PCI: qcom: Prevent use of uninitialized data in qcom_pcie_suspend_noirq() PCI: qcom: Prevent potential error pointer dereference PCI: qcom: Fix missing error code in qcom_pcie_probe() PCI: qcom: Add OPP support to scale performance PCI: Bring the PCIe speed to MBps logic to new pcie_dev_speed_mbps() PCI: qcom: Add ICC bandwidth vote for CPU to PCIe path PCI: qcom-ep: Disable resources unconditionally during PERST# assert PCI: qcom-ep: Override NO_SNOOP attribute for SA8775P EP PCI: qcom: Override NO_SNOOP attribute for SA8775P RC PCI: epf-mhi: Enable HDMA for SA8775P SoC PCI: qcom-ep: Add HDMA support for SA8775P SoC PCI: dwc: Pass the eDMA mapping format flag directly from glue drivers PCI: dwc: Skip finding eDMA channels count for HDMA platforms PCI: dwc: Refactor dw_pcie_edma_find_chip() API PCI: qcom-ep: Add support for SA8775P SOC dt-bindings: PCI: qcom-ep: Add support for SA8775P SoC PCI: qcom: Use devm_clk_bulk_get_all() API
2024-07-19Merge branch 'pci/controller/microchip'Bjorn Helgaas13-616/+1740
- Move PLDA XpressRICH generic DT binding properties to plda,xpressrich3-axi-common.yaml where they can be shared across PLDA-based drivers (Minda Chen) - Create a drivers/pci/controller/plda/ directory for PLDA-based drivers and move pcie-microchip-host.c there (Minda Chen) - Move PLDA generic macros to pcie-plda.h where they can be shared across drivers (Minda Chen) - Extract PLDA generic structures from pcie-microchip-host.c, rename them to be generic, and move them to pcie-plda-host.c where they can be shared across drivers (Minda Chen) - Add a .request_event_irq() callback for requesting device-specific interrupts in addition to PLDA-generic interrupts (Minda Chen) - Add DT binding and driver for the StarFive JH7110 SoC, based on PLDA IP (Minda Chen) * pci/controller/microchip: PCI: starfive: Add JH7110 PCIe controller dt-bindings: PCI: Add StarFive JH7110 PCIe controller PCI: Add PCIE_RESET_CONFIG_DEVICE_WAIT_MS waiting time value PCI: plda: Pass pci_host_bridge to plda_pcie_setup_iomems() PCI: plda: Add host init/deinit and map bus functions PCI: plda: Add event bitmap field to struct plda_pcie_rp PCI: microchip: Move IRQ functions to pcie-plda-host.c PCI: microchip: Add event irqchip field to host port and add PLDA irqchip PCI: microchip: Add get_events() callback and PLDA get_event() PCI: microchip: Add INTx and MSI event num to struct plda_event PCI: microchip: Add request_event_irq() callback function PCI: microchip: Add num_events field to struct plda_pcie_rp PCI: microchip: Rename interrupt related functions PCI: microchip: Move PLDA functions to pcie-plda-host.c PCI: microchip: Rename PLDA functions to be generic PCI: microchip: Move PLDA structures to plda-pcie.h PCI: microchip: Rename PLDA structures to be generic PCI: microchip: Add bridge_addr field to struct mc_pcie PCI: microchip: Move PLDA IP register macros to pcie-plda.h PCI: microchip: Move pcie-microchip-host.c to PLDA directory dt-bindings: PCI: Add PLDA XpressRICH PCIe host common properties # Conflicts: # drivers/pci/pci.h
2024-07-19Merge branch 'pci/controller/loongson'Bjorn Helgaas1-0/+13
* pci/controller/loongson: PCI: loongson: Enable MSI in LS7A Root Complex
2024-07-19Merge branch 'pci/controller/layerscape'Bjorn Helgaas2-2/+2
- Make the ls-gen4 struct mobiveil_rp_ops constant (Christophe JAILLET) * pci/controller/layerscape: PCI: ls-gen4: Make struct mobiveil_rp_ops constant
2024-07-19Merge branch 'pci/controller/keystone'Bjorn Helgaas1-82/+118
- Enable BAR 0 only for v3.65a to avoid Completion Timeouts that cause a 45 second boot delay on the v4.90a-based AM654x SoC (Siddharth Vadapalli) - Avoid a NULL pointer dereference if DT failed to provide a host bridge memory window (Aleksandr Mishin) * pci/controller/keystone: PCI: keystone: Add workaround for Errata #i2037 (AM65x SR 1.0) PCI: keystone: Fix NULL pointer dereference in case of DT error in ks_pcie_setup_rc_app_regs() PCI: keystone: Don't enable BAR 0 for AM654x PCI: keystone: Relocate ks_pcie_set/clear_dbi_mode()
2024-07-19Merge branch 'pci/controller/hyperv'Bjorn Helgaas1-2/+2
- Return zero, not garbage, when reading PCI_INTERRUPT_PIN from a Hyper-V device (Wei Liu) * pci/controller/hyperv: PCI: hv: Return zero, not garbage, when reading PCI_INTERRUPT_PIN
2024-07-19Merge branch 'pci/controller/exynos'Bjorn Helgaas1-50/+4
- Use devm_clk_bulk_get_all_enable() to simplify clock setup (Shradha Todi) * pci/controller/exynos: PCI: exynos: Adapt to use bulk clock APIs