aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2022-11-04bpf: Convert BPF_DISPATCHER to use static_call() (not ftrace)Peter Zijlstra1-1/+38
The dispatcher function is currently abusing the ftrace __fentry__ call location for its own purposes -- this obviously gives trouble when the dispatcher and ftrace are both in use. A previous solution tried using __attribute__((patchable_function_entry())) which works, except it is GCC-8+ only, breaking the build on the earlier still supported compilers. Instead use static_call() -- which has its own annotations and does not conflict with ftrace -- to rewrite the dispatch function. By using: return static_call()(ctx, insni, bpf_func) you get a perfect forwarding tail call as function body (iow a single jmp instruction). By having the default static_call() target be bpf_dispatcher_nop_func() it retains the default behaviour (an indirect call to the argument function). Only once a dispatcher program is attached is the target rewritten to directly call the JIT'ed image. Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Björn Töpel <[email protected]> Tested-by: Jiri Olsa <[email protected]> Acked-by: Björn Töpel <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/Y1/oBlK0yFk5c/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2022-11-04bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")Peter Zijlstra1-20/+1
Because __attribute__((patchable_function_entry)) is only available since GCC-8 this solution fails to build on the minimum required GCC version. Undo these changes so we might try again -- without cluttering up the patches with too many changes. This is an almost complete revert of: dbe69b299884 ("bpf: Fix dispatcher patchable function entry to 5 bytes nop") ceea991a019c ("bpf: Move bpf_dispatcher function out of ftrace locations") (notably the arch/x86/Kconfig hunk is kept). Reported-by: David Laight <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Björn Töpel <[email protected]> Tested-by: Jiri Olsa <[email protected]> Acked-by: Björn Töpel <[email protected]> Acked-by: Jiri Olsa <[email protected]> Link: https://lkml.kernel.org/r/[email protected] Link: https://lore.kernel.org/bpf/[email protected]
2022-11-04Merge tag 'hardening-v6.1-rc4' of ↵Linus Torvalds1-4/+9
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull hardening fix from Kees Cook: - Correctly report struct member size on memcpy overflow (Kees Cook) * tag 'hardening-v6.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: fortify: Capture __bos() results in const temp vars
2022-11-04Merge tag 'efi-fixes-for-v6.1-2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fixes from Ard Biesheuvel: - A pair of tweaks to the EFI random seed code so that externally provided version of this config table are handled more robustly - Another fix for the v6.0 EFI variable refactor that turned out to break Apple machines which don't provide QueryVariableInfo() - Add some guard rails to the EFI runtime service call wrapper so we can recover from synchronous exceptions caused by firmware * tag 'efi-fixes-for-v6.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: arm64: efi: Recover from synchronous exceptions occurring in firmware efi: efivars: Fix variable writes with unsupported query_variable_store() efi: random: Use 'ACPI reclaim' memory for random seed efi: random: reduce seed size to 32 bytes efi/tpm: Pass correct address to memblock_reserve
2022-11-04mm/slab: remove !CONFIG_TRACING variants of kmalloc_[node_]trace()Vlastimil Babka1-23/+0
For !CONFIG_TRACING kernels, the kmalloc() implementation tries (in cases where the allocation size is build-time constant) to save a function call, by inlining kmalloc_trace() to a kmem_cache_alloc() call. However since commit 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") this path now fails to pass the original request size to be eventually recorded (for kmalloc caches with debugging enabled). We could adjust the code to call __kmem_cache_alloc_node() as the CONFIG_TRACING variant, but that would as a result inline a call with 5 parameters, bloating the kmalloc() call sites. The cost of extra function call (to kmalloc_trace()) seems like a lesser evil. It also appears that the !CONFIG_TRACING variant is incompatible with upcoming hardening efforts [1] so it's easier if we just remove it now. Kernels with no tracing are rare these days and the benefit is dubious anyway. [1] https://lore.kernel.org/linux-mm/[email protected]/T/#m20ecf14390e406247bde0ea9cce368f469c539ed Link: https://lore.kernel.org/all/[email protected]/ Suggested-by: Hyeonggon Yoo <[email protected]> Signed-off-by: Vlastimil Babka <[email protected]>
2022-11-04spi: Merge spi_controller.{slave,target}_abort()Geert Uytterhoeven1-2/+4
Mixing SPI slave/target handlers and SPI slave/target controllers using legacy and modern naming does not work well: there are now two different callbacks for aborting a slave/target operation, of which only one is populated, while spi_{slave,target}_abort() check and use only one, which may be the unpopulated one. Fix this by merging the slave/target abort callbacks into a single callback using a union, like is already done for the slave/target flags. Fixes: b8d3b056a78dcc94 ("spi: introduce new helpers with using modern naming") Signed-off-by: Geert Uytterhoeven <[email protected]> Link: https://lore.kernel.org/r/809c82d54b85dd87ef7ee69fc93016085be85cec.1667555967.git.geert+renesas@glider.be Signed-off-by: Mark Brown <[email protected]>
2022-11-04bcma: Use the proper gpio includeLinus Walleij1-1/+1
The <linux/bcma/bcma_driver_chipcommon.h> is including the legacy header <linux/gpio.h> to obtain struct gpio_chip. Instead, include <linux/gpio/driver.h> where this struct is defined. It turns out that the brcm80211 brcmsmac depends on this to bring in the symbol gpio_is_valid(). The driver looks up the BCMA parent GPIO driver and checks that this succeeds, but then it goes on to use the deprecated GPIO call gpio_is_valid() to check the consistency of the .base member of the BCMA GPIO struct. The whole check can be dropped because the bcma_gpio is initialized in the declarations: struct gpio_chip *bcma_gpio = &cc_drv->gpio; And this can never be NULL. Cc: Jonas Gorski <[email protected]> Acked-by: Arend van Spriel <[email protected]> Signed-off-by: Linus Walleij <[email protected]> Signed-off-by: Kalle Valo <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-04Merge tag 'drm-intel-gt-next-2022-11-03' of ↵Dave Airlie1-0/+6
git://anongit.freedesktop.org/drm/drm-intel into drm-next Driver Changes: - Fix for #7306: [Arc A380] white flickering when using arc as a secondary gpu (Matt A) - Add Wa_18017747507 for DG2 (Wayne) - Avoid spurious WARN on DG1 due to incorrect cache_dirty flag (Niranjana, Matt A) - Corrections to CS timestamp support for Gen5 and earlier (Ville) - Fix a build error used with clang compiler on hwmon (GG) - Improvements to LMEM handling with RPM (Anshuman, Matt A) - Cleanups in dmabuf code (Mike) - Selftest improvements (Matt A) Signed-off-by: Dave Airlie <[email protected]> From: Joonas Lahtinen <[email protected]> Link: https://patchwork.freedesktop.org/patch/msgid/[email protected]
2022-11-03bpf: Refactor map->off_arr handlingKumar Kartikeya Dwivedi2-17/+25
Refactor map->off_arr handling into generic functions that can work on their own without hardcoding map specific code. The btf_fields_offs structure is now returned from btf_parse_field_offs, which can be reused later for types in program BTF. All functions like copy_map_value, zero_map_value call generic underlying functions so that they can also be reused later for copying to values allocated in programs which encode specific fields. Later, some helper functions will also require access to this btf_field_offs structure to be able to skip over special fields at runtime. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-03bpf: Consolidate spin_lock, timer management into btf_recordKumar Kartikeya Dwivedi2-22/+34
Now that kptr_off_tab has been refactored into btf_record, and can hold more than one specific field type, accomodate bpf_spin_lock and bpf_timer as well. While they don't require any more metadata than offset, having all special fields in one place allows us to share the same code for allocated user defined types and handle both map values and these allocated objects in a similar fashion. As an optimization, we still keep spin_lock_off and timer_off offsets in the btf_record structure, just to avoid having to find the btf_field struct each time their offset is needed. This is mostly needed to manipulate such objects in a map value at runtime. It's ok to hardcode just one offset as more than one field is disallowed. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-03bpf: Refactor kptr_off_tab into btf_recordKumar Kartikeya Dwivedi2-46/+82
To prepare the BPF verifier to handle special fields in both map values and program allocated types coming from program BTF, we need to refactor the kptr_off_tab handling code into something more generic and reusable across both cases to avoid code duplication. Later patches also require passing this data to helpers at runtime, so that they can work on user defined types, initialize them, destruct them, etc. The main observation is that both map values and such allocated types point to a type in program BTF, hence they can be handled similarly. We can prepare a field metadata table for both cases and store them in struct bpf_map or struct btf depending on the use case. Hence, refactor the code into generic btf_record and btf_field member structs. The btf_record represents the fields of a specific btf_type in user BTF. The cnt indicates the number of special fields we successfully recognized, and field_mask is a bitmask of fields that were found, to enable quick determination of availability of a certain field. Subsequently, refactor the rest of the code to work with these generic types, remove assumptions about kptr and kptr_off_tab, rename variables to more meaningful names, etc. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-03net: remove unused ndo_get_devlink_portJiri Pirko1-5/+0
Remove ndo_get_devlink_port which is no longer used alongside with the implementations in drivers. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-03net: devlink: track netdev with devlink_port assignedJiri Pirko1-0/+19
Currently, ethernet drivers are using devlink_port_type_eth_set() and devlink_port_type_clear() to set devlink port type and link to related netdev. Instead of calling them directly, let the driver use SET_NETDEV_DEVLINK_PORT macro to assign devlink_port pointer and let devlink to track it. Note the devlink port pointer is static during the time netdevice is registered. In devlink code, use per-namespace netdev notifier to track the netdevices with devlink_port assigned and change the internal devlink_port type and related type pointer accordingly. Signed-off-by: Jiri Pirko <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-03bridge: Add MAC Authentication Bypass (MAB) supportHans J. Schultz1-0/+1
Hosts that support 802.1X authentication are able to authenticate themselves by exchanging EAPOL frames with an authenticator (Ethernet bridge, in this case) and an authentication server. Access to the network is only granted by the authenticator to successfully authenticated hosts. The above is implemented in the bridge using the "locked" bridge port option. When enabled, link-local frames (e.g., EAPOL) can be locally received by the bridge, but all other frames are dropped unless the host is authenticated. That is, unless the user space control plane installed an FDB entry according to which the source address of the frame is located behind the locked ingress port. The entry can be dynamic, in which case learning needs to be enabled so that the entry will be refreshed by incoming traffic. There are deployments in which not all the devices connected to the authenticator (the bridge) support 802.1X. Such devices can include printers and cameras. One option to support such deployments is to unlock the bridge ports connecting these devices, but a slightly more secure option is to use MAB. When MAB is enabled, the MAC address of the connected device is used as the user name and password for the authentication. For MAB to work, the user space control plane needs to be notified about MAC addresses that are trying to gain access so that they will be compared against an allow list. This can be implemented via the regular learning process with the sole difference that learned FDB entries are installed with a new "locked" flag indicating that the entry cannot be used to authenticate the device. The flag cannot be set by user space, but user space can clear the flag by replacing the entry, thereby authenticating the device. Locked FDB entries implement the following semantics with regards to roaming, aging and forwarding: 1. Roaming: Locked FDB entries can roam to unlocked (authorized) ports, in which case the "locked" flag is cleared. FDB entries cannot roam to locked ports regardless of MAB being enabled or not. Therefore, locked FDB entries are only created if an FDB entry with the given {MAC, VID} does not already exist. This behavior prevents unauthenticated devices from disrupting traffic destined to already authenticated devices. 2. Aging: Locked FDB entries age and refresh by incoming traffic like regular entries. 3. Forwarding: Locked FDB entries forward traffic like regular entries. If user space detects an unauthorized MAC behind a locked port and wishes to prevent traffic with this MAC DA from reaching the host, it can do so using tc or a different mechanism. Enable the above behavior using a new bridge port option called "mab". It can only be enabled on a bridge port that is both locked and has learning enabled. Locked FDB entries are flushed from the port once MAB is disabled. A new option is added because there are pure 802.1X deployments that are not interested in notifications about locked FDB entries. Signed-off-by: Hans J. Schultz <[email protected]> Signed-off-by: Ido Schimmel <[email protected]> Acked-by: Nikolay Aleksandrov <[email protected]> Reviewed-by: Vladimir Oltean <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-03Merge tag 'for-netdev' of ↵Jakub Kicinski1-1/+1
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== bpf 2022-11-04 We've added 8 non-merge commits during the last 3 day(s) which contain a total of 10 files changed, 113 insertions(+), 16 deletions(-). The main changes are: 1) Fix memory leak upon allocation failure in BPF verifier's stack state tracking, from Kees Cook. 2) Fix address leakage when BPF progs release reference to an object, from Youlin Li. 3) Fix BPF CI breakage from buggy in.h uapi header dependency, from Andrii Nakryiko. 4) Fix bpftool pin sub-command's argument parsing, from Pu Lehui. 5) Fix BPF sockmap lockdep warning by cancelling psock work outside of socket lock, from Cong Wang. 6) Follow-up for BPF sockmap to fix sk_forward_alloc accounting, from Wang Yufen. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: Add verifier test for release_reference() bpf: Fix wrong reg type conversion in release_reference() bpf, sock_map: Move cancel_work_sync() out of sock lock tools/headers: Pull in stddef.h to uapi to fix BPF selftests build in CI net/ipv4: Fix linux/in.h header dependencies bpftool: Fix NULL pointer dereference when pin {PROG, MAP, LINK} without FILE bpf, sockmap: Fix the sk->sk_forward_alloc warning of sk_stream_kill_queues bpf, verifier: Fix memory leak in array reallocation for stack state ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-03bpf: Allow specifying volatile type modifier for kptrsKumar Kartikeya Dwivedi1-0/+5
This is useful in particular to mark the pointer as volatile, so that compiler treats each load and store to the field as a volatile access. The alternative is having to define and use READ_ONCE and WRITE_ONCE in the BPF program. Signed-off-by: Kumar Kartikeya Dwivedi <[email protected]> Acked-by: David Vernet <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Alexei Starovoitov <[email protected]>
2022-11-03Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski7-17/+61
No conflicts. Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-03Merge tag 'for-joerg' of ↵Joerg Roedel1-0/+12
git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd into core iommu: Define EINVAL as device/domain incompatibility This series is to replace the previous EMEDIUMTYPE patch in a VFIO series: https://lore.kernel.org/kvm/[email protected]/ The purpose is to regulate all existing ->attach_dev callback functions to use EINVAL exclusively for an incompatibility error between a device and a domain. This allows VFIO and IOMMUFD to detect such a soft error, and then try a different domain with the same device. Among all the patches, the first two are preparatory changes. And then one patch to update kdocs and another three patches for the enforcement effort. Link: https://lore.kernel.org/r/[email protected]
2022-11-03iommu: Prepare IOMMU domain for IOPFLu Baolu1-0/+3
This adds some mechanisms around the iommu_domain so that the I/O page fault handling framework could route a page fault to the domain and call the fault handler from it. Add pointers to the page fault handler and its private data in struct iommu_domain. The fault handler will be called with the private data as a parameter once a page fault is routed to the domain. Any kernel component which owns an iommu domain could install handler and its private parameter so that the page fault could be further routed and handled. This also prepares the SVA implementation to be the first consumer of the per-domain page fault handling model. The I/O page fault handler for SVA is copied to the SVA file with mmget_not_zero() added before mmap_read_lock(). Suggested-by: Jean-Philippe Brucker <[email protected]> Signed-off-by: Lu Baolu <[email protected]> Reviewed-by: Jean-Philippe Brucker <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Tested-by: Zhangfei Gao <[email protected]> Tested-by: Tony Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2022-11-03iommu: Remove SVA related callbacks from iommu opsLu Baolu1-7/+0
These ops'es have been deprecated. There's no need for them anymore. Remove them to avoid dead code. Signed-off-by: Lu Baolu <[email protected]> Reviewed-by: Jean-Philippe Brucker <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Yi Liu <[email protected]> Tested-by: Zhangfei Gao <[email protected]> Tested-by: Tony Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2022-11-03iommu/sva: Refactoring iommu_sva_bind/unbind_device()Lu Baolu1-20/+23
The existing iommu SVA interfaces are implemented by calling the SVA specific iommu ops provided by the IOMMU drivers. There's no need for any SVA specific ops in iommu_ops vector anymore as we can achieve this through the generic attach/detach_dev_pasid domain ops. This refactors the IOMMU SVA interfaces implementation by using the iommu_attach/detach_device_pasid interfaces and align them with the concept of the SVA iommu domain. Put the new SVA code in the SVA related file in order to make it self-contained. Signed-off-by: Lu Baolu <[email protected]> Reviewed-by: Jean-Philippe Brucker <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Tested-by: Zhangfei Gao <[email protected]> Tested-by: Tony Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2022-11-03iommu: Add IOMMU SVA domain supportLu Baolu1-2/+23
The SVA iommu_domain represents a hardware pagetable that the IOMMU hardware could use for SVA translation. This adds some infrastructures to support SVA domain in the iommu core. It includes: - Extend the iommu_domain to support a new IOMMU_DOMAIN_SVA domain type. The IOMMU drivers that support allocation of the SVA domain should provide its own SVA domain specific iommu_domain_ops. - Add a helper to allocate an SVA domain. The iommu_domain_free() is still used to free an SVA domain. The report_iommu_fault() should be replaced by the new iommu_report_device_fault(). Leave the existing fault handler with the existing users and the newly added SVA members excludes it. Suggested-by: Jean-Philippe Brucker <[email protected]> Suggested-by: Jason Gunthorpe <[email protected]> Signed-off-by: Lu Baolu <[email protected]> Reviewed-by: Jean-Philippe Brucker <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Yi Liu <[email protected]> Tested-by: Zhangfei Gao <[email protected]> Tested-by: Tony Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2022-11-03iommu: Add attach/detach_dev_pasid iommu interfacesLu Baolu1-0/+32
Attaching an IOMMU domain to a PASID of a device is a generic operation for modern IOMMU drivers which support PASID-granular DMA address translation. Currently visible usage scenarios include (but not limited): - SVA (Shared Virtual Address) - kernel DMA with PASID - hardware-assist mediated device This adds the set_dev_pasid domain ops for setting the domain onto a PASID of a device and remove_dev_pasid iommu ops for removing any setup on a PASID of device. This also adds interfaces for device drivers to attach/detach/retrieve a domain for a PASID of a device. If multiple devices share a single group, it's fine as long the fabric always routes every TLP marked with a PASID to the host bridge and only the host bridge. For example, ACS achieves this universally and has been checked when pci_enable_pasid() is called. As we can't reliably tell the source apart in a group, all the devices in a group have to be considered as the same source, and mapped to the same PASID table. The DMA ownership is about the whole device (more precisely, iommu group), including the RID and PASIDs. When the ownership is converted, the pasid array must be empty. This also adds necessary checks in the DMA ownership interfaces. Signed-off-by: Lu Baolu <[email protected]> Reviewed-by: Jean-Philippe Brucker <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Yi Liu <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Tested-by: Zhangfei Gao <[email protected]> Tested-by: Tony Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2022-11-03iommu: Remove SVM_FLAG_SUPERVISOR_MODE supportLu Baolu2-18/+3
The current kernel DMA with PASID support is based on the SVA with a flag SVM_FLAG_SUPERVISOR_MODE. The IOMMU driver binds the kernel memory address space to a PASID of the device. The device driver programs the device with kernel virtual address (KVA) for DMA access. There have been security and functional issues with this approach: - The lack of IOTLB synchronization upon kernel page table updates. (vmalloc, module/BPF loading, CONFIG_DEBUG_PAGEALLOC etc.) - Other than slight more protection, using kernel virtual address (KVA) has little advantage over physical address. There are also no use cases yet where DMA engines need kernel virtual addresses for in-kernel DMA. This removes SVM_FLAG_SUPERVISOR_MODE support from the IOMMU interface. The device drivers are suggested to handle kernel DMA with PASID through the kernel DMA APIs. The drvdata parameter in iommu_sva_bind_device() and all callbacks is not needed anymore. Cleanup them as well. Link: https://lore.kernel.org/linux-iommu/[email protected]/ Signed-off-by: Jacob Pan <[email protected]> Signed-off-by: Lu Baolu <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Jean-Philippe Brucker <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Fenghua Yu <[email protected]> Tested-by: Zhangfei Gao <[email protected]> Tested-by: Tony Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2022-11-03iommu: Add max_pasids field in struct dev_iommuLu Baolu1-0/+2
Use this field to save the number of PASIDs that a device is able to consume. It is a generic attribute of a device and lifting it into the per-device dev_iommu struct could help to avoid the boilerplate code in various IOMMU drivers. Signed-off-by: Lu Baolu <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Yi Liu <[email protected]> Tested-by: Zhangfei Gao <[email protected]> Tested-by: Tony Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2022-11-03iommu: Add max_pasids field in struct iommu_deviceLu Baolu1-0/+2
Use this field to keep the number of supported PASIDs that an IOMMU hardware is able to support. This is a generic attribute of an IOMMU and lifting it into the per-IOMMU device structure makes it possible to allocate a PASID for device without calls into the IOMMU drivers. Any iommu driver that supports PASID related features should set this field before enabling them on the devices. In the Intel IOMMU driver, intel_iommu_sm is moved to CONFIG_INTEL_IOMMU enclave so that the pasid_supported() helper could be used in dmar.c without compilation errors. Signed-off-by: Lu Baolu <[email protected]> Reviewed-by: Jean-Philippe Brucker <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Reviewed-by: Yi Liu <[email protected]> Tested-by: Zhangfei Gao <[email protected]> Tested-by: Tony Zhu <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Joerg Roedel <[email protected]>
2022-11-03regulator: userspace-consumer: Handle regulator-output DT nodesZev Weiss1-0/+1
In addition to adding some fairly simple OF support code, we make some slight adjustments to the userspace-consumer driver to properly support use with regulator-output hardware: - We now do an exclusive get of the supply regulators so as to prevent regulator_init_complete_work from automatically disabling them. - Instead of assuming that the supply is initially disabled, we now query its state to determine the initial value of drvdata->enabled. Signed-off-by: Zev Weiss <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2022-11-03regulator: devres: Add devm_regulator_bulk_get_exclusive()Zev Weiss1-0/+2
We had an exclusive variant of the devm_regulator_get() API, but no corresponding variant for the bulk API; let's add one now. We add a generalized version of the existing regulator_bulk_get() function that additionally takes a get_type parameter and redefine regulator_bulk_get() in terms of it, then do similarly with devm_regulator_bulk_get(), and finally add the new devm_regulator_bulk_get_exclusive(). Signed-off-by: Zev Weiss <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2022-11-03bpf, sock_map: Move cancel_work_sync() out of sock lockCong Wang1-1/+1
Stanislav reported a lockdep warning, which is caused by the cancel_work_sync() called inside sock_map_close(), as analyzed below by Jakub: psock->work.func = sk_psock_backlog() ACQUIRE psock->work_mutex sk_psock_handle_skb() skb_send_sock() __skb_send_sock() sendpage_unlocked() kernel_sendpage() sock->ops->sendpage = inet_sendpage() sk->sk_prot->sendpage = tcp_sendpage() ACQUIRE sk->sk_lock tcp_sendpage_locked() RELEASE sk->sk_lock RELEASE psock->work_mutex sock_map_close() ACQUIRE sk->sk_lock sk_psock_stop() sk_psock_clear_state(psock, SK_PSOCK_TX_ENABLED) cancel_work_sync() __cancel_work_timer() __flush_work() // wait for psock->work to finish RELEASE sk->sk_lock We can move the cancel_work_sync() out of the sock lock protection, but still before saved_close() was called. Fixes: 799aa7f98d53 ("skmsg: Avoid lock_sock() in sk_psock_backlog()") Reported-by: Stanislav Fomichev <[email protected]> Signed-off-by: Cong Wang <[email protected]> Signed-off-by: Daniel Borkmann <[email protected]> Tested-by: Jakub Sitnicki <[email protected]> Acked-by: John Fastabend <[email protected]> Acked-by: Jakub Sitnicki <[email protected]> Link: https://lore.kernel.org/bpf/[email protected]
2022-11-03tty: serial: introduce transmit helpersJiri Slaby (SUSE)1-0/+80
Many serial drivers do the same thing: * send x_char if set * keep sending from the xmit circular buffer until either - the loop reaches the end of the xmit buffer - TX is stopped - HW fifo is full * check for pending characters and: - wake up tty writers to fill for more data into xmit buffer - stop TX if there is nothing in the xmit buffer The only differences are: * how to write the character to the HW fifo * the check of the end condition: - is the HW fifo full? - is limit of the written characters reached? So unify the above into two helpers: * uart_port_tx_limited() -- it performs the above taking the written characters limit into account, and * uart_port_tx() -- the same as above, except it only checks the HW readiness, not the characters limit. The HW specific operations (as stated as "differences" above) are passed as arguments to the macros. They are: * tx_ready -- returns true if HW can accept more data. * put_char -- write a character to the device. * tx_done -- when the write loop is done, perform arbitrary action before potential invocation of ops->stop_tx() happens. Note that the above are macros. This means the code is generated in place and the above 3 arguments are "inlined". I.e. no added penalty by generating call instructions for every single character. Nor any indirect calls. (As in some previous versions of this patchset.) Reviewed-by: Ilpo Järvinen <[email protected]> Signed-off-by: Jiri Slaby (SUSE) <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Greg Kroah-Hartman <[email protected]>
2022-11-02overflow: Introduce overflows_type() and castable_to_type()Kees Cook2-0/+48
Implement a robust overflows_type() macro to test if a variable or constant value would overflow another variable or type. This can be used as a constant expression for static_assert() (which requires a constant expression[1][2]) when used on constant values. This must be constructed manually, since __builtin_add_overflow() does not produce a constant expression[3]. Additionally adds castable_to_type(), similar to __same_type(), but for checking if a constant value would overflow if cast to a given type. Add unit tests for overflows_type(), __same_type(), and castable_to_type() to the existing KUnit "overflow" test: [16:03:33] ================== overflow (21 subtests) ================== ... [16:03:33] [PASSED] overflows_type_test [16:03:33] [PASSED] same_type_test [16:03:33] [PASSED] castable_to_type_test [16:03:33] ==================== [PASSED] overflow ===================== [16:03:33] ============================================================ [16:03:33] Testing complete. Ran 21 tests: passed: 21 [16:03:33] Elapsed time: 24.022s total, 0.002s configuring, 22.598s building, 0.767s running [1] https://en.cppreference.com/w/c/language/_Static_assert [2] C11 standard (ISO/IEC 9899:2011): 6.7.10 Static assertions [3] https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html 6.56 Built-in Functions to Perform Arithmetic with Overflow Checking Built-in Function: bool __builtin_add_overflow (type1 a, type2 b, Cc: Luc Van Oostenryck <[email protected]> Cc: Nathan Chancellor <[email protected]> Cc: Nick Desaulniers <[email protected]> Cc: Tom Rix <[email protected]> Cc: Daniel Latypov <[email protected]> Cc: Vitor Massaru Iha <[email protected]> Cc: "Gustavo A. R. Silva" <[email protected]> Cc: Jani Nikula <[email protected]> Cc: Mauro Carvalho Chehab <[email protected]> Cc: [email protected] Cc: [email protected] Co-developed-by: Gwan-gyeong Mun <[email protected]> Signed-off-by: Gwan-gyeong Mun <[email protected]> Signed-off-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-11-02Merge tag 'for-netdev' of ↵Jakub Kicinski7-25/+55
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2022-11-02 We've added 70 non-merge commits during the last 14 day(s) which contain a total of 96 files changed, 3203 insertions(+), 640 deletions(-). The main changes are: 1) Make cgroup local storage available to non-cgroup attached BPF programs such as tc BPF ones, from Yonghong Song. 2) Avoid unnecessary deadlock detection and failures wrt BPF task storage helpers, from Martin KaFai Lau. 3) Add LLVM disassembler as default library for dumping JITed code in bpftool, from Quentin Monnet. 4) Various kprobe_multi_link fixes related to kernel modules, from Jiri Olsa. 5) Optimize x86-64 JIT with emitting BMI2-based shift instructions, from Jie Meng. 6) Improve BPF verifier's memory type compatibility for map key/value arguments, from Dave Marchevsky. 7) Only create mmap-able data section maps in libbpf when data is exposed via skeletons, from Andrii Nakryiko. 8) Add an autoattach option for bpftool to load all object assets, from Wang Yufen. 9) Various memory handling fixes for libbpf and BPF selftests, from Xu Kuohai. 10) Initial support for BPF selftest's vmtest.sh on arm64, from Manu Bretelle. 11) Improve libbpf's BTF handling to dedup identical structs, from Alan Maguire. 12) Add BPF CI and denylist documentation for BPF selftests, from Daniel Müller. 13) Check BPF cpumap max_entries before doing allocation work, from Florian Lehner. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (70 commits) samples/bpf: Fix typo in README bpf: Remove the obsolte u64_stats_fetch_*_irq() users. bpf: check max_entries before allocating memory bpf: Fix a typo in comment for DFS algorithm bpftool: Fix spelling mistake "disasembler" -> "disassembler" selftests/bpf: Fix bpftool synctypes checking failure selftests/bpf: Panic on hard/soft lockup docs/bpf: Add documentation for new cgroup local storage selftests/bpf: Add test cgrp_local_storage to DENYLIST.s390x selftests/bpf: Add selftests for new cgroup local storage selftests/bpf: Fix test test_libbpf_str/bpf_map_type_str bpftool: Support new cgroup local storage libbpf: Support new cgroup local storage bpf: Implement cgroup storage available to non-cgroup-attached bpf progs bpf: Refactor some inode/task/sk storage functions for reuse bpf: Make struct cgroup btf id global selftests/bpf: Tracing prog can still do lookup under busy lock selftests/bpf: Ensure no task storage failure for bpf_lsm.s prog due to deadlock detection bpf: Add new bpf_task_storage_delete proto with no deadlock detection bpf: bpf_task_storage_delete_recur does lookup first before the deadlock check ... ==================== Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jakub Kicinski <[email protected]>
2022-11-02blk-mq: add tagset quiesce interfaceChao Leng2-0/+5
Drivers that have shared tagsets may need to quiesce potentially a lot of request queues that all share a single tagset (e.g. nvme). Add an interface to quiesce all the queues on a given tagset. This interface is useful because it can speedup the quiesce by doing it in parallel. Because some queues should not need to be quiesced (e.g. the nvme connect_q) when quiescing the tagset, introduce a QUEUE_FLAG_SKIP_TAGSET_QUIESCE flag to allow this new interface to ski quiescing a particular queue. Signed-off-by: Chao Leng <[email protected]> [hch: simplify for the per-tag_set srcu_struct] Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Chao Leng <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-02blk-mq: pass a tagset to blk_mq_wait_quiesce_doneChristoph Hellwig1-1/+1
Nothing in blk_mq_wait_quiesce_done needs the request_queue now, so just pass the tagset, and move the non-mq check into the only caller that needs it. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Chao Leng <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Jens Axboe <[email protected]>
2022-11-02blk-mq: move the srcu_struct used for quiescing to the tagsetChristoph Hellwig2-9/+4
All I/O submissions have fairly similar latencies, and a tagset-wide quiesce is a fairly common operation. Signed-off-by: Christoph Hellwig <[email protected]> Reviewed-by: Keith Busch <[email protected]> Reviewed-by: Ming Lei <[email protected]> Reviewed-by: Chao Leng <[email protected]> Reviewed-by: Sagi Grimberg <[email protected]> Reviewed-by: Hannes Reinecke <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Link: https://lore.kernel.org/r/[email protected] [axboe: fix whitespace] Signed-off-by: Jens Axboe <[email protected]>
2022-11-02memory: omap-gpmc: wait pin additionsBenedikt Niedermayr1-0/+8
This patch introduces support for setting the wait-pin polarity as well as using the same wait-pin for different CS regions. The waitpin polarity can be configured via the WAITPIN<X>POLARITY bits in the GPMC_CONFIG register. This is currently not supported by the driver. This patch adds support for setting the required register bits with the "ti,wait-pin-polarity" dt-property. The wait-pin can also be shared between different CS regions for special usecases. Therefore GPMC must keep track of wait-pin allocations, so it knows that either GPMC itself or another driver has the ownership. Signed-off-by: Benedikt Niedermayr <[email protected]> Link: https://lore.kernel.org/r/[email protected] Reviewed-by: Roger Quadros <[email protected]> Signed-off-by: Krzysztof Kozlowski <[email protected]>
2022-11-02net: wwan: iosm: add rpc interface for xmm modemsShane Parslow1-0/+2
Add a new iosm wwan port that connects to the modem rpc interface. This interface provides a configuration channel, and in the case of the 7360, is the only way to configure the modem (as it does not support mbim). The new interface is compatible with existing software, such as open_xdatachannel.py from the xmm7360-pci project [1]. [1] https://github.com/xmm7360/xmm7360-pci Signed-off-by: Shane Parslow <[email protected]> Reviewed-by: Loic Poulain <[email protected]> Signed-off-by: David S. Miller <[email protected]>
2022-11-02device property: Introduce fwnode_device_is_compatible() helperAndy Shevchenko1-1/+9
The fwnode_device_is_compatible() helper searches for the given string in the "compatible" string array property and, if found, returns true. Signed-off-by: Andy Shevchenko <[email protected]> Reviewed-by: Sakari Ailus <[email protected]>
2022-11-02cpufreq: Generalize of_perf_domain_get_sharing_cpumask phandle formatHector Martin1-12/+16
of_perf_domain_get_sharing_cpumask currently assumes a 1-argument phandle format, and directly returns the argument. Generalize this to return the full of_phandle_args, so it can be used by drivers which use other phandle styles (e.g. separate nodes). This also requires changing the CPU sharing match to compare the full args structure. Also, make sure to of_node_put(args.np) (the original code was leaking a reference). Signed-off-by: Hector Martin <[email protected]> Signed-off-by: Viresh Kumar <[email protected]>
2022-11-01Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-7/+17
Pull kvm fixes from Paolo Bonzini: "x86: - fix lock initialization race in gfn-to-pfn cache (+selftests) - fix two refcounting errors - emulator fixes - mask off reserved bits in CPUID - fix bug with disabling SGX RISC-V: - update MAINTAINERS" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: x86/xen: Fix eventfd error handling in kvm_xen_eventfd_assign() KVM: x86: smm: number of GPRs in the SMRAM image depends on the image format KVM: x86: emulator: update the emulation mode after CR0 write KVM: x86: emulator: update the emulation mode after rsm KVM: x86: emulator: introduce emulator_recalc_and_set_mode KVM: x86: emulator: em_sysexit should update ctxt->mode KVM: selftests: Mark "guest_saw_irq" as volatile in xen_shinfo_test KVM: selftests: Add tests in xen_shinfo_test to detect lock races KVM: Reject attempts to consume or refresh inactive gfn_to_pfn_cache KVM: Initialize gfn_to_pfn_cache locks in dedicated helper KVM: VMX: fully disable SGX if SECONDARY_EXEC_ENCLS_EXITING unavailable KVM: x86: Exempt pending triple fault from event injection sanity check MAINTAINERS: git://github -> https://github.com for kvm-riscv KVM: debugfs: Return retval of simple_attr_open() if it fails KVM: x86: Reduce refcount if single_open() fails in kvm_mmu_rmaps_stat_open() KVM: x86: Mask off reserved bits in CPUID.8000001FH KVM: x86: Mask off reserved bits in CPUID.8000001AH KVM: x86: Mask off reserved bits in CPUID.80000008H KVM: x86: Mask off reserved bits in CPUID.80000006H KVM: x86: Mask off reserved bits in CPUID.80000001H
2022-11-01spi: introduce new helpers with using modern namingYang Yingliang1-2/+45
For using modern names host/target to instead of all the legacy names, I think it takes 3 steps: - step1: introduce new helpers with modern naming. - step2: switch to use these new helpers in all drivers. - step3: remove all legacy helpers and update all legacy names. This patch is for step1, it introduces new helpers with host/target naming for drivers using. Signed-off-by: Yang Yingliang <[email protected]> Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Mark Brown <[email protected]>
2022-11-01iommu: Add return value rules to attach_dev op and APIsNicolin Chen1-0/+12
Cases like VFIO wish to attach a device to an existing domain that was not allocated specifically from the device. This raises a condition where the IOMMU driver can fail the domain attach because the domain and device are incompatible with each other. This is a soft failure that can be resolved by using a different domain. Provide a dedicated errno EINVAL from the IOMMU driver during attach that the reason why the attach failed is because of domain incompatibility. VFIO can use this to know that the attach is a soft failure and it should continue searching. Otherwise, the attach will be a hard failure and VFIO will return the code to userspace. Update kdocs to add rules of return value to the attach_dev op and APIs. Link: https://lore.kernel.org/r/bd56d93c18621104a0fa1b0de31e9b760b81b769.1666042872.git.nicolinc@nvidia.com Suggested-by: Jason Gunthorpe <[email protected]> Reviewed-by: Kevin Tian <[email protected]> Reviewed-by: Jason Gunthorpe <[email protected]> Signed-off-by: Nicolin Chen <[email protected]> Reviewed-by: Lu Baolu <[email protected]> Signed-off-by: Jason Gunthorpe <[email protected]>
2022-11-01fortify: Do not cast to "unsigned char"Kees Cook1-1/+1
Do not cast to "unsigned char", as this needlessly creates type problems when attempting builds without -Wno-pointer-sign[1]. The intent of the cast is to drop possible "const" types. [1] https://lore.kernel.org/lkml/CAHk-=wgz3Uba8w7kdXhsqR1qvfemYL+OFQdefJnkeqXG8qZ_pA@mail.gmail.com/ Suggested-by: Linus Torvalds <[email protected]> Fixes: 3009f891bb9f ("fortify: Allow strlen() and strnlen() to pass compile-time known lengths") Cc: [email protected] Signed-off-by: Kees Cook <[email protected]>
2022-11-01fortify: Short-circuit known-safe calls to strscpy()Kees Cook1-0/+10
Replacing compile-time safe calls of strcpy()-related functions with strscpy() was always calling the full strscpy() logic when a builtin would be better. For example: char buf[16]; strcpy(buf, "yes"); would reduce to __builtin_memcpy(buf, "yes", 4), but not if it was: strscpy(buf, yes, sizeof(buf)); Fix this by checking if all sizes are known at compile-time. Cc: [email protected] Tested-by: Nathan Chancellor <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2022-11-01string: Add __realloc_size hint to kmemdup()Kees Cook2-2/+3
Add __realloc_size() hint to kmemdup() so the compiler can reason about the length of the returned buffer. (These must not use __alloc_size, since those include __malloc which says the contents aren't defined[1]). [1] https://lore.kernel.org/linux-hardening/[email protected]/ Cc: Rasmus Villemoes <[email protected]> Cc: Guenter Roeck <[email protected]> Cc: Andy Shevchenko <[email protected]> Cc: Paolo Abeni <[email protected]> Cc: Geert Uytterhoeven <[email protected]> Signed-off-by: Kees Cook <[email protected]>
2022-11-01x86/ibt: Implement FineIBTPeter Zijlstra1-1/+1
Implement an alternative CFI scheme that merges both the fine-grained nature of kCFI but also takes full advantage of the coarse grained hardware CFI as provided by IBT. To contrast: kCFI is a pure software CFI scheme and relies on being able to read text -- specifically the instruction *before* the target symbol, and does the hash validation *before* doing the call (otherwise control flow is compromised already). FineIBT is a software and hardware hybrid scheme; by ensuring every branch target starts with a hash validation it is possible to place the hash validation after the branch. This has several advantages: o the (hash) load is avoided; no memop; no RX requirement. o IBT WAIT-FOR-ENDBR state is a speculation stop; by placing the hash validation in the immediate instruction after the branch target there is a minimal speculation window and the whole is a viable defence against SpectreBHB. o Kees feels obliged to mention it is slightly more vulnerable when the attacker can write code. Obviously this patch relies on kCFI, but additionally it also relies on the padding from the call-depth-tracking patches. It uses this padding to place the hash-validation while the call-sites are re-written to modify the indirect target to be 16 bytes in front of the original target, thus hitting this new preamble. Notably, there is no hardware that needs call-depth-tracking (Skylake) and supports IBT (Tigerlake and onwards). Suggested-by: Joao Moreira (Intel) <[email protected]> Signed-off-by: Peter Zijlstra (Intel) <[email protected]> Reviewed-by: Kees Cook <[email protected]> Link: https://lore.kernel.org/r/[email protected]
2022-10-31rtnetlink: pass netlink message header and portid to rtnl_configure_link()Hangbin Liu2-6/+5
This patch pass netlink message header and portid to rtnl_configure_link() All the functions in this call chain need to add the parameters so we can use them in the last call rtnl_notify(), and notify the userspace about the new link info if NLM_F_ECHO flag is set. - rtnl_configure_link() - __dev_notify_flags() - rtmsg_ifinfo() - rtmsg_ifinfo_event() - rtmsg_ifinfo_build_skb() - rtmsg_ifinfo_send() - rtnl_notify() Also move __dev_notify_flags() declaration to net/core/dev.h, as Jakub suggested. Signed-off-by: Hangbin Liu <[email protected]> Reviewed-by: Guillaume Nault <[email protected]> Signed-off-by: Jakub Kicinski <[email protected]>
2022-10-31cgroup: cgroup refcnt functions should be exported when CONFIG_DEBUG_CGROUP_REFTejun Heo2-0/+7
6ab428604f72 ("cgroup: Implement DEBUG_CGROUP_REF") added a config option which forces cgroup refcnt functions to be not inlined so that they can be kprobed for debugging. However, it forgot export them when the config is enabled breaking modules which make use of css reference counting. Fix it by adding CGROUP_REF_EXPORT() macro to cgroup_refcnt.h which is defined to EXPORT_SYMBOL_GPL when CONFIG_DEBUG_CGROUP_REF is set. Signed-off-by: Tejun Heo <[email protected]> Fixes: 6ab428604f72 ("cgroup: Implement DEBUG_CGROUP_REF")
2022-10-31fs: introduce dedicated idmap type for mountsChristian Brauner3-11/+16
Last cycle we've already made the interaction with idmapped mounts more robust and type safe by introducing the vfs{g,u}id_t type. This cycle we concluded the conversion and removed the legacy helpers. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate filesystem and mount namespaces and what different roles they have to play. Especially for filesystem developers without much experience in this area this is an easy source for bugs. Instead of passing the plain namespace we introduce a dedicated type struct mnt_idmap and replace the pointer with a pointer to a struct mnt_idmap. There are no semantic or size changes for the mount struct caused by this. We then start converting all places aware of idmapped mounts to rely on struct mnt_idmap. Once the conversion is done all helpers down to the really low-level make_vfs{g,u}id() and from_vfs{g,u}id() will take a struct mnt_idmap argument instead of two namespace arguments. This way it becomes impossible to conflate the two, removing and thus eliminating the possibility of any bugs. Fwiw, I fixed some issues in that area a while ago in ntfs3 and ksmbd in the past. Afterwards, only low-level code can ultimately use the associated namespace for any permission checks. Even most of the vfs can be ultimately completely oblivious about this and filesystems will never interact with it directly in any form in the future. A struct mnt_idmap currently encompasses a simple refcount and a pointer to the relevant namespace the mount is idmapped to. If a mount isn't idmapped then it will point to a static nop_mnt_idmap. If it is an idmapped mount it will point to a new struct mnt_idmap. As usual there are no allocations or anything happening for non-idmapped mounts. Everthing is carefully written to be a nop for non-idmapped mounts as has always been the case. If an idmapped mount or mount tree is created a new struct mnt_idmap is allocated and a reference taken on the relevant namespace. For each mount in a mount tree that gets idmapped or a mount that inherits the idmap when it is cloned the reference count on the associated struct mnt_idmap is bumped. Just a reminder that we only allow a mount to change it's idmapping a single time and only if it hasn't already been attached to the filesystems and has no active writers. The actual changes are fairly straightforward. This will have huge benefits for maintenance and security in the long run even if it causes some churn. I'm aware that there's some cost for all of you. And I'll commit to doing this work and make this as painless as I can. Note that this also makes it possible to extend struct mount_idmap in the future. For example, it would be possible to place the namespace pointer in an anonymous union together with an idmapping struct. This would allow us to expose an api to userspace that would let it specify idmappings directly instead of having to go through the detour of setting up namespaces at all. This just adds the infrastructure and doesn't do any conversions. Reviewed-by: Seth Forshee (DigitalOcean) <[email protected]> Signed-off-by: Christian Brauner (Microsoft) <[email protected]>
2022-10-31block: simplify blksize_bits() implementationDawei Li1-6/+1
Convert current looping-based implementation into bit operation, which can bring improvement for: 1) bitops is more efficient for its arch-level optimization. 2) Given that blksize_bits() is inline, _if_ @size is compile-time constant, it's possible that order_base_2() _may_ make output compile-time evaluated, depending on code context and compiler behavior. Signed-off-by: Dawei Li <[email protected]> Reviewed-by: Bart Van Assche <[email protected]> Reviewed-by: Chaitanya Kulkarni <[email protected]> Reviewed-by: Christoph Hellwig <[email protected]> Link: https://lore.kernel.org/r/TYCP286MB23238842958D7C083D6B67CECA349@TYCP286MB2323.JPNP286.PROD.OUTLOOK.COM Signed-off-by: Jens Axboe <[email protected]>